[GAIL] Implementing an Imitation Learning Algorithm That Combines Inverse Reinforcement Learning and GANs [CartPole]

Author: morika-takeuchi (https://blog.hatena.ne.jp/morika-takeuchi/)
Blog: Morikatron Engineer Blog (https://tech.morikatron.ai/)
Published: 2020-10-12 10:00:00
URL: https://tech.morikatron.ai/entry/2020/10/12/100000
Tags: Machine Learning, Reinforcement Learning, GAIL, Python

Hello, this is Takeuchi, an engineer. In a previous article I introduced Deep Q-Learning from Demonstrations, an algorithm that brings imitation learning into DQN, but imitation learning can be approached in many other ways. In particular, methods built on inverse reinforcement learning (Inverse Reinforcement Learning), which estimates the environment's reward function from an expert's behavior trajectories, are among the most representative imitation learning algorithms, and they make imitation learning possible even when the environment provides no reward. So this time, from among the imitation learning algorithms based on inverse reinforcement learning, a particularly useful one, adversarial…