{
  "description": "Basics of reinforcement learning: Introduction to Reinforcement Learning with Function Approximation, Temporal-Difference Learning, the Bellman expectation equation, off-policy methods, function approximation, the ε-greedy policy, model-based reinforcement learning, the exploitation-exploration dilemma, next time. These notes follow the chapter structure of Sutton's book (draft version); for now, just the introduction. They are memo-level jottings, with no plans to dig into fine details or to reference detailed equations…",
  "blog_title": "めも",
  "html": "<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Fpaper.hatenadiary.jp%2Fentry%2F2016%2F11%2F30%2F032743\" title=\"Reinforcement Learning Reference Notes 1: Basics - めも\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>",
  "image_url": "http://ws-fe.amazon-adsystem.com/widgets/q?_encoding=UTF8&ASIN=4627826613&Format=_SL250_&ID=AsinImage&MarketPlace=JP&ServiceVersion=20070822&WS=1&tag=misoscript-22",
  "version": "1.0",
  "published": "2016-11-30 03:27:43",
  "provider_url": "https://hatena.blog",
  "provider_name": "Hatena Blog",
  "width": "100%",
  "title": "Reinforcement Learning Reference Notes 1: Basics",
  "blog_url": "https://paper.hatenadiary.jp/",
  "author_name": "misos",
  "type": "rich",
  "author_url": "https://blog.hatena.ne.jp/misos/",
  "height": "190",
  "url": "https://paper.hatenadiary.jp/entry/2016/11/30/032743",
  "categories": ["Reinforcement Learning", "Link Collection", "Machine Learning", "Bandits"]
}