{"blog_url":"https://paper.hatenadiary.jp/","width":"100%","blog_title":"\u3081\u3082","author_name":"misos","url":"https://paper.hatenadiary.jp/entry/2016/11/30/154328","provider_url":"https://hatena.blog","type":"rich","title":"Reinforcement Learning Reference Notes 2: The Multi-Armed Bandit Problem","author_url":"https://blog.hatena.ne.jp/misos/","categories":["Papers, Materials, and Slide Collections","Machine Learning","Reinforcement Learning","Bandits"],"description":"Basics Formulations Exploration/Exploitation Dilemma Stationary Problem Action-Value Methods Action selection strategies greedy \u03b5-Greedy Soft-max action selection Non-stationary Problem Arm action strategies Gradient-Bandit All Moves As First (AMAF) Upper Confidence Bound (UCB) action selection Next time Chapter 2 of Sutton's book, related to the multi-armed bandit problem\u2026","provider_name":"Hatena Blog","image_url":"http://ecx.images-amazon.com/images/I/51yD20bFEYL.jpg","version":"1.0","published":"2016-11-30 15:43:28","height":"190","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Fpaper.hatenadiary.jp%2Fentry%2F2016%2F11%2F30%2F154328\" title=\"\u5f37\u5316\u5b66\u7fd2\u306e\u8cc7\u6599\u30e1\u30e2\uff12\uff1a\u591a\u8155\u30d0\u30f3\u30c7\u30a3\u30c3\u30c8\u554f\u984c - \u3081\u3082\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>"}