{"blog_url":"https://gasyou.hatenablog.jp/","url":"https://gasyou.hatenablog.jp/entry/2018/10/25/094150","height":"190","type":"rich","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Fgasyou.hatenablog.jp%2Fentry%2F2018%2F10%2F25%2F094150\" title=\"Back to basics: I decided to work on improving PGLeaf (the original) - GA\u5c06\uff1f\u958b\u767a\u65e5\u8a18\uff5e\u738b\u7406\u306e\u305d\u306e\u5148\u3078\uff5e\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","categories":["Development diary"],"title":"Back to basics: I decided to work on improving PGLeaf (the original)","provider_url":"https://hatena.blog","width":"100%","author_name":"Gasyou","author_url":"https://blog.hatena.ne.jp/Gasyou/","blog_title":"GA\u5c06\uff1f\u958b\u767a\u65e5\u8a18\uff5e\u738b\u7406\u306e\u305d\u306e\u5148\u3078\uff5e","provider_name":"Hatena Blog","version":"1.0","published":"2018-10-25 09:41:50","description":"https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf http://proceedings.mlr.press/v80/xu18d/xu18d.pdf For now I am putting off combining it with TDLeaf(\u03bb) and similar methods, and will try to see how far PGLeaf can go on its own. Based on the references above, I plan to make PGLeaf off-policy and incorporate meta-learning. If that goes well, the various problems I ran into at this year's championship should be solved... I hope.","image_url":null}