{"url":"https://techblog.vpoint.co.jp/entry/2023/09/26/135228","height":"190","title":"Reinforcement Learning from Human Feedback (RLHF): Researching It and Trying an Implementation!","description":"Hello, this is Miura from CCCMK Holdings TECH LAB. It has suddenly turned cool. I am grateful that the weather is more comfortable, but my body has not kept up with the sudden change in temperature... At this time of year I have become more conscious of getting proper sleep. This time I looked into Reinforcement Learning from Human Feedback (RLHF), a reinforcement learning technique I had been curious about for a while. Enabling LMs to generate more preferable text: thanks to large amounts of text data, it is now possible to generate natural text with language models (Language model\u2026","blog_title":"V\u30dd\u30a4\u30f3\u30c8\u30de\u30fc\u30b1\u30c6\u30a3\u30f3\u30b0\uff5cTECH LAB\u306e Tech Blog","provider_url":"https://hatena.blog","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Ftechblog.vpoint.co.jp%2Fentry%2F2023%2F09%2F26%2F135228\" title=\"Reinforcement Learning from Human 
Feedback (RLHF): Researching It and Trying an Implementation! - V\u30dd\u30a4\u30f3\u30c8\u30de\u30fc\u30b1\u30c6\u30a3\u30f3\u30b0\uff5cTECH LAB\u306e Tech Blog\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","version":"1.0","author_url":"https://blog.hatena.ne.jp/miu4930/","categories":["LLMs","RLHF"],"type":"rich","width":"100%","provider_name":"Hatena Blog","published":"2023-09-26 13:52:28","blog_url":"https://techblog.vpoint.co.jp/","author_name":"miu4930","image_url":null}