{"provider_url":"https://hatena.blog","type":"rich","width":"100%","author_name":"go5paopao","published":"2022-08-10 10:06:14","url":"https://tech-blog.abeja.asia/entry/abeja-gpt-model-202208","blog_title":"ABEJA Tech Blog","blog_url":"https://tech-blog.abeja.asia/","description":"1. Introduction 2. Lessons from prior work 3. Prerequisites 4. Candidate architecture changes Changing the activation function (SwishGLU) Parallelizing Transformer layers Removing bias parameters Sharing input-output embeddings (Weight tying) 5. Experiments with small-scale models Experimental setup Parallelizing Transformer layers Applying SwishGLU Removing bias parameters Experiments on bias removal Keeping only the first or last bias Sharing input-output embeddings (Weight tying) 6. Experiments with medium-scale models Experimental setup Model\u2026","version":"1.0","image_url":"https://cdn-ak.f.st-hatena.com/images/fotolife/h/hiroyuki_abeja/20220809/20220809153955.png","height":"190","title":"Architectural improvements in the ABEJA GPT model","provider_name":"Hatena Blog","categories":[],"html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Ftech-blog.abeja.asia%2Fentry%2Fabeja-gpt-model-202208\" title=\"Architectural improvements in the ABEJA GPT model - ABEJA Tech Blog\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","author_url":"https://blog.hatena.ne.jp/go5paopao/"}