{"blog_title":"Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38","provider_name":"Hatena Blog","blog_url":"https://inarizuuuushi.hatenablog.com/","image_url":"https://cdn-ak.f.st-hatena.com/images/fotolife/i/inarizuuuushi/20220531/20220531180951.jpg","url":"https://inarizuuuushi.hatenablog.com/entry/2022/06/08/090000","published":"2022-06-08 09:00:00","categories":["Paper Survey","Distributed Deep Learning"],"html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Finarizuuuushi.hatenablog.com%2Fentry%2F2022%2F06%2F08%2F090000\" title=\"Survey: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism - Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","height":"190","width":"100%","type":"rich","version":"1.0","author_url":"https://blog.hatena.ne.jp/inarizuuuushi/","provider_url":"https://hatena.blog","author_name":"inarizuuuushi","title":"Survey: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","description":"@article{shoeybi2019megatron, title={Megatron-lm: Training multi-billion parameter language models using model parallelism}, author={Shoeybi, Mohammad and Patwary, Mostofa and Puri, Raul and LeGresley, Patrick and Casper, Jared and Catanzaro, Bryan}, journal={arXiv preprint arXiv:1909.08053}, year={\u2026"}