{"description":"https://dl.acm.org/doi/10.1145/3458817.3476209 paper: @inproceedings{10.1145/3458817.3476209, author = {Narayanan, Deepak and Shoeybi, Mohammad and Casper, Jared and LeGresley, Patrick and Patwary, Mostofa and Korthikanti, Vijay and Vainbrand, Dmitri and Kashinkunti, Prethvi and Bernauer, Julie and \u2026","type":"rich","version":"1.0","author_url":"https://blog.hatena.ne.jp/inarizuuuushi/","title":"Survey: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM","categories":["Distributed Deep Learning","Paper Survey"],"width":"100%","provider_url":"https://hatena.blog","url":"https://inarizuuuushi.hatenablog.com/entry/2022/05/17/090000","height":"190","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Finarizuuuushi.hatenablog.com%2Fentry%2F2022%2F05%2F17%2F090000\" title=\"Survey: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM - Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","published":"2022-05-17 09:00:00","author_name":"inarizuuuushi","provider_name":"Hatena Blog","blog_url":"https://inarizuuuushi.hatenablog.com/","image_url":"https://cdn-ak.f.st-hatena.com/images/fotolife/i/inarizuuuushi/20220513/20220513100226.png","blog_title":"Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38"}