DeviceMesh in PyTorch

By inarizuuuushi (https://blog.hatena.ne.jp/inarizuuuushi/) on Sabrou-mal サブロウ丸 (https://inarizuuuushi.hatenablog.com/), Hatena Blog, 2025-04-28 19:54:43
https://inarizuuuushi.hatenablog.com/entry/2025/04/28/195443

DeviceMesh is a tool for managing groups of resources such as GPUs. With it, you can flexibly decide which GPUs are assigned to which part of a distributed training job. Distributed parallelism comes in several forms, broadly divided into data parallelism and model parallelism. When training a model with a very large number of parameters, such as an LLM, you may want to combine the two. DeviceMesh makes fine-grained layouts like this possible, for example running model parallelism within a server and data parallelism across servers.

The figure is taken from the following DeviceMesh documentation: Getting Started with DeviceMesh — PyTorch Tutorials 2.7.0+cu126 do…

[Figure: DeviceMesh layout, https://pytorch.org/tutorials/_images/device_mesh.png]
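The sketch below makes the "model parallel within a server, data parallel across servers" layout concrete. It is plain Python and shows how ranks can be arranged on a 2-D mesh. It assumes a hypothetical cluster of 2 servers with 4 GPUs each, with global ranks numbered contiguously server by server.

`build_mesh`, `groups_along`, and `groups_for_rank` are illustrative helpers, not PyTorch APIs. In PyTorch itself, a comparable mesh would be created with `init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))` from `torch.distributed.device_mesh`. Here the names `"dp"` and `"tp"` are our own labels.

```python
# Minimal sketch (assumption): 2 servers x 4 GPUs = 8 ranks, numbered
# contiguously per server, laid out row-major on a (2, 4) mesh.

def build_mesh(shape):
    """Lay out ranks 0..N-1 in row-major order on a 2-D mesh of the given shape."""
    rows, cols = shape
    return [[r * cols + c for c in range(cols)] for r in range(rows)]


def groups_along(mesh, dim):
    """Return the rank groups formed along one mesh dimension.

    dim=1 groups ranks sharing a row (the same server in this sketch);
    dim=0 groups ranks sharing a column (the same local GPU index on each server).
    """
    if dim == 1:
        return [list(row) for row in mesh]
    if dim == 0:
        return [list(col) for col in zip(*mesh)]
    raise ValueError("dim must be 0 or 1 for a 2-D mesh")


def groups_for_rank(mesh, rank):
    """Return (data_parallel_group, model_parallel_group) containing `rank`."""
    for r, row in enumerate(mesh):
        if rank in row:
            c = row.index(rank)
            dp_group = [other_row[c] for other_row in mesh]  # same column, across servers
            tp_group = list(row)                              # same row, within one server
            return dp_group, tp_group
    raise ValueError(f"rank {rank} is not in the mesh")


# Hypothetical cluster: 2 servers x 4 GPUs each.
mesh = build_mesh((2, 4))
dp_groups = groups_along(mesh, 0)  # across servers: data parallel
tp_groups = groups_along(mesh, 1)  # within a server: model (tensor) parallel

if __name__ == "__main__":
    print("mesh:", mesh)
    print("data-parallel groups:", dp_groups)
    print("model-parallel groups:", tp_groups)
    print("groups for rank 5:", groups_for_rank(mesh, 5))
```

Each row of the mesh is one server. Slicing along dimension 1 therefore gives the intra-server model-parallel groups, such as `[4, 5, 6, 7]`. Slicing along dimension 0 gives the cross-server data-parallel groups, such as `[1, 5]`.

With a real `DeviceMesh`, the analogous step would be taking a sub-mesh by dimension name, for example `mesh["tp"]`, or calling `get_group` on a named dimension. The script itself would then be launched once per GPU, for example with torchrun.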