{"author_url":"https://blog.hatena.ne.jp/inarizuuuushi/","author_name":"inarizuuuushi","url":"https://inarizuuuushi.hatenablog.com/entry/2024/09/10/094353","version":"1.0","width":"100%","type":"rich","blog_url":"https://inarizuuuushi.hatenablog.com/","description":"The following error occurred while running nccl-tests. tateiwa@snail01:/data/nccl-tests$ NCCL_DEBUG=INFO ./build/all_reduce_perf -g 2 # nThread 1 nGpus 2 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0 # # Using devices # Rank 0 Group 0 Pid 175547 on snail01 \u2026","categories":[],"html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Finarizuuuushi.hatenablog.com%2Fentry%2F2024%2F09%2F10%2F094353\" title=\"Test NCCL failure common.cu:1005 &#39;unhandled cuda error (run with NCCL_DEBUG=INFO for details) &#39;  .. pid 175547: Test failure common.cu:891 - Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","provider_url":"https://hatena.blog","title":"Test NCCL failure common.cu:1005 'unhandled cuda error (run with NCCL_DEBUG=INFO for details) '  .. pid 175547: Test failure common.cu:891","provider_name":"Hatena Blog","blog_title":"Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38","height":"190","published":"2024-09-10 09:43:53","image_url":null}