{"blog_url":"https://inarizuuuushi.hatenablog.com/","version":"1.0","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Finarizuuuushi.hatenablog.com%2Fentry%2F2024%2F09%2F10%2F094606\" title=\"Test NCCL failure common.cu:1005 &#39;unhandled cuda error (run with NCCL_DEBUG=INFO for details) &#39; .. pid 175547: Test failure common.cu:891 - Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","provider_url":"https://hatena.blog","author_name":"inarizuuuushi","description":"I encountered the following error when executing nccl-tests. tateiwa@snail01:/data/nccl-tests$ NCCL_DEBUG=INFO ./build/all_reduce_perf -g 2 # nThread 1 nGpus 2 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0 # # Using devices # R\u2026","blog_title":"Sabrou-mal \u30b5\u30d6\u30ed\u30a6\u4e38","published":"2024-09-10 09:46:06","title":"Test NCCL failure common.cu:1005 'unhandled cuda error (run with NCCL_DEBUG=INFO for details) ' .. pid 175547: Test failure common.cu:891","categories":[],"provider_name":"Hatena Blog","image_url":null,"width":"100%","url":"https://inarizuuuushi.hatenablog.com/entry/2024/09/10/094606","type":"rich","author_url":"https://blog.hatena.ne.jp/inarizuuuushi/","height":"190"}