Quantization: Reading the Paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference" (1)

Author: daisuke20240310 (https://blog.hatena.ne.jp/daisuke20240310/)
Blog: 土日の勉強ノート (https://daisuke20240310.hatenablog.com/), hosted on Hatena Blog (https://hatena.blog)
Categories: AI, AI-Quantization
Published: 2024-05-03 17:45:18
URL: https://daisuke20240310.hatenablog.com/entry/qat1

Starting with this post, I am going to study quantization for AI. Specifically, I want to look into how quantization speeds up inference. Quantization has also been drawing attention in recent developments around large language models (LLMs), with ChatGPT as a prominent example. Quantization comes with some loss of accuracy, but because it can shrink a model and improve inference speed, it is often used when running inference on edge devices. Recently there was news about an LLM being deployed on a smartphone. AI quantization falls broadly into two approaches: PTQ (Post Training Quantization), which quantizes a model after training, and QAT (Qu…