Counting UTF-8 characters with word

Author: nurse (https://blog.hatena.ne.jp/nurse/)
Blog: なるせにっき (https://naruse.hateblo.jp/), hosted on Hatena Blog (https://hatena.blog)
Posted: 2008-03-26 19:37:32
URL: https://naruse.hateblo.jp/entry/20080326/1206527852

Char

As with search non ascii, this also gets faster when the string is examined a word at a time. Specifically, in UTF-8 a trail byte is always in the range [\x80-\xBF], and no lead byte ever falls in that range. So counting the bytes in the byte sequence that are not of the form 0b10xxxxxx gives the string length in characters directly. To do this in parallel across the bytes of a word, take, for each byte, the logical OR of the negation of its most significant bit and its second-highest bit, and count the bytes for which the result is true.

Range       Binary      MSB  NOT MSB  2nd bit  OR
\x00-\x7F   0b0xxxxxxx  0    1        x        1
…
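The per-byte test described above (NOT of the top bit, ORed with the second bit) can be sketched in C roughly as follows. This is only an illustration, not the code from the original post: the function names `count_non_trail_in_word` and `utf8_char_count` are hypothetical, the word size is assumed to be 64 bits, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: counts the bytes in w that are NOT trail bytes
 * (i.e. not of the form 0b10xxxxxx). Assumes a 64-bit word. */
static size_t count_non_trail_in_word(uint64_t w)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);

    /* For each byte lane, move (NOT bit 7) and (bit 6) down to bit 0,
     * OR them together, and mask away everything else. Each lane now
     * holds 1 for a lead/ASCII byte and 0 for a trail byte. */
    uint64_t flags = ((~w >> 7) | (w >> 6)) & ones;

    /* Multiplying by 0x0101...01 sums the eight 0/1 lanes into the
     * top byte; the sum is at most 8, so it cannot overflow the lane. */
    return (size_t)((flags * ones) >> 56);
}

/* Hypothetical entry point: number of UTF-8 characters in s[0..len).
 * Assumes s holds valid UTF-8. */
size_t utf8_char_count(const char *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    /* Word-at-a-time part. memcpy avoids unaligned access; byte order
     * does not matter because only the number of set lanes is used. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);
        count += count_non_trail_in_word(w);
    }

    /* Remaining bytes (fewer than one word), checked one by one. */
    for (; i < len; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With this sketch, `utf8_char_count("なるせにっき", 18)` would go through two full words plus a two-byte tail. Copying each word with `memcpy` is a design choice made here for portability; an implementation that first aligns the pointer could read words directly instead.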