How block boundaries are handled when a file is cut into 64 MB blocks

nobu-q, nobu-qの日記 (https://nobu-q.hatenadiary.org/), 2008-10-02 11:11:35. Category: Hadoop memo
https://nobu-q.hatenadiary.org/entry/20081002/1222913495

I haven't dug into the implementation yet, so this is strictly a note to myself. Don't treat it as a reference.

HDFS chops large files into small blocks and stores them on various nodes. Naturally, the cut points are chosen without regard to the content, so there is nothing you can do about where they fall.

Now consider counting the number of lines in some enormous text file with MapReduce.

conf.setInputFormat(TextInputFormat.class);

With this setting the input is handed to the Mapper one line at a time, so with a small tweak to WordCount:

public void map((omitted)) { output.collect(new Text("Lines"), one)…
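
The excerpt above breaks off before the boundary question is settled. As a rough illustration of why a per-line count can still come out right when a block boundary cuts a line in half, here is a minimal self-contained sketch. It is not Hadoop code: the class SplitLineCount and its methods are hypothetical, and the "blocks" are just byte ranges of an in-memory array. It assumes the convention commonly described for line-oriented readers, where every split except the first skips its leading (possibly partial) line and every split finishes the line that straddles its end. Whether the Hadoop version discussed in this post behaves exactly this way is not confirmed by the source.

```java
// Hypothetical sketch, not Hadoop code. Simulates counting lines per split
// when fixed-size blocks cut a text file at arbitrary byte offsets.
class SplitLineCount {

    // Count the lines owned by the split covering bytes [start, end).
    // Assumed convention: a split owns every line that begins in (start, end];
    // the split starting at offset 0 additionally owns the very first line.
    static int countLines(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip up to and including the next newline: that line (or its
            // tail) belongs to the previous split.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        int lines = 0;
        // Keep reading while the next line begins at or before the split end,
        // running past 'end' if needed to finish the straddling line.
        while (pos <= end && pos < data.length) {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline (or past the end of the data)
            lines++;
        }
        return lines;
    }

    // Sum the per-split counts, as a reducer would sum the mappers' output.
    static int countAllSplits(byte[] data, int blockSize) {
        int total = 0;
        for (int start = 0; start < data.length; start += blockSize) {
            int end = Math.min(start + blockSize, data.length);
            total += countLines(data, start, end);
        }
        return total;
    }
}
```

For the 23-byte input "alpha\nbeta\ngamma\ndelta\n" and a block size of 7, the four splits report 2, 1, 1 and 0 lines, which sums to the expected 4 even though no boundary falls on a newline.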