MapReduce Design Patterns (2)

enakai00 (https://blog.hatena.ne.jp/enakai00/), めもめも (https://enakai00.hatenablog.com/), 2010-07-02 13:13:25
https://enakai00.hatenablog.com/entry/20100702/1278648805

The "word co-occurrence problem" (counting how often two words appear as a pair near each other in a document) has two basic patterns, "Pairs" and "Stripes". We look at Pairs first.

Preparation

Save the text of "The Brothers Karamazov" to HDFS.

$ wget http://www.gutenberg.org/files/28054/28054.zip
$ unzip 28054.zip
$ hadoop fs -copyFromLocal 28054.txt Karamazov.txt

Source code

For example, pairs of consecutively occurring words…
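The Pairs pattern named above can be illustrated without a Hadoop cluster. The following is only a minimal in-memory sketch in Python, not the post's own source code (which is cut off in this excerpt). It assumes that "near" means directly adjacent words on the same line and that tokens are lowercase alphabetic strings; the function names `tokenize`, `map_pairs`, `reduce_pairs`, and `count_pairs` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(line):
    # Assumption: lowercase alphabetic tokens (apostrophes allowed).
    # The tokenization used in the original post is not shown.
    return re.findall(r"[a-z']+", line.lower())


def map_pairs(line):
    """Mapper: emit ((left, right), 1) for each pair of adjacent words."""
    words = tokenize(line)
    for left, right in zip(words, words[1:]):
        yield (left, right), 1


def reduce_pairs(emitted):
    """Reducer: sum the emitted counts for each (left, right) key."""
    counts = defaultdict(int)
    for pair, n in emitted:
        counts[pair] += n
    return dict(counts)


def count_pairs(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    emitted = (kv for line in lines for kv in map_pairs(line))
    return reduce_pairs(emitted)


if __name__ == "__main__":
    sample = ["the cat sat", "the cat ran"]
    for pair, n in sorted(count_pairs(sample).items()):
        print(pair, n)
```

In an actual MapReduce job the framework's shuffle phase groups identical keys between the mapper and the reducer; here that grouping is folded into `reduce_pairs` to keep the sketch short.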