Posted to common-user@hadoop.apache.org by 柳松 <la...@126.com> on 2009/02/19 04:43:50 UTC

Sogou Corpus Decoder/Codec for Hadoop

Dear all,
Can anyone provide a decoder or codec for the Sogou Corpus? I'm analyzing the Sogou Corpus using Hadoop, but I cannot decode the .7z files.

I have tried LZMA, but I don't know why it cannot decompress and decode the Sogou Corpus.
If anyone else is analysing this very large Internet corpus, please let me know and help me figure out this problem!

Thanks

Song Liu, Suzhou University, China