Posted to user@nutch.apache.org by Emmanuel JOKE <jo...@gmail.com> on 2007/06/02 20:23:07 UTC

Compression

Hi Guys,

I've read an article which explains that we can now use the native
Hadoop libraries to compress our crawled data.

I'm just wondering how we can compress a crawldb and all the other
data that is already saved on disk.
Could you please help me?

Thanks
E

Re: Compression

Posted by Andrzej Bialecki <ab...@getopt.org>.
Emmanuel JOKE wrote:
> Hi Guys,
> 
> I've read an article which explains that we can now use the native
> Hadoop libraries to compress our crawled data.
> 
> I'm just wondering how we can compress a crawldb and all the other
> data that is already saved on disk.
> Could you please help me?

You can use the *Merger tools to re-write the data, e.g. CrawlDbMerger 
for the crawldb, giving just a single db as the input argument.
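
A minimal sketch of what this looks like in practice, assuming a
Nutch 0.9-era setup where Hadoop options live in conf/hadoop-site.xml;
the property names are the standard Hadoop compression options of that
period, and the paths are illustrative, not prescribed by this thread.
First enable compressed output, so that data re-written by the merger
is stored as compressed SequenceFiles:

  <!-- compress the output of map-reduce jobs -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <!-- compress SequenceFile output in blocks rather than per record -->
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <!-- GzipCodec uses native zlib when libhadoop is on java.library.path -->
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>

Then run the existing crawldb through the merger with the old db as
the only input, and swap the result in once you have verified it:

  bin/nutch mergedb crawl/crawldb_compressed crawl/crawldb

The same idea applies to the linkdb and to segments via
"bin/nutch mergelinkdb" and "bin/nutch mergesegs", if your Nutch
version provides those tools.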


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com