Posted to user@nutch.apache.org by "Kraemer, Fabian" <F....@esolut.de> on 2006/05/23 15:31:06 UTC

nutch compressing huge content data

Hi.

I use Lucene 1.4.3 and Nutch 0.6. I have a working Lucene implementation that searches over several indices. All of the data is generated directly from the database, not by a crawler. A search request can span multiple indices, with boolean clauses for each index.

I want to use Nutch only for crawling and indexing, not for searching (because search is already implemented). The problem is that Nutch seems to compress the data in several fields of a document. I don't want to use Nutch's search mechanism, nor do I want to touch my working search implementation.

I have two questions:

1) How can I stop Nutch from compressing the data in a field?
2) Will this "uncompressed" index be equivalent to an index produced by a Lucene IndexWriter (1.4.3)?

Thanks for your help,

Fabian


Re: nutch compressing huge content data

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi Fabian,
Wow, Nutch 0.6 is really old school.. :-)
However, the simplest thing you can do is write a class that reads the data from a segment (parsed text and parse data) and writes it into your own index.
That should be simple if you know how to write into a Lucene index.
HTH
Stefan
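
The approach Stefan describes could be sketched roughly as follows. This is only a minimal outline under stated assumptions, not Nutch's or Lucene's actual API: ParsedPage, IndexSink, toFields and copy are hypothetical names invented for illustration, and the field names "url", "title" and "content" are placeholders. In a real setup the pages would come from whatever segment reader the Nutch version provides, and the sink would wrap a Lucene IndexWriter (building a Document and calling addDocument), so the resulting index is written entirely by your own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: copy parsed pages from a crawl segment into a separately managed index. */
public class SegmentToIndex {

    /** Hypothetical value object for one parsed page read from a segment. */
    static final class ParsedPage {
        final String url;
        final String title;
        final String text;

        ParsedPage(String url, String title, String text) {
            this.url = url;
            this.title = title;
            this.text = text;
        }
    }

    /**
     * Hypothetical sink for indexed fields. A real implementation would wrap
     * Lucene's IndexWriter and turn each map into a Document before calling
     * addDocument; that part is left out so the sketch has no dependencies.
     */
    interface IndexSink {
        void add(Map<String, String> fields);
    }

    /** Maps one page to plain, uncompressed name/value fields (names are assumptions). */
    static Map<String, String> toFields(ParsedPage page) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", page.url);
        fields.put("title", page.title);
        fields.put("content", page.text);
        return fields;
    }

    /** Copies every page that has parsed text into the sink; returns the number copied. */
    static int copy(Iterable<ParsedPage> pages, IndexSink sink) {
        int count = 0;
        for (ParsedPage page : pages) {
            if (page.text == null || page.text.isEmpty()) {
                continue; // nothing to index for this page
            }
            sink.add(toFields(page));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in data; in practice these would be read from a Nutch segment.
        List<ParsedPage> pages = Arrays.asList(
                new ParsedPage("http://example.org/", "Home", "Welcome text"),
                new ParsedPage("http://example.org/empty", "Empty", ""));

        // In-memory sink used only to keep the sketch runnable.
        List<Map<String, String>> indexed = new ArrayList<>();
        int copied = copy(pages, indexed::add);
        System.out.println(copied + " page(s) copied");
    }
}
```

Keeping the segment reader and the index writer behind small interfaces means the existing Lucene search code does not need to change; only the sink has to produce documents with the same field names and analyzer the search side already expects.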


