Posted to user@nutch.apache.org by "Kraemer, Fabian" <F....@esolut.de> on 2006/05/23 15:31:06 UTC
nutch compressing huge content data
Hi.
I use Lucene 1.4.3 and Nutch 0.6. I have a working Lucene implementation that searches over several indices. All the data is generated directly from the database, not by a crawler. A search request can span multiple indices, with boolean clauses for each index.
I wanted to use Nutch only for crawling and indexing, not for searching (because search is already implemented). The problem is that Nutch seems to compress the data in several fields of a document. I don't want to use the Nutch search mechanism, nor do I want to touch my working search implementation.
I have two questions:
1) How can I stop Nutch from compressing the data in a field?
2) Will this "uncompressed" index be equivalent to an index produced by a Lucene (1.4.3) IndexWriter?
Thanks for your help,
Fabian
Re: nutch compressing huge content data
Posted by Stefan Groschupf <sg...@media-style.com>.
Hi Fabian,
Wow, Nutch 0.6 is really old school. :-)
However, the simplest thing you can do is just write a class that
reads the data from a segment (parsed text and data) and writes it
into your own index.
It should be simple if you know how to write into a Lucene index.
HTH
Stefan
On 23.05.2006 at 15:31, Kraemer, Fabian wrote:
> Hi.
>
> I use lucene 1.4.3 and nutch 0.6. I have a working implementation
> of lucene, searching over several indices. All the data is
> generated directly from the db, not by a crawler. The search
> request can go over multiple indices with boolean clauses for each
> index.
>
> I have the problem that I wanted to use nutch only for crawling and
> indexing, not for the search (because it is already implemented).
> But I got the problem, that nutch seems to compress the data in
> several fields of a document. I don't want to use Nutch search
> mechanism nor do I want to touch my working search implementation.
>
> I got two questions:
>
> 1) how can I stop nutch from compressing the data in a field
> 2) will this "uncompressed" index be equal to an index produced by
> an IndexWriter of lucene (1.4.3?)
>
> Thanks for your help,
>
> Fabian
>
>
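
Stefan's suggestion (read the parsed data from a segment and write it into your own index) could be sketched roughly as below. This is only a minimal outline under stated assumptions, not Nutch or Lucene code: SegmentRecord, IndexSink and SegmentReindexer are hypothetical names invented for illustration, and the field names "url", "title" and "content" are placeholders for whatever the existing search implementation expects. In a real setup the sink would presumably wrap a Lucene IndexWriter and build a Document per record (in Lucene 1.4.3, for example, with Field.Keyword for the URL and Field.Text for the text fields), and the records would come from whatever segment reader the installed Nutch version provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical view of one page read from a crawl segment (not a Nutch type). */
final class SegmentRecord {
    final String url;
    final String title;
    final String text;

    SegmentRecord(String url, String title, String text) {
        this.url = url;
        this.title = title;
        this.text = text;
    }
}

/**
 * Hypothetical destination for index entries. A real implementation would
 * probably turn the map into a Lucene Document and call IndexWriter.addDocument().
 */
interface IndexSink {
    void add(Map<String, String> fields);
}

/** Copies segment records into an index sink, one entry per page. */
final class SegmentReindexer {
    private SegmentReindexer() {}

    /** Maps one record to plain (uncompressed) string fields; null values become "". */
    static Map<String, String> toFields(SegmentRecord r) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", r.url);
        fields.put("title", r.title == null ? "" : r.title);
        fields.put("content", r.text == null ? "" : r.text);
        return fields;
    }

    /** Writes every record that has a URL to the sink and returns how many were written. */
    static int reindex(Iterable<SegmentRecord> records, IndexSink sink) {
        int written = 0;
        for (SegmentRecord r : records) {
            if (r == null || r.url == null || r.url.isEmpty()) {
                continue; // skip records that cannot be identified
            }
            sink.add(toFields(r));
            written++;
        }
        return written;
    }
}
```

Because the index is then written by your own code through your own IndexWriter, which fields are stored, and in what form, is decided there rather than by the Nutch indexer.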