You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Gregor Dorfbauer <gr...@lagentz.at> on 2010/09/03 11:23:34 UTC

Index Field feeded from Reader that also stores cleartext

 Hi!

I'm working on an indexer that should process documents on hard-disk 
which are of arbitrary size and type. I use Apache Tika for plain text 
extraction which offers the feature to stream the parsers output through 
a reader.

My problem is following:
Is there a possibility to generate a document field that gets its data 
from an Reader-instance and where the plain text is also stored into the 
index (like the Store.YES field denotes)?
If I can't stream the data, memory usage is exceeding the limits of my 
machine.


Thanks for your help,
Gregor

Re: Index Field feeded from Reader that also stores cleartext

Posted by Ian Lea <ia...@gmail.com>.
If you can't use one of the Reader based Field methods then no.
You'll have to convert the data to a string.  If you do it a doc at a
time and you still don't have enough memory then I don't know what you
can do.


--
Ian.


On Fri, Sep 3, 2010 at 10:23 AM, Gregor Dorfbauer
<gr...@lagentz.at> wrote:
> Hi!
>
> I'm working on an indexer that should process documents on hard-disk which
> are of arbitrary size and type. I use Apache Tika for plain text extraction
> which offers the feature to stream the parsers output through a reader.
>
> My problem is following:
> Is there a possibility to generate a document field that gets its data from
> an Reader-instance and where the plain text is also stored into the index
> (like the Store.YES field denotes)?
> If I can't stream the data, memory usage is exceeding the limits of my
> machine.
>
>
> Thanks for your help,
> Gregor
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org