Posted to java-user@lucene.apache.org by Harald Kirsch <ki...@ebi.ac.uk> on 2005/02/17 12:04:06 UTC
reuse of TokenStream
Hi,
Is it thread-safe to reuse the same TokenStream object for several
fields of a document, or does the IndexWriter try to parallelise
tokenization of the fields of a single document?
A similar question: is it safe to reuse the same TokenStream object
for several documents if I call IndexWriter.addDocument() in a loop?
Or does addDocument merely put the work into a queue, from which tasks
are taken for parallel indexing by several threads?
Thanks,
Harald.
--
------------------------------------------------------------------------
Harald Kirsch | kirsch@ebi.ac.uk | +44 (0) 1223/49-2593
BioMed Information Extraction: http://www.ebi.ac.uk/Rebholz-srv/whatizit
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: reuse of TokenStream
Posted by Harald Kirsch <ki...@ebi.ac.uk>.
On Fri, Feb 18, 2005 at 10:43:07AM -0500, Erik Hatcher wrote:
> I'm confused on how you're reusing a TokenStream object. General
> Lucene usage would not involve a developer dealing with it directly.
Why not? The IndexWriter wants to tokenize a field, so it calls my
Analyzer to get a custom-made Tokenizer or TokenStream object for the
given field. Since setting up the TokenStream requires some work, I'd
rather not repeat that work for every document to be indexed.
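The pattern I am after, sketched with hypothetical stand-in classes of my own rather than the real Lucene API (ExpensiveTokenizer and CachingAnalyzer do not exist in Lucene), is to pay the setup cost once per thread and reset the cached tokenizer for each field:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-in for a tokenizer whose construction is expensive, e.g.
// because it loads a large dictionary or automaton at setup time.
class ExpensiveTokenizer {
    static int constructions = 0; // counts setup work, for the demo
    private Reader input;

    ExpensiveTokenizer() {
        constructions++; // the costly one-time setup would happen here
    }

    void reset(Reader input) {
        this.input = input;
    }

    // Return the first whitespace-delimited token of the input.
    String firstToken() {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1 && !Character.isWhitespace((char) c)) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

// Hypothetical caching analyzer: the setup cost is paid once per
// thread, and the cached tokenizer is reset for every field and
// document. Because the cache is per-thread, a multi-threaded
// IndexWriter could never hand two threads the same instance.
class CachingAnalyzer {
    private final ThreadLocal<ExpensiveTokenizer> cached =
            ThreadLocal.withInitial(ExpensiveTokenizer::new);

    ExpensiveTokenizer tokenStream(String fieldName, Reader reader) {
        ExpensiveTokenizer tok = cached.get(); // reused, never shared
        tok.reset(reader);
        return tok;
    }
}

public class CachingDemo {
    public static void main(String[] args) {
        CachingAnalyzer analyzer = new CachingAnalyzer();
        System.out.println(analyzer.tokenStream("body",
                new StringReader("foo bar")).firstToken());  // foo
        System.out.println(analyzer.tokenStream("title",
                new StringReader("baz qux")).firstToken());  // baz
        System.out.println(ExpensiveTokenizer.constructions); // 1
    }
}
```

The per-thread cache is the key point: reuse stays safe regardless of how addDocument schedules its work internally.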
Harald.
Re: reuse of TokenStream
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I'm confused on how you're reusing a TokenStream object. General
Lucene usage would not involve a developer dealing with it directly.
Could you share an example of what you're up to?
I'm not sure if this is related, but a technique I'm using is to index
the same Document instance into two different IndexWriter instances
(each with a different Analyzer), and this is working fine.
Erik
On Feb 17, 2005, at 6:04 AM, Harald Kirsch wrote:
> Hi,
>
> Is it thread-safe to reuse the same TokenStream object for several
> fields of a document, or does the IndexWriter try to parallelise
> tokenization of the fields of a single document?
>
> A similar question: is it safe to reuse the same TokenStream object
> for several documents if I call IndexWriter.addDocument() in a loop?
> Or does addDocument merely put the work into a queue, from which tasks
> are taken for parallel indexing by several threads?
>
> Thanks,
> Harald.
>