Posted to java-user@lucene.apache.org by Harald Kirsch <ki...@ebi.ac.uk> on 2005/02/17 12:04:06 UTC

reuse of TokenStream

Hi,

is it thread safe to reuse the same TokenStream object for several
fields of a document or does the IndexWriter try to parallelise
tokenization of the fields of a single document?

Similar question: Is it safe to reuse the same TokenStream object for
several documents if I use IndexWriter.addDocument() in a loop?  Or
does addDocument only put the work into a queue where tasks are taken
out for parallel indexing by several threads?

  Thanks,
  Harald.

-- 
------------------------------------------------------------------------
Harald Kirsch | kirsch@ebi.ac.uk | +44 (0) 1223/49-2593
BioMed Information Extraction: http://www.ebi.ac.uk/Rebholz-srv/whatizit

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: reuse of TokenStream

Posted by Harald Kirsch <ki...@ebi.ac.uk>.

On Fri, Feb 18, 2005 at 10:43:07AM -0500, Erik Hatcher wrote:
> I'm confused on how you're reusing a TokenStream object.  General  
> Lucene usage would not involve a developer dealing with it directly.   

Why not? The IndexWriter wants to tokenize a field, so it calls my
Analyzer to get a custom made Tokenizer or TokenStream object for the
given field. Since setting up the TokenStream needs some work to be
done, I would rather not repeat this work for every document to be indexed.
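
A minimal sketch of the kind of reuse described above, for illustration
only. It does not use the real Lucene API: WhitespaceTokenizer, reset(),
and tokenize() are hypothetical stand-ins, and the counter only shows how
often the costly setup ran. The sketch assumes nothing about whether
IndexWriter tokenizes in parallel; it avoids the question by keeping one
tokenizer per thread and pointing it at new input each time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ReusableTokenizerSketch {

    /** Hypothetical stand-in for a tokenizer with costly setup; not a Lucene class. */
    static final class WhitespaceTokenizer {
        // Counts how often the (imagined) costly setup ran.
        static final AtomicInteger SETUPS = new AtomicInteger();
        private Reader input;

        WhitespaceTokenizer() {
            // Imagine loading dictionaries or compiling automata here.
            SETUPS.incrementAndGet();
        }

        /** Points the existing tokenizer at new text instead of building a new one. */
        void reset(Reader newInput) {
            this.input = newInput;
        }

        /** Returns the next whitespace-delimited token, or null at end of input. */
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // token complete
                    }
                    // otherwise skip leading whitespace
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    // One tokenizer per thread, so no instance is ever shared between threads.
    private static final ThreadLocal<WhitespaceTokenizer> CACHED =
            ThreadLocal.withInitial(WhitespaceTokenizer::new);

    /** Tokenizes one field value, reusing the calling thread's tokenizer. */
    static List<String> tokenize(String text) {
        WhitespaceTokenizer t = CACHED.get();
        t.reset(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            for (String tok = t.next(); tok != null; tok = t.next()) {
                out.add(tok);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Calling tokenize() repeatedly from the same thread runs the setup once.
Whether this is needed depends on how the indexing code actually calls
the Analyzer, which the sketch leaves open.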

  Harald.

Re: reuse of TokenStream

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I'm confused on how you're reusing a TokenStream object.  General  
Lucene usage would not involve a developer dealing with it directly.   
Could you share an example of what you're up to?

I'm not sure if this is related, but a technique I'm using is to index  
the same Document instance into two different IndexWriter instances  
(each uses a different Analyzer) - and this is working fine.

	Erik


On Feb 17, 2005, at 6:04 AM, Harald Kirsch wrote:

> Hi,
>
> is it thread safe to reuse the same TokenStream object for several
> fields of a document or does the IndexWriter try to parallelise
> tokenization of the fields of a single document?
>
> Similar question: Is it safe to reuse the same TokenStream object for
> several documents if I use IndexWriter.addDocument() in a loop?  Or
> does addDocument only put the work into a queue where tasks are taken
> out for parallel indexing by several threads?
>
>   Thanks,
>   Harald.