Posted to java-user@lucene.apache.org by Rohit Banga <ia...@gmail.com> on 2010/02/10 14:16:57 UTC

read more tokens during analysis

i want to consider the current word & the next as a single term.

when analyzing "Arun Kumar"

i want my analyzer to consider "Arun",  "Arun Kumar" as synonyms.

in the tokenStream method, how do we read the next token "Kumar"?
i am looking at the setPositionIncrement method for treating them as
synonyms, but i don't understand how to implement lookahead in the
analyzer.



Rohit Banga

Re: read more tokens during analysis

Posted by Grant Ingersoll <gs...@apache.org>.

On Feb 10, 2010, at 8:33 AM, Rohit Banga wrote:

> basically i want to use my own filter wrapping around a standard analyzer.
> 
> the kind explained on page 166 of Lucene in Action, uses input.next() which
> is perhaps not available in lucene 3.0
> 
> what is the substitute method.

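input.next() is gone in 3.0; a filter now advances the stream with input.incrementToken().  For lookahead, captureState() and restoreState() let you buffer a token's state and replay it later.  There are several examples of how they work in contrib/analyzers.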
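
A minimal, library-free sketch of the lookahead pattern, written in plain Java so it runs without Lucene on the classpath. LookaheadSketch, Tok, PairFilter and analyze are hypothetical names, not Lucene classes. In a real Lucene 3.0 TokenFilter the same roles would roughly be played by input.incrementToken() (pull the next token), captureState()/restoreState() (buffer and replay a token), and PositionIncrementAttribute.setPositionIncrement(0) (stack the pair on the same position so it acts as a synonym).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustration only: models a token stream as an Iterator, not as Lucene's TokenStream. */
class LookaheadSketch {

    /** A term plus its position increment; 0 means "same position as the previous token". */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "(+" + posInc + ")"; }
    }

    /** Emits each term, followed by "term next" at the same position when a next term exists. */
    static final class PairFilter implements Iterator<Tok> {
        private final Iterator<String> input;
        private String buffered; // term read ahead but not yet emitted (cf. captureState())
        private Tok pending;     // pair waiting to be emitted with increment 0

        PairFilter(Iterator<String> input) { this.input = input; }

        @Override public boolean hasNext() {
            return pending != null || buffered != null || input.hasNext();
        }

        @Override public Tok next() {
            if (pending != null) {          // first flush the queued synonym
                Tok t = pending;
                pending = null;
                return t;
            }
            // Replay the buffered term (cf. restoreState()) or pull a fresh one.
            String term = (buffered != null) ? buffered : input.next();
            buffered = null;
            if (input.hasNext()) {          // look ahead by one term
                buffered = input.next();
                pending = new Tok(term + " " + buffered, 0);
            }
            return new Tok(term, 1);
        }
    }

    /** Whitespace-splits the text and runs it through PairFilter. */
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<Tok>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return out;
        Iterator<Tok> it = new PairFilter(Arrays.asList(trimmed.split("\\s+")).iterator());
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Arun Kumar")); // [Arun(+1), Arun Kumar(+0), Kumar(+1)]
    }
}
```

The pair is emitted after the single term with increment 0, so "Arun" and "Arun Kumar" share a position. The last term has nothing to look ahead to, so it is emitted on its own.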



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: read more tokens during analysis

Posted by Rohit Banga <ia...@gmail.com>.

basically i want to use my own filter wrapping around a standard analyzer.

the kind explained on page 166 of Lucene in Action, uses input.next() which
is perhaps not available in lucene 3.0

what is the substitute method?

Rohit Banga


On Wed, Feb 10, 2010 at 6:46 PM, Rohit Banga <ia...@gmail.com> wrote:

> i want to consider the current word & the next as a single term.
>
> when analyzing "Arun Kumar"
>
> i want my analyzer to consider "Arun",  "Arun Kumar" as synonyms.
>
> in the tokenstream method, how do we read the next token "Kumar"
> i am going through the setPositionIncrements method for considering them as
> synonyms, but i don't understand how to implement look ahead in the
> analyzer.
>
>
>
> Rohit Banga
>

Re: read more tokens during analysis

Posted by Rohit Banga <ia...@gmail.com>.

thanks

will try the code and get back if i have any problems.

Rohit Banga


On Fri, Feb 12, 2010 at 10:38 PM, Ahmet Arslan <io...@yahoo.com> wrote:

>
> > i want to consider the current word
> > & the next as a single term.
> >
> > when analyzing "Arun Kumar"
> >
> > i want my analyzer to consider "Arun",  "Arun Kumar"
> > as synonyms.
> >
> > in the tokenstream method, how do we read the next token
> > "Kumar"
> > i am going through the setPositionIncrements method for
> > considering them as
> > synonyms, but i don't understand how to implement look
> > ahead in the
> > analyzer.
>
> Can we say that you want to implement a synonym filter that takes a list of
> custom synonyms?
> If yes why not use Solr's SynonymFilterFactory[1] that does this
> automatically? It can handle multi-words synonym like "Arun",  "Arun Kumar"
> I can share the code to integrate it into Lucene if you want.
>
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>

Re: read more tokens during analysis

Posted by Ahmet Arslan <io...@yahoo.com>.

> i want to consider the current word
> & the next as a single term.
> 
> when analyzing "Arun Kumar"
> 
> i want my analyzer to consider "Arun",  "Arun Kumar"
> as synonyms.
> 
> in the tokenstream method, how do we read the next token
> "Kumar"
> i am going through the setPositionIncrements method for
> considering them as
> synonyms, but i don't understand how to implement look
> ahead in the
> analyzer.

Can we say that you want to implement a synonym filter that takes a list of custom synonyms?
If so, why not use Solr's SynonymFilterFactory [1], which does this automatically? It can handle multi-word synonyms such as "Arun" and "Arun Kumar".
I can share the code to integrate it into Lucene if you want.

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory



      
