Posted to dev@lucene.apache.org by TimF <ti...@timflanders.com> on 2007/03/26 23:30:39 UTC

Custom Analyzer Help please

I would like to get terms from my data that combine the output of
two existing analyzers.
I need this for both indexing and searching of various fields.
An example of the data might be as follows:
   Hello XY&Z Corporation - abc@example.com
I would like the following terms to come out of the analyzer:
   [hello] [xy&z] [corporation] [abc@example] [com]   (the StandardAnalyzer output)
as well as
   [xy] [z] [abc] [example]

Essentially, I want the StandardAnalyzer output, and then I want to run the
StopAnalyzer over the terms that come out of the StandardAnalyzer. That way I
could search against either part of a "special" word or the whole "special"
word, where a special word is a token such as an email address or a part
number.
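
To make the desired behavior concrete, here is a minimal sketch in plain Java
with no Lucene classes. Every name in it (SubTokenExpander, expand, analyze,
STOP_WORDS) is hypothetical, and so is the tiny stop list. Whitespace
splitting only stands in for StandardAnalyzer's tokenizer, so the email
address stays whole here rather than matching the output listed above. The
point is just the idea of emitting each whole token followed by its parts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: not Lucene API. Emits each whole token plus its sub-parts.
final class SubTokenExpander {

    // Hypothetical stop list for the example; not Lucene's default stop set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Returns the lowercased token, followed by its letter/digit runs
    // when the token actually splits into smaller pieces.
    static List<String> expand(String token) {
        String whole = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(whole);
        // Assumption: anything that is not a letter or digit separates parts.
        for (String part : whole.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty() && !part.equals(whole)) {
                out.add(part);
            }
        }
        return out;
    }

    // Splits on whitespace, expands every token, then drops stop words.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            // Skip pure punctuation such as "-" (and the empty string).
            if (!raw.matches(".*[\\p{L}\\p{N}].*")) {
                continue;
            }
            for (String term : expand(raw)) {
                if (!STOP_WORDS.contains(term)) {
                    out.add(term);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hello XY&Z Corporation - abc@example.com"));
        // prints [hello, xy&z, xy, z, corporation, abc@example.com, abc, example, com]
    }
}
```

In Lucene itself this logic would more likely live in a custom TokenFilter
wrapped around the tokenizer, probably emitting the parts at the same
position as the whole token. That wiring is left out because the API details
depend on the Lucene version in use.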

I know the answer is that I have to create a custom analyzer that combines
the standard and stop analyzers, and I have tried... but I just cannot
figure out how to do this.

I have read through the LIA book and looked through the samples for keyword
and per-field analyzers, but they just don't do it.

Does anyone have any samples that do this kind of thing?
Thanks,
Tim
-- 
View this message in context: http://www.nabble.com/Custom-Analyzer-Help-please-tf3469717.html#a9682141
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Custom Analyzer Help please

Posted by Grant Ingersoll <gs...@apache.org>.
Please ask on java-user. java-dev is for people developing Lucene itself;
java-user is for questions of this nature.

Also, look at the javadocs for StandardAnalyzer, as it already uses the
StopFilter.  You need to pass in your own stop words or use the
default stop set.

-Grant


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


