Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/08/19 14:07:30 UTC

Prevent Some Keywords at Analyzer Step

Hi;

I want to write an analyzer that will prevent some special words. For
example, if the sentence to be indexed is:

diet follower

it should be tokenized like this:

token 1) diet
token 2) follower
token 3) diet follower

How can I do that with Solr?

Re: Prevent Some Keywords at Analyzer Step

Posted by Furkan KAMACI <fu...@gmail.com>.
How can I remove the unnecessary tokens after the shingle filter?
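
One possible way to think about this, sketched in plain Python rather than as Solr configuration: generate the shingles, then drop every multi-word token that is not on a list of special phrases, leaving single-word tokens alone. The names `shingle`, `keep_special_shingles` and `SPECIAL_PHRASES` are made up for illustration and are not Solr or Lucene APIs. In Solr itself, this step would presumably be a token filter placed after `solr.ShingleFilterFactory` in the analyzer chain.

```python
# Illustrative only: plain Python, not Solr/Lucene code.
SPECIAL_PHRASES = {"diet follower"}  # hypothetical list of phrases to keep


def shingle(tokens, size=2, sep=" "):
    """Return unigrams plus word n-grams of the given size, in position order."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the unigram at this position
        if i + size <= len(tokens):
            out.append(sep.join(tokens[i:i + size]))  # shingle starting here
    return out


def keep_special_shingles(tokens, special=SPECIAL_PHRASES, sep=" "):
    """Keep every single-word token; keep a shingle only if it is special."""
    return [t for t in tokens if sep not in t or t in special]


tokens = "Alice is a diet follower".split()
filtered = keep_special_shingles(shingle(tokens))
print(filtered)
# ['Alice', 'is', 'a', 'diet', 'diet follower', 'follower']
```

The sketch tells shingles apart from single words by looking for the separator, which only works if single tokens never contain it.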


2013/8/20 Jeff Porter <jp...@o2ointeractive.com>

> Why not use ShingleFilterFactory and then match on that token if you find
> it?
>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
>
> Jeff Porter
> co-founder
> email: jporter@o2ointeractive.com
> mobile: +1-303-332-4006

Re: Prevent Some Keywords at Analyzer Step

Posted by Jeff Porter <jp...@o2ointeractive.com>.
Why not use ShingleFilterFactory and then match on that token if you find it?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
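
To see what shingling does to the example, here is a rough Python imitation of the filter's behaviour (not Lucene code). The parameter names mirror the factory's `minShingleSize`, `maxShingleSize` and `outputUnigrams` options; the `shingles` function itself is hypothetical.

```python
# Illustrative only: a plain-Python imitation of word shingling.
def shingles(tokens, min_size=2, max_size=2, output_unigrams=True, sep=" "):
    """Emit, for each position, the unigram and the shingles starting there."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingles("diet follower".split()))
# ['diet', 'diet follower', 'follower']

print(shingles("Alice is a diet follower".split()))
# ['Alice', 'Alice is', 'is', 'is a', 'a', 'a diet', 'diet', 'diet follower', 'follower']
```

For the two-word input the output is exactly the three tokens asked for. For a longer sentence the same settings also produce shingles such as "Alice is" and "is a", so the special phrase still has to be picked out from the rest.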


Jeff Porter
co-founder
email: jporter@o2ointeractive.com
mobile: +1-303-332-4006

On Aug 19, 2013, at 11:23 AM, Dan Davis wrote:

> This is an interesting topic - my employer is a medical library and there
> are many keywords that may need to be aliased in various ways, and 2 or 3
> word phrases that perhaps should be treated specially. Jack, can you give
> me an example of how to do that sort of thing? Perhaps I need to buy
> your almost-released Deep Dive book...
> Sorry to be too tangential - it is my strange way.


Re: Prevent Some Keywords at Analyzer Step

Posted by Dan Davis <da...@gmail.com>.
This is an interesting topic - my employer is a medical library and there
are many keywords that may need to be aliased in various ways, and 2 or 3
word phrases that perhaps should be treated specially. Jack, can you give
me an example of how to do that sort of thing? Perhaps I need to buy
your almost-released Deep Dive book...
Sorry to be too tangential - it is my strange way.


On Mon, Aug 19, 2013 at 12:32 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Okay, but what is it that you are trying to "prevent"??
>
> And, "diet follower" is a phrase, not a keyword or term.
>
> So, I'm still baffled as to what you are really trying to do. Try
> explaining it in plain English.
>
> And given this same input, how would it be queried?
>
>
> -- Jack Krupansky

Re: Prevent Some Keywords at Analyzer Step

Posted by Jack Krupansky <ja...@basetechnology.com>.
Okay, but what is it that you are trying to "prevent"??

And, "diet follower" is a phrase, not a keyword or term.

So, I'm still baffled as to what you are really trying to do. Try
explaining it in plain English.

And given this same input, how would it be queried?

-- Jack Krupansky

-----Original Message----- 
From: Furkan KAMACI
Sent: Monday, August 19, 2013 11:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Prevent Some Keywords at Analyzer Step

Let's assume that my sentence is:

*Alice is a diet follower*

My special keyword => *diet follower*

Tokens will be:

Token 1) Alice
Token 2) is
Token 3) a
Token 4) diet
Token 5) follower
Token 6) *diet follower*




Re: Prevent Some Keywords at Analyzer Step

Posted by Furkan KAMACI <fu...@gmail.com>.
Let's assume that my sentence is:

*Alice is a diet follower*

My special keyword => *diet follower*

Tokens will be:

Token 1) Alice
Token 2) is
Token 3) a
Token 4) diet
Token 5) follower
Token 6) *diet follower*
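
Read literally, the token list above amounts to "every word, plus one extra token for each special phrase that occurs". A minimal Python sketch of that expectation follows, assuming whitespace tokenization and a hypothetical `SPECIAL_PHRASES` list. It ignores token positions and offsets, which a real Lucene token filter would have to set.

```python
# Illustrative only: plain Python, not Solr/Lucene code.
SPECIAL_PHRASES = [("diet", "follower")]  # hypothetical phrases, pre-split into words


def tokens_with_phrases(text, phrases=SPECIAL_PHRASES):
    """Return every word of the text, then one token per special phrase found."""
    words = text.split()
    out = list(words)  # every single word, in order
    for phrase in phrases:
        n = len(phrase)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == tuple(phrase):
                out.append(" ".join(phrase))  # extra token for the matched phrase
    return out


result = tokens_with_phrases("Alice is a diet follower")
print(result)
# ['Alice', 'is', 'a', 'diet', 'follower', 'diet follower']
```

A sentence that does not contain the phrase, such as "Alice is a follower", yields only its four single-word tokens.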


2013/8/19 Jack Krupansky <ja...@basetechnology.com>

> Your example doesn't "prevent" any keywords.
>
> You need to elaborate the specific requirements with more detail.
>
> Given a long stream of text, what tokenization do you expect in the index?
>
> -- Jack Krupansky

Re: Prevent Some Keywords at Analyzer Step

Posted by Jack Krupansky <ja...@basetechnology.com>.
Your example doesn't "prevent" any keywords.

You need to elaborate the specific requirements with more detail.

Given a long stream of text, what tokenization do you expect in the index?

-- Jack Krupansky

-----Original Message----- 
From: Furkan KAMACI 
Sent: Monday, August 19, 2013 8:07 AM 
To: solr-user@lucene.apache.org 
Subject: Prevent Some Keywords at Analyzer Step 

Hi;

I want to write an analyzer that will prevent some special words. For
example, if the sentence to be indexed is:

diet follower

it should be tokenized like this:

token 1) diet
token 2) follower
token 3) diet follower

How can I do that with Solr?