You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Murdoch, Paul" <PA...@saic.com> on 2010/03/03 22:11:31 UTC

Phrase search on NOT_ANALYZED content

If I have indexed some content that contains some words and a single
whitespace between each word as NOT_ANALYZED, is it possible to perform
a phrase search on that a portion of that content?  I'm indexing and
searching with the StandardAnalyzer 2.9.  Using the KeywordAnalyzer
works, but I have to type in the entire content as it was indexed.  I
would like to search for a phrase in the content, but not all of the
content, preferably with the StandardAnalyzer.

 

Thanks,

 

Paul 

 


RE: Phrase search on NOT_ANALYZED content

Posted by "Murdoch, Paul" <PA...@saic.com>.
Yep. PerFieldAnalyzerWrapper seems to have solved my problem.

Thanks,

Paul


-----Original Message-----
From: java-user-return-45289-PAUL.B.MURDOCH=saic.com@lucene.apache.org
[mailto:java-user-return-45289-PAUL.B.MURDOCH=saic.com@lucene.apache.org
] On Behalf Of Erick Erickson
Sent: Thursday, March 04, 2010 8:54 AM
To: java-user@lucene.apache.org
Subject: Re: Phrase search on NOT_ANALYZED content

I'm still struggling with your overall goal here, but...

It sounds like what you're looking for is an exact match
in some cases but not others? In which case you could
think about indexing the info: field in a second field and
adding a clause against *that* field for your phrase case.
PerFieldAnalyzerWrapper is your friend here <G>.

HTH
Erick

On Thu, Mar 4, 2010 at 8:35 AM, Murdoch, Paul
<PA...@saic.com>wrote:

> I'm using NOT_ANALYZED because I have a list of text items to index
> where some of the items are single words and some of the items are two
> words or more with punctuation.  My problem is that sometimes one of
the
> words in a item with two or more words matches one of the single text
> items.  That sounds confusing so here is an example of the list:
>
> info:name
> info:name, paul
> info:name, unknown
> info:unknown
>
> So here I have four items in the list. If I index this ANALYZED and do
a
> search for "info:unknown", I will get two hits where I only wanted
one.
> If I do a search for "info:name" I will get three hits where I only
> wanted one.  I index using the StandardAnalyzer.  I also index
> everything as ANALYZED into a "content" field for broad searches.
That
> functions as expected.  This is a specialized search where the user
can
> search on field name (in this case "info").  The specialized search is
> not restricted to exact matches so the KeywordAnalyzer is out.  I'm
> trying to allow searching on terms, phrases, and ranges as well as
> allowing fuzzy, wildcard, slop, and boost modifiers.  I'm making use
of
> the TermQuery, PhraseQuery, FuzzyQuery, WildcardQuery, TermRangeQuery
> and BooleanQuery.  Using these classes produces better results than
> simply feeding a Query object to the QueryParser.  Everything works as
> expected except for phrases because of the NOT_ANALYZED flag.  Is
there
> a KeywordQuery class?  I think that would solve my problem.
>
> Thanks,
>
> Paul
>
> -----Original Message-----
> From: java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
>
[mailto:java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> ] On Behalf Of Erick Erickson
> Sent: Wednesday, March 03, 2010 4:30 PM
> To: java-user@lucene.apache.org
> Subject: Re: Phrase search on NOT_ANALYZED content
>
> NOT_ANALYZED is probably not what you want.
> NOT_ANALYZED stores the entire input as
> a *single* token, so you can never match on
> anything except the entire input.
>
> What did you hope to accomplish by indexint
> NOT_ANALYZED? That's actually a pretty
> specialized thing to do, perhaps there's a better
> way to accomplish your goal.
>
> Best
> Erick
>
> On Wed, Mar 3, 2010 at 4:11 PM, Murdoch, Paul
> <PA...@saic.com>wrote:
>
> > If I have indexed some content that contains some words and a single
> > whitespace between each word as NOT_ANALYZED, is it possible to
> perform
> > a phrase search on that a portion of that content?  I'm indexing and
> > searching with the StandardAnalyzer 2.9.  Using the KeywordAnalyzer
> > works, but I have to type in the entire content as it was indexed.
I
> > would like to search for a phrase in the content, but not all of the
> > content, preferably with the StandardAnalyzer.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Paul
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Phrase search on NOT_ANALYZED content

Posted by Erick Erickson <er...@gmail.com>.
I'm still struggling with your overall goal here, but...

It sounds like what you're looking for is an exact match
in some cases but not others? In which case you could
think about indexing the info: field in a second field and
adding a clause against *that* field for your phrase case.
PerFieldAnalyzerWrapper is your friend here <G>.

HTH
Erick

On Thu, Mar 4, 2010 at 8:35 AM, Murdoch, Paul <PA...@saic.com>wrote:

> I'm using NOT_ANALYZED because I have a list of text items to index
> where some of the items are single words and some of the items are two
> words or more with punctuation.  My problem is that sometimes one of the
> words in a item with two or more words matches one of the single text
> items.  That sounds confusing so here is an example of the list:
>
> info:name
> info:name, paul
> info:name, unknown
> info:unknown
>
> So here I have four items in the list. If I index this ANALYZED and do a
> search for "info:unknown", I will get two hits where I only wanted one.
> If I do a search for "info:name" I will get three hits where I only
> wanted one.  I index using the StandardAnalyzer.  I also index
> everything as ANALYZED into a "content" field for broad searches.  That
> functions as expected.  This is a specialized search where the user can
> search on field name (in this case "info").  The specialized search is
> not restricted to exact matches so the KeywordAnalyzer is out.  I'm
> trying to allow searching on terms, phrases, and ranges as well as
> allowing fuzzy, wildcard, slop, and boost modifiers.  I'm making use of
> the TermQuery, PhraseQuery, FuzzyQuery, WildcardQuery, TermRangeQuery
> and BooleanQuery.  Using these classes produces better results than
> simply feeding a Query object to the QueryParser.  Everything works as
> expected except for phrases because of the NOT_ANALYZED flag.  Is there
> a KeywordQuery class?  I think that would solve my problem.
>
> Thanks,
>
> Paul
>
> -----Original Message-----
> From: java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> [mailto:java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> ] On Behalf Of Erick Erickson
> Sent: Wednesday, March 03, 2010 4:30 PM
> To: java-user@lucene.apache.org
> Subject: Re: Phrase search on NOT_ANALYZED content
>
> NOT_ANALYZED is probably not what you want.
> NOT_ANALYZED stores the entire input as
> a *single* token, so you can never match on
> anything except the entire input.
>
> What did you hope to accomplish by indexint
> NOT_ANALYZED? That's actually a pretty
> specialized thing to do, perhaps there's a better
> way to accomplish your goal.
>
> Best
> Erick
>
> On Wed, Mar 3, 2010 at 4:11 PM, Murdoch, Paul
> <PA...@saic.com>wrote:
>
> > If I have indexed some content that contains some words and a single
> > whitespace between each word as NOT_ANALYZED, is it possible to
> perform
> > a phrase search on that a portion of that content?  I'm indexing and
> > searching with the StandardAnalyzer 2.9.  Using the KeywordAnalyzer
> > works, but I have to type in the entire content as it was indexed.  I
> > would like to search for a phrase in the content, but not all of the
> > content, preferably with the StandardAnalyzer.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Paul
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: Phrase search on NOT_ANALYZED content

Posted by "Murdoch, Paul" <PA...@saic.com>.
I'm using NOT_ANALYZED because I have a list of text items to index
where some of the items are single words and some of the items are two
words or more with punctuation.  My problem is that sometimes one of the
words in a item with two or more words matches one of the single text
items.  That sounds confusing so here is an example of the list:

info:name
info:name, paul
info:name, unknown
info:unknown

So here I have four items in the list. If I index this ANALYZED and do a
search for "info:unknown", I will get two hits where I only wanted one.
If I do a search for "info:name" I will get three hits where I only
wanted one.  I index using the StandardAnalyzer.  I also index
everything as ANALYZED into a "content" field for broad searches.  That
functions as expected.  This is a specialized search where the user can
search on field name (in this case "info").  The specialized search is
not restricted to exact matches so the KeywordAnalyzer is out.  I'm
trying to allow searching on terms, phrases, and ranges as well as
allowing fuzzy, wildcard, slop, and boost modifiers.  I'm making use of
the TermQuery, PhraseQuery, FuzzyQuery, WildcardQuery, TermRangeQuery
and BooleanQuery.  Using these classes produces better results than
simply feeding a Query object to the QueryParser.  Everything works as
expected except for phrases because of the NOT_ANALYZED flag.  Is there
a KeywordQuery class?  I think that would solve my problem.

Thanks,   

Paul 

-----Original Message-----
From: java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
[mailto:java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
] On Behalf Of Erick Erickson
Sent: Wednesday, March 03, 2010 4:30 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase search on NOT_ANALYZED content

NOT_ANALYZED is probably not what you want.
NOT_ANALYZED stores the entire input as
a *single* token, so you can never match on
anything except the entire input.

What did you hope to accomplish by indexint
NOT_ANALYZED? That's actually a pretty
specialized thing to do, perhaps there's a better
way to accomplish your goal.

Best
Erick

On Wed, Mar 3, 2010 at 4:11 PM, Murdoch, Paul
<PA...@saic.com>wrote:

> If I have indexed some content that contains some words and a single
> whitespace between each word as NOT_ANALYZED, is it possible to
perform
> a phrase search on that a portion of that content?  I'm indexing and
> searching with the StandardAnalyzer 2.9.  Using the KeywordAnalyzer
> works, but I have to type in the entire content as it was indexed.  I
> would like to search for a phrase in the content, but not all of the
> content, preferably with the StandardAnalyzer.
>
>
>
> Thanks,
>
>
>
> Paul
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Phrase search on NOT_ANALYZED content

Posted by Erick Erickson <er...@gmail.com>.
NOT_ANALYZED is probably not what you want.
NOT_ANALYZED stores the entire input as
a *single* token, so you can never match on
anything except the entire input.

What did you hope to accomplish by indexint
NOT_ANALYZED? That's actually a pretty
specialized thing to do, perhaps there's a better
way to accomplish your goal.

Best
Erick

On Wed, Mar 3, 2010 at 4:11 PM, Murdoch, Paul <PA...@saic.com>wrote:

> If I have indexed some content that contains some words and a single
> whitespace between each word as NOT_ANALYZED, is it possible to perform
> a phrase search on that a portion of that content?  I'm indexing and
> searching with the StandardAnalyzer 2.9.  Using the KeywordAnalyzer
> works, but I have to type in the entire content as it was indexed.  I
> would like to search for a phrase in the content, but not all of the
> content, preferably with the StandardAnalyzer.
>
>
>
> Thanks,
>
>
>
> Paul
>
>
>
>