Posted to java-user@lucene.apache.org by Leandro Saad <le...@gmail.com> on 2006/09/11 16:37:32 UTC

Fields with phrases

Hi all,

I have a field called "location" in my index. For example, suppose this
string was stored in it: "A B" "A C" D
When I search on "location:", these are the results that I'd like to
retrieve:
1) location: D -- 1 hit
2) location: A -- no hits
3) location: "A B" -- 1 hit
4) location: "A C" -- 1 hit

Is there any way I can make this work?

-- 
Leandro Rodrigo Saad Cruz
software developer - certified scrum master
:: scrum.com.br
:: db.apache.org/ojb
:: guara-framework.sf.net
:: xingu.sf.net

Re: Fields with phrases

Posted by Chris Hostetter <ho...@fucit.org>.
: I have a field called "location" on my index. For example, this string:  "A
: B" "A C" D was stored on my index
: When I search for "location: ", these are the results that I'd like to
: retrieve:
: 1) location: D -- 1 hit
: 2) location: A -- no hits
: 3) location: "A B" -- 1 hit
: 4) location: "A C" -- 1 hit

off the cuff, it sounds like what you want is to index your individual
quoted strings as single-token field values (using UN_TOKENIZED or the
KeywordAnalyzer at index time). if you want your matches to be case
insensitive, or if you want whitespace normalization (ie: should "A  c"
be a match?) then you may need your own analyzer to deal with that ... but
the crux of the issue is putting D in as one token, "A B" as one token,
and "A C" as one token, etc...
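
as a rough illustration of that tokenization (plain Java only, no Lucene
classes; QuotedPhraseSplitter and its methods are hypothetical names made up
for this sketch), the helper below splits the raw value into one token per
quoted phrase or bare word, lowercasing and collapsing whitespace along the
way. each resulting string could then be added as its own untokenized
"location" value; the query term is assumed to arrive with its quotes
already stripped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene.
final class QuotedPhraseSplitter {
    // Group 1: contents of a double-quoted phrase. Group 2: a bare word.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Splits the raw field value into one normalized token per phrase or word.
    static List<String> split(String raw) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(raw);
        while (m.find()) {
            String token = normalize(m.group(1) != null ? m.group(1) : m.group(2));
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assumed normalization: trim, collapse whitespace runs, lowercase.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // True only if the whole query term equals one whole indexed token.
    static boolean matches(String rawFieldValue, String queryTerm) {
        return split(rawFieldValue).contains(normalize(queryTerm));
    }

    public static void main(String[] args) {
        String field = "\"A B\" \"A C\" D";
        System.out.println(split(field));           // [a b, a c, d]
        System.out.println(matches(field, "D"));    // true
        System.out.println(matches(field, "A"));    // false
        System.out.println(matches(field, "A  c")); // true
    }
}
```

whether "A  c" should match is exactly the normalization question above;
dropping the lowercase and whitespace steps from normalize() gives strict
matching instead.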

there was a very long thread about this recently...

http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html



-Hoss



Re: Fields with phrases

Posted by Erick Erickson <er...@gmail.com>.
I know of no way of doing this with the standard analyzers, unless you do
some fooling around..

I think you'd have to write your own analyzer/tokenizer, used at both
indexing time and query parsing time, that breaks the input stream up the way
you want. In this case, A B would be a SINGLE token. A C likewise, and D
would be a single token too. Your index would then contain what you want.

Alternatively, you could substitute a special character (again on reading
the input for both the indexing process and the searching process) that
strung your input together, and then use normal analyzers. In this case,
index A_B, A_C, and D. Searching for A_B, A_C and D should then be hits,
while A would not. I like this quite a lot better than fooling around with a
custom tokenizer now that I think of it.

You have to be a bit careful though. If you use StandardAnalyzer in this
case, I *think* it'll split the input on the underscore, so either use some
other character that doesn't get broken up, or use a different analyzer, say
the WhitespaceAnalyzer.
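
Here's a minimal sketch of the substitution idea in plain Java. It doesn't
touch Lucene at all: PhraseJoiner is a made-up name, and the whitespace split
only stands in for what a whitespace-based analyzer would do with the
rewritten text. The same join() would have to be applied to the input before
indexing and to the query string before searching.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper for illustration; not part of Lucene.
final class PhraseJoiner {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Rewrites each quoted phrase as one underscore-joined word: "A B" -> A_B.
    static String join(String raw) {
        Matcher m = QUOTED.matcher(raw);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String joined = m.group(1).trim().replaceAll("\\s+", "_");
            m.appendReplacement(out, Matcher.quoteReplacement(joined));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Stand-in for a whitespace-splitting analyzer.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> indexed = whitespaceTokens(join("\"A B\" \"A C\" D"));
        System.out.println(indexed);                                  // [A_B, A_C, D]
        System.out.println(indexed.contains(join("\"A B\"")));        // true
        System.out.println(indexed.contains(join("A")));              // false
    }
}
```

If your real data can already contain underscores, pick a different joining
character so that a literal A_B and the phrase "A B" don't collide.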

Oh, and be sure to get a copy of Luke to look at your initial tries at this
to see if what you actually index is what you *think* you're indexing. I've
been confused by this more than once <G>....

Best
Erick
