Posted to java-user@lucene.apache.org by rokham <so...@gmail.com> on 2009/03/09 16:01:41 UTC

How to search both Tokenized and Untokenized fields

Hi,

I've been trying to find a way to execute a query that references both
tokenized and untokenized fields against a Lucene index, without having to
parse the query myself. I've been able to execute a query that only uses
tokenized fields as follows:

   QueryParser queryParser = new QueryParser(DEFAULT_FIELD, analyzer);
   Query query = queryParser.parse(queryString);
   Hits hits = indexSearcher.search(query);

This works fine for tokenized fields, but I'm not sure how to execute a query
("queryString") that contains both tokenized and untokenized fields.

Any suggestions would be very much appreciated.

Rokham
-- 
View this message in context: http://www.nabble.com/How-to-search-both-Tokenized-and-Untokenized-fields-tp22413438p22413438.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: How to search both Tokenized and Untokenized fields

Posted by Fa...@emc.com.

Hi,

What do you mean by an untokenized field?

Are you using a different analyzer for each field? If so, I think you
should just use the same analyzer at query time that you used for
indexing (PerFieldAnalyzerWrapper, I guess).

Li


Re: How to search both Tokenized and Untokenized fields

Posted by Chris Hostetter <ho...@fucit.org>.

: Well, PerFieldAnalyzerWrapper is just a bunch of Analyzers, independent of
: queries. See the API, but in general
: PerFieldAnalyzerWrapper perf = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
: 
: perf.addAnalyzer("untokenized", new WhitespaceAnalyzer());
: perf.addAnalyzer("tokenized", new SnowballAnalyzer("English"));

If the "untokenized" field was indexed using Field.Index.UN_TOKENIZED (or
NO_NORMS) then you'll probably want to use KeywordAnalyzer instead of
WhitespaceAnalyzer. KeywordAnalyzer treats the entire value as a single
token, so a query string like...

    +untokenized:"string with whitespace" +tokenized:"other string"

...will correctly match a doc containing that value in the untokenized
field.
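
To make the difference concrete, here is a minimal plain-Java sketch with no Lucene dependency. The two helper methods only imitate what WhitespaceAnalyzer and KeywordAnalyzer do to a value (split on whitespace versus keep the whole value as one token). The class and method names are made up for illustration and are not Lucene API, and the matching rule is a simplification of what really happens inside the index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the two analyzers; these are not Lucene classes.
final class KeywordVsWhitespaceSketch {

    // Imitates WhitespaceAnalyzer: one token per whitespace-separated chunk.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Imitates KeywordAnalyzer: the whole value is a single token.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Simplified model: an untokenized field stores its value as exactly one
    // term, so the query matches only if analysis yields that same single term.
    static boolean matchesUntokenized(String indexedValue, List<String> queryTokens) {
        return queryTokens.size() == 1 && queryTokens.get(0).equals(indexedValue);
    }

    public static void main(String[] args) {
        String indexed = "string with whitespace";
        System.out.println(matchesUntokenized(indexed, whitespaceTokens(indexed))); // false
        System.out.println(matchesUntokenized(indexed, keywordTokens(indexed)));    // true
    }
}
```

With whitespace splitting the query side produces three terms, none of which equals the single stored term, which is why the keyword-style analysis is the one that matches.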




-Hoss



Re: How to search both Tokenized and Untokenized fields

Posted by Erick Erickson <er...@gmail.com>.

Well, PerFieldAnalyzerWrapper is just a bunch of Analyzers, independent of
queries. See the API, but in general:

PerFieldAnalyzerWrapper perf = new PerFieldAnalyzerWrapper(new StandardAnalyzer());

perf.addAnalyzer("untokenized", new WhitespaceAnalyzer());
perf.addAnalyzer("tokenized", new SnowballAnalyzer("English"));

etc...

Some time later...
QueryParser qp = new QueryParser("defaultfield", perf);
Query q = qp.parse("(+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)");


But you have to get from your user input to the field:value form
and that's what your application has to do. Presumably
your application has some way of getting the query from
the user in such a fashion that you can map particular terms
to particular fields. If you don't, you have a problem that
Lucene can't help you with <G>..
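
As a rough illustration of that application-side step, here is a plain-Java sketch that turns per-field user input into the field:value form. It assumes the application already knows which values belong to which field. The class name, method names, and the "every value is required" policy are all hypothetical choices for this example, and the sketch does not escape other query-syntax characters (in a real application, QueryParser.escape may be worth a look).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: turns per-field user input into field:value query syntax.
final class FieldQueryStringSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Values without whitespace pass through unchanged. Values with whitespace
    // are wrapped in double quotes, with embedded backslashes and quotes escaped.
    static String quoteIfNeeded(String value) {
        if (!WHITESPACE.matcher(value).find()) {
            return value;
        }
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds one parenthesized group per field, with every value required (+).
    static String build(Map<String, List<String>> valuesByField) {
        List<String> groups = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : valuesByField.entrySet()) {
            if (entry.getValue().isEmpty()) {
                continue; // skip fields with no values rather than emit "()"
            }
            List<String> clauses = new ArrayList<>();
            for (String value : entry.getValue()) {
                clauses.add("+" + entry.getKey() + ":" + quoteIfNeeded(value));
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("tokenized", Arrays.asList("value1", "value2"));
        input.put("untokenized", Arrays.asList("value3", "value4"));
        // prints (+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)
        System.out.println(build(input));
    }
}
```

The resulting string is what would be handed to the query parser, together with the per-field analyzer wrapper.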

Best
Erick


On Wed, Mar 11, 2009 at 1:22 AM, rokham <so...@gmail.com> wrote:

>
> Thanks a bunch for your very prompt reply. I looked into the
> PerFieldAnalyzerWrapper class and I understand how you can add a specific
> analyzer for each field. My question is how this links to the query
> that's sent to me.
>
> If I'm given a query as follows:
> (+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)
>
> can you please give me pseudocode or a code example showing how I would search
> Lucene's index based on the given fields and my desired analyzer for each
> field? I'm not clear on how I can go about building a
> PerFieldAnalyzerWrapper object without having to parse the query, take
> out the fields, and assign their specific analyzers to them.
>
> Rokham

Re: How to search both Tokenized and Untokenized fields

Posted by rokham <so...@gmail.com>.

Thanks a bunch for your very prompt reply. I looked into the
PerFieldAnalyzerWrapper class and I understand how you can add a specific
analyzer for each field. My question is how this links to the query
that's sent to me.

If I'm given a query as follows:
(+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)

can you please give me pseudocode or a code example showing how I would search
Lucene's index based on the given fields and my desired analyzer for each
field? I'm not clear on how I can go about building a
PerFieldAnalyzerWrapper object without having to parse the query, take
out the fields, and assign their specific analyzers to them.

Rokham




-- 
View this message in context: http://www.nabble.com/How-to-search-both-Tokenized-and-Untokenized-fields-tp22413438p22449012.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: How to search both Tokenized and Untokenized fields

Posted by Erick Erickson <er...@gmail.com>.

PerFieldAnalyzerWrapper is your friend, assuming that you have separate
fields, some tokenized and some not. If you *don't* have separate
fields, then we need more details of what you hope to accomplish...

Something like

(+tokenized:value1 +tokenized:value2) (+untokenized:value3 +untokenized:value4)

should do the trick, where you've constructed a PerFieldAnalyzerWrapper
with a tokenizing analyzer for field "tokenized" and a non-tokenizing
analyzer for field "untokenized".
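
The idea behind the wrapper can be sketched in plain Java without Lucene: one object holds a default analyzer plus a per-field map, and whoever analyzes text asks it by field name. Because the lookup happens per field, nothing about the query has to be known when the wrapper is built. Everything below (class name, toy analyzers, the addAnalyzer method on this class) is a hypothetical stand-in for illustration, not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a per-field analyzer lookup; not Lucene code.
final class PerFieldDispatchSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();

    PerFieldDispatchSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers a field-specific analyzer, independent of any query.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // Uses the analyzer registered for the field, else the default one.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Toy "tokenizing" analyzer: lowercase, then split on whitespace.
    static List<String> tokenizing(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Toy "non-tokenizing" analyzer: the whole value as one token.
    static List<String> nonTokenizing(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        PerFieldDispatchSketch perField =
                new PerFieldDispatchSketch(PerFieldDispatchSketch::tokenizing);
        perField.addAnalyzer("untokenized", PerFieldDispatchSketch::nonTokenizing);
        System.out.println(perField.analyze("tokenized", "Hello World"));   // [hello, world]
        System.out.println(perField.analyze("untokenized", "Hello World")); // [Hello World]
    }
}
```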

Best
Erick
