
Posted to java-user@lucene.apache.org by Ryan O'Hara <oh...@genome.chop.edu> on 2007/04/05 20:56:36 UTC

Not able to search on UN_TOKENIZED fields

Hey,

I was just wondering if you are supposed to be able to search on  
UN_TOKENIZED fields?  It seems like you can from the docs, but I have  
been unsuccessful.  I want to do exact string matching on a certain  
field without analyzer interference.

Thanks,
Ryan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Not able to search on UN_TOKENIZED fields

Posted by Antony Bowesman <ad...@teamware.com>.

You can use KeywordAnalyzer with your QueryParser, which will correctly
handle UN_TOKENIZED fields, but that applies KeywordAnalyzer to every field.

To use a field-specific Analyzer, either use PerFieldAnalyzerWrapper, preload
it with all possible fields, and pass it to QueryParser as the Analyzer, or
override QueryParser's getFieldQuery() and choose your Analyzer there based
on the field being searched.
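
To illustrate the per-field idea, here is a minimal sketch in plain Java. It
is a toy model only: the class, its methods, and the field names "symbol" and
"description" are hypothetical stand-ins, not Lucene's API. It shows one field
routed to a keyword-style analyzer while every other field falls back to a
tokenizing one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model only: hypothetical stand-ins, not Lucene classes.
class PerFieldAnalysisSketch {

    // Stand-in for a keyword-style analyzer: the whole value is one token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Stand-in for a tokenizing analyzer: split on non-alphanumerics, lowercase.
    static List<String> tokenizing(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldAnalysisSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Register the analyzer to use for one specific field.
    void register(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Dispatch on the field name; unregistered fields use the fallback.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::tokenizing);
        wrapper.register("symbol", PerFieldAnalysisSketch::keyword);
        System.out.println(wrapper.analyze("symbol", "ABC-1"));
        System.out.println(wrapper.analyze("description", "ABC-1"));
    }
}
```

The same dispatch-by-field-name decision is what an overridden
getFieldQuery() would make, just at query-building time instead.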

Antony

Ryan O'Hara wrote:
> Hey Erick,
> 
> Thanks for the quick response.  I need a truly exact match.  What I 
> ended up doing was using a TOKENIZED field, but altering the 
> StandardAnalyzer's stop word list to include only the word/letter 'a'.  
> Below is my searching code:
> 
> String[] stopWords = {"a"};
>             StandardAnalyzer sa = new StandardAnalyzer(stopWords);
>             QueryParser qp = new QueryParser("symbol", sa);
>             Query queryPhrase = qp.parse(symbol.toUpperCase());
>             Hits hits = searcher.search(queryPhrase);
>             String hit;
>             if(hits.length() > 0){
>                 hit = hits.doc(0).get("count");
>                 count = Integer.parseInt(hit);
>             }
> 
> Is the reason it wasn't working due to the fact that I'm passing in a 
> StandardAnalyzer?  I thought that maybe the searching mechanisms would 
> be able to use or not use an analyzer according to what the field.index 
> value is.
> 
> One other question that you may have an answer to:  I'm eventually going 
> to need to alter the stop word list to include all default stop words, 
> except those that match certain criteria.  Can this be done?
> 
> Thanks,
> Ryan




Re: Not able to search on UN_TOKENIZED fields

Posted by Erick Erickson <er...@gmail.com>.

See below

On 4/5/07, Ryan O'Hara <oh...@genome.chop.edu> wrote:
>
> Hey Erick,
>
> Thanks for the quick response.  I need a truly exact match.  What I
> ended up doing was using a TOKENIZED field, but altering the
> StandardAnalyzer's stop word list to include only the word/letter
> 'a'.  Below is my searching code:

You're heading for a world of hurt here. I flat guarantee that you'll
spend a LOT of time tweaking this to get it right, because you're
not doing what you said you wanted to do, that is match exactly.
You said it yourself, you're "removing stop words" which is NOT exact
matching.

You need to make a clear statement of what you're trying to do
before too much longer. An example or two would help.

> String[] stopWords = {"a"};
> StandardAnalyzer sa = new StandardAnalyzer(stopWords);
> QueryParser qp = new QueryParser("symbol", sa);
> Query queryPhrase = qp.parse(symbol.toUpperCase());
> Hits hits = searcher.search(queryPhrase);
> String hit;
> if (hits.length() > 0) {
>     hit = hits.doc(0).get("count");
>     count = Integer.parseInt(hit);
> }

You're uppercasing your search string. Did you index things
uppercased? Your QueryParser is trying to work on a field called
"symbol". Is that what you intend? I'm a bit confused when you
use the same word for the field and its value.

> Is the reason it wasn't working due to the fact that I'm passing in a
> StandardAnalyzer?

I have no idea why it isn't working. You haven't provided the
query.toString() output. There's no evidence you've used
Luke to look at what you've indexed to know what to expect.
Without those two things, or a self-contained example I can only
guess.

> I thought that maybe the searching mechanisms
> would be able to use or not use an analyzer according to what the
> field.index value is.

No, you can't do this. You can make a PerFieldAnalyzerWrapper to
process different *fields* with different analyzers, but not different
field *values*. Step back and consider that the purpose of an
Analyzer is to break up the token stream. Attaching
meaning to those tokens isn't part of an Analyzer's job. You
could create your own analyzer (see Lucene In Action) if you
wanted to do special operations on the tokens, but I really don't
think you need to go there.

> One other question that you may have an answer to:  I'm eventually
> going to need to alter the stop word list to include all default stop
> words, except those that match certain criteria.  Can this be done?

Sure, this can be done. In fact, you've already done it with your
StandardAnalyzer constructor. But you had better have *indexed*
the data with the same stop words removed, or you'll be disappointed.
Again, you're doing something that directly contradicts
what you stated you are trying to do. So can you come up with
a better problem statement?
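
As a rough sketch of both points (building a stop list from the defaults
minus some exceptions, and why index and query must agree on it), here is a
plain-Java toy model. Everything in it is an assumption for illustration: the
DEFAULT_STOP_WORDS array is a hypothetical stand-in rather than Lucene's real
default list, and the "length <= 2" criterion is just an example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Toy model only: hypothetical stand-ins, not Lucene classes.
class StopWordSketch {

    // Hypothetical stand-in for a default stop list; not Lucene's actual list.
    static final List<String> DEFAULT_STOP_WORDS =
            Arrays.asList("a", "an", "and", "the", "of", "to");

    // Keep every default stop word except those the predicate exempts.
    static Set<String> stopWordsExcept(Predicate<String> exempt) {
        Set<String> result = new LinkedHashSet<>();
        for (String word : DEFAULT_STOP_WORDS) {
            if (!exempt.test(word)) {
                result.add(word);
            }
        }
        return result;
    }

    // Split on whitespace, lowercase, and drop anything in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("\\s+")) {
            String token = part.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Example criterion (an assumption): short words are never stop words.
        Set<String> custom = stopWordsExcept(w -> w.length() <= 2);
        Set<String> defaults = new LinkedHashSet<>(DEFAULT_STOP_WORDS);

        // Same text, different stop lists: the resulting terms disagree.
        System.out.println(analyze("The AN and TO", custom));
        System.out.println(analyze("The AN and TO", defaults));
    }
}
```

If the index was built with one list and the query is analyzed with the
other, the query terms and the indexed terms no longer line up, which is the
disappointment described above.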

I'd also recommend that you use one of the other analyzers for a while
until you get a better feel for what Lucene is doing. StopAnalyzer comes
to mind.

> Thanks,
> Ryan
>
>
> On Apr 5, 2007, at 3:08 PM, Erick Erickson wrote:
>
> > Yes, you can search on UN_TOKENIZED fields, but they're exact,
> > really, really exact <G>.
> >
> > I'd recommend that you get a copy of Luke (google lucene luke) and
> > examine your index to see what you actually have in your index.
> >
> > Also, you haven't provided us a clue what the actual query is. I'd
> > use Query.toString().
> >
> > I suspect that the query is getting tokenized if you're using one
> > of the normal Analyzers...
> >
> > Erick
> >
> > On 4/5/07, Ryan O'Hara <oh...@genome.chop.edu> wrote:
> >>
> >> Hey,
> >>
> >> I was just wondering if you are supposed to be able to search on
> >> UN_TOKENIZED fields?  It seems like you can from the docs, but I have
> >> been unsuccessful.  I want to do exact string matching on a certain
> >> field without analyzer interference.
> >>
> >> Thanks,
> >> Ryan
> >>

Re: Not able to search on UN_TOKENIZED fields

Posted by Ryan O'Hara <oh...@genome.chop.edu>.

Hey Erick,

Thanks for the quick response.  I need a truly exact match.  What I  
ended up doing was using a TOKENIZED field, but altering the  
StandardAnalyzer's stop word list to include only the word/letter  
'a'.  Below is my searching code:

String[] stopWords = {"a"};
StandardAnalyzer sa = new StandardAnalyzer(stopWords);
QueryParser qp = new QueryParser("symbol", sa);
Query queryPhrase = qp.parse(symbol.toUpperCase());
Hits hits = searcher.search(queryPhrase);
String hit;
if (hits.length() > 0) {
    hit = hits.doc(0).get("count");
    count = Integer.parseInt(hit);
}

Is the reason it wasn't working due to the fact that I'm passing in a  
StandardAnalyzer?  I thought that maybe the searching mechanisms  
would be able to use or not use an analyzer according to what the  
field.index value is.

One other question that you may have an answer to:  I'm eventually  
going to need to alter the stop word list to include all default stop  
words, except those that match certain criteria.  Can this be done?

Thanks,
Ryan

On Apr 5, 2007, at 3:08 PM, Erick Erickson wrote:

> Yes, you can search on UN_TOKENIZED fields, but they're exact,
> really, really exact <G>.
>
> I'd recommend that you get a copy of Luke (google lucene luke) and
> examine your index to see what you actually have in your index.
>
> Also, you haven't provided us a clue what the actual query is. I'd
> use Query.toString().
>
> I suspect that the query is getting tokenized if you're using one
> of the normal Analyzers...
>
> Erick
>
> On 4/5/07, Ryan O'Hara <oh...@genome.chop.edu> wrote:
>>
>> Hey,
>>
>> I was just wondering if you are supposed to be able to search on
>> UN_TOKENIZED fields?  It seems like you can from the docs, but I have
>> been unsuccessful.  I want to do exact string matching on a certain
>> field without analyzer interference.
>>
>> Thanks,
>> Ryan
>>

Re: Not able to search on UN_TOKENIZED fields

Posted by Erick Erickson <er...@gmail.com>.

Yes, you can search on UN_TOKENIZED fields, but they're exact,
really, really exact <G>.

I'd recommend that you get a copy of Luke (google lucene luke) and
examine your index to see what you actually have in your index.

Also, you haven't provided us a clue what the actual query is. I'd
use Query.toString().

I suspect that the query is getting tokenized if you're using one
of the normal Analyzers...
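
To make the suspicion concrete, here is a small plain-Java toy model of what
happens when an un-tokenized value meets an analyzed query. It is only a
sketch under stated assumptions: the class and methods are hypothetical
stand-ins, not Lucene's API, and "ABC-1" is a made-up field value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: hypothetical stand-ins, not Lucene classes.
class ExactMatchSketch {

    // An un-tokenized field keeps the raw value as one indexed term.
    static Set<String> indexUntokenized(String value) {
        return Collections.singleton(value);
    }

    // Rough stand-in for a typical analyzer: split on non-alphanumerics, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Term lookup is a plain equality test; every query term must be indexed.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexUntokenized("ABC-1");
        // Analyzed query: the terms become "abc" and "1", neither equals "ABC-1".
        System.out.println(matches(indexed, analyze("ABC-1")));
        // Raw query term: identical string, so it matches.
        System.out.println(matches(indexed, Collections.singletonList("ABC-1")));
    }
}
```

Even a value with no punctuation, such as "ABC", would miss in this model once
the query side lowercases it, which is why comparing the printed query against
the actual index contents is the quickest way to find the mismatch.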

Erick

On 4/5/07, Ryan O'Hara <oh...@genome.chop.edu> wrote:
>
> Hey,
>
> I was just wondering if you are supposed to be able to search on
> UN_TOKENIZED fields?  It seems like you can from the docs, but I have
> been unsuccessful.  I want to do exact string matching on a certain
> field without analyzer interference.
>
> Thanks,
> Ryan
>