Posted to java-user@lucene.apache.org by Luke Shannon <ls...@futurebrand.com> on 2005/02/18 23:44:52 UTC

More Analyzer Question

I have created an Analyzer that I think should just be converting to lower
case and adding synonyms in the index (it is at the end of the email).

The problem is, after running it I get one more result than I was expecting
(Document 1 should not be there):

Running testNameCombination1, expecting: 1 result
The query: +(type:138) +(name:mario*) returned 2

Start Listing documents:

Document: 0 contains:
Name: Text<name:mario test>
Desc: Text<desc:this is test from mario>


Document: 1 contains:
Name: Text<name:test mario>
Desc: Text<desc:retro>

End Listing documents

Those same 2 documents in Luke look like this:

Document 0
Text<name:mario test>
Text<desc:this is test from mario>

Document 1
Text<name:test mario>
Text<desc:retro>

That looks correct to me. The query shouldn't match Document 1.

The analyzer used on this field is below and is applied like so:

//set the default
PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new
SynonymAnalyzer(new FBSynonymEngine()));

//the analyzer for the name field (only converts to lower case and adds synonyms)
analyzer.addAnalyzer("name", new KeywordSynonymAnalyzer(new
FBSynonymEngine()));

Any help would be appreciated.

Thanks,

Luke


import org.apache.lucene.analysis.*;
import java.io.Reader;

public class KeywordSynonymAnalyzer extends Analyzer {
    private SynonymEngine engine;

    public KeywordSynonymAnalyzer(SynonymEngine engine) {
        this.engine = engine;
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new SynonymFilter(new
LowerCaseTokenizer(reader), engine);
        return result;
    }
}







Luke Shannon | Software Developer
FutureBrand Toronto

207 Queen's Quay, Suite 400
Toronto, ON, M5J 1A7
416 642 7935 (office)



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Optional Terms in a single query

Posted by Andrzej Bialecki <ab...@getopt.org>.
Todd VanderVeen wrote:
>>
> I would be careful using wildcards as proposed. They can be inefficient 
> (particularly in a list of disjunctions) but even more importantly you 
> are excluding more than the 3 names. Your results won't be consistent 
> with your intent.

In the new version of Luke (the tool) you can view how your wildcard 
query is re-written into boolean queries. This should help to catch 
those cases where wildcard queries match unwanted terms.
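
As a rough illustration of that rewriting, here is a minimal sketch in plain Java. It is not how Luke or Lucene implement it: `expand` is a hypothetical helper and the term list is made up. It enumerates the indexed terms a wildcard pattern matches and prints them as the disjunction the query effectively becomes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class WildcardRewriteSketch {

    // Translate a wildcard pattern ('*' = any run of characters,
    // '?' = exactly one character) into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Hypothetical helper: one "field:term" clause per indexed term that matches.
    static List<String> expand(String field, String wildcard, List<String> indexedTerms) {
        Pattern pattern = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                clauses.add(field + ":" + term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Made-up term dictionary for the "name" field.
        List<String> terms = Arrays.asList("bill", "harry", "optimus", "tim", "timothy");
        // Prints: name:optimus name:tim name:timothy
        System.out.println(String.join(" ", expand("name", "*tim*", terms)));
    }
}
```

Seeing the expanded list makes it obvious when a pattern pulls in terms that were never intended.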

-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: Optional Terms in a single query

Posted by Todd VanderVeen <td...@part.net>.
Luke Shannon wrote:

>Hi Todd;
>
>Thanks for your help.
>
>I was able to do what you said but in a much uglier way using a Boolean
>Query and adding Wildcard Queries.
>
>The end result looks like this:
>
>The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry*
>+olfaithfull:stillhere))
>
>But this one works as expected.
>
>Thanks!
>
>Luke
>----- Original Message ----- 
>From: "Todd VanderVeen" <td...@part.net>
>To: "Lucene Users List" <lu...@jakarta.apache.org>
>Sent: Monday, February 21, 2005 6:26 PM
>Subject: Re: Optional Terms in a single query
>
>
>  
>
>>Luke Shannon wrote:
>>
>>>The API I'm working with combines a series of queries into one larger one
>>>using a boolean query.
>>>
>>>Queries on the same field get OR'd into one big query. All remaining queries
>>>are AND'd with this big one.
>>>
>>>Working within this system I have:
>>>
>>>arg = (mario luigi bobby joe) //i do have control of how this list is
>>>created
>>>
>>>I pass this to the QueryParser:
>>>
>>>Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
>>>Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
>>>StandardAnalyzer());
>>>BooleanQuery typeNegativeSearch = new BooleanQuery();
>>>typeNegativeSearch.add(query1, false, true);
>>>typeNegativeSearch.add(query2, true, false);
>>>
>>>This is half the query.
>>>
>>>It gets AND'd with the other half, to create what you see below:
>>>
>>>+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
>>>
>>>What I am having trouble with is getting the QueryParser to create
>>>this: -name:(tim bill harry)
>>>
>>>I feel like this is something simple, but for some reason I can't figure it
>>>out.
>>>
>>>Thanks,
>>>
>>>Luke
>>>
>>>
>>>
>>>      
>>>
>>Is the API something you control?
>>
>>Let's call the other half of your query query3. To avoid the extra nesting
>>you need to do the composition in a single boolean query.
>>
>>Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
>>Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
>>Query query3 = ....
>>
>>BooleanQuery finalQuery = new BooleanQuery();
>>finalQuery.add(query1, false, true);
>>finalQuery.add(query2, true, false);
>>finalQuery.add(query3, true, false);
>>
>>Cheers,
>>Todd VanderVeen
>>
>
I would be careful using wildcards as proposed. They can be inefficient 
(particularly in a list of disjunctions) but even more importantly you 
are excluding more than the 3 names. Your results won't be consistent 
with your intent.
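
To make the over-exclusion concrete, here is a minimal sketch in plain Java (no Lucene involved). The sample names and the helpers `keepExact` and `keepWildcard` are made up for illustration. It assumes each name is a single lowercase term and compares what survives an exact-term exclusion with what survives a `*term*` exclusion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WildcardExclusionSketch {

    // Models -name:tim -name:bill -name:harry: drop only exact term matches.
    static List<String> keepExact(List<String> names, List<String> banned) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!banned.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    // Models -name:*tim* -name:*bill* -name:*harry*: drop any name that
    // merely contains one of the banned strings.
    static List<String> keepWildcard(List<String> names, List<String> banned) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            boolean hit = false;
            for (String b : banned) {
                if (name.contains(b)) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("tim", "timothy", "billy", "harry", "luigi");
        List<String> banned = Arrays.asList("tim", "bill", "harry");
        System.out.println(keepExact(names, banned));    // [timothy, billy, luigi]
        System.out.println(keepWildcard(names, banned)); // [luigi]
    }
}
```

With the wildcard form, "timothy" and "billy" disappear from the results even though nobody asked to exclude them.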

Cheers,
Todd VanderVeen





Re: Optional Terms in a single query

Posted by Luke Shannon <ls...@futurebrand.com>.
Hi Todd;

Thanks for your help.

I was able to do what you said but in a much uglier way using a Boolean
Query and adding Wildcard Queries.

The end result looks like this:

The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry*
+olfaithfull:stillhere))

But this one works as expected.

Thanks!

Luke
----- Original Message ----- 
From: "Todd VanderVeen" <td...@part.net>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Monday, February 21, 2005 6:26 PM
Subject: Re: Optional Terms in a single query


> Luke Shannon wrote:
>
> >The API I'm working with combines a series of queries into one larger one
> >using a boolean query.
> >
> >Queries on the same field get OR'd into one big query. All remaining queries
> >are AND'd with this big one.
> >
> >Working within this system I have:
> >
> >arg = (mario luigi bobby joe) //i do have control of how this list is
> >created
> >
> >I pass this to the QueryParser:
> >
> >Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
> >Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
> >StandardAnalyzer());
> >BooleanQuery typeNegativeSearch = new BooleanQuery();
> >typeNegativeSearch.add(query1, false, true);
> >typeNegativeSearch.add(query2, true, false);
> >
> >This is half the query.
> >
> >It gets AND'd with the other half, to create what you see below:
> >
> >+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
> >
> >What I am having trouble with is getting the QueryParser to create
> >this: -name:(tim bill harry)
> >
> >I feel like this is something simple, but for some reason I can't figure it
> >out.
> >
> >Thanks,
> >
> >Luke
> >
> >
> >
> Is the API something you control?
>
> Let's call the other half of your query query3. To avoid the extra nesting
> you need to do the composition in a single boolean query.
>
> Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
> Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
> Query query3 = ....
>
> BooleanQuery finalQuery = new BooleanQuery();
> finalQuery.add(query1, false, true);
> finalQuery.add(query2, true, false);
> finalQuery.add(query3, true, false);
>
> Cheers,
> Todd VanderVeen
>





Re: Optional Terms in a single query

Posted by Todd VanderVeen <td...@part.net>.
Luke Shannon wrote:

>The API I'm working with combines a series of queries into one larger one
>using a boolean query.
>
>Queries on the same field get OR'd into one big query. All remaining queries
>are AND'd with this big one.
>
>Working within this system I have:
>
>arg = (mario luigi bobby joe) //i do have control of how this list is
>created
>
>I pass this to the QueryParser:
>
>Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
>Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
>StandardAnalyzer());
>BooleanQuery typeNegativeSearch = new BooleanQuery();
>typeNegativeSearch.add(query1, false, true);
>typeNegativeSearch.add(query2, true, false);
>
>This is half the query.
>
>It gets AND'd with the other half, to create what you see below:
>
>+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
>
>What I am having trouble with is getting the QueryParser to create
>this: -name:(tim bill harry)
>
>I feel like this is something simple, but for some reason I can't figure it
>out.
>
>Thanks,
>
>Luke
>
>  
>
Is the API something you control?

Let's call the other half of your query query3. To avoid the extra nesting
you need to do the composition in a single boolean query.

Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
Query query3 = ....

BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(query1, false, true);
finalQuery.add(query2, true, false);
finalQuery.add(query3, true, false);
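
For readers unfamiliar with the two boolean arguments: in the Lucene API of that era, `add(query, required, prohibited)` marks a clause as required (`true, false`), prohibited (`false, true`) or optional (`false, false`). The sketch below is a plain-Java model of those semantics, not Lucene code. `Clause`, `fieldIn`, `matches` and the sample documents are made up, and `query3` is assumed to be `type:181`, taken from the query printed earlier in the thread, since the original leaves it elided.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class BooleanClauseSketch {

    // One clause of a boolean query: a test plus the two flags passed to add(...).
    static final class Clause {
        final Predicate<Map<String, String>> test;
        final boolean required;
        final boolean prohibited;

        Clause(Predicate<Map<String, String>> test, boolean required, boolean prohibited) {
            this.test = test;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    // Simplification: each field holds a single term, compared exactly.
    static Predicate<Map<String, String>> fieldIn(String field, String... values) {
        List<String> allowed = Arrays.asList(values);
        return doc -> allowed.contains(doc.get(field));
    }

    static boolean matches(List<Clause> clauses, Map<String, String> doc) {
        boolean sawRequired = false;
        boolean optionalHit = false;
        for (Clause c : clauses) {
            boolean hit = c.test.test(doc);
            if (c.prohibited) {
                if (hit) return false;      // any prohibited hit rejects the document
            } else if (c.required) {
                sawRequired = true;
                if (!hit) return false;     // every required clause must hit
            } else if (hit) {
                optionalHit = true;
            }
        }
        // Prohibited clauses on their own never select a document.
        return sawRequired || optionalHit;
    }

    static Map<String, String> doc(String type, String name, String flag) {
        Map<String, String> d = new HashMap<>();
        d.put("type", type);
        d.put("name", name);
        d.put("olfaithfull", flag);
        return d;
    }

    // All three clauses sit in one flat query, as in the message above.
    static List<Clause> finalQuery() {
        return Arrays.asList(
            new Clause(fieldIn("name", "tim", "bill", "harry"), false, true), // query1: prohibited
            new Clause(fieldIn("olfaithfull", "stillhere"), true, false),     // query2: required
            new Clause(fieldIn("type", "181"), true, false));                 // query3: assumed
    }

    public static void main(String[] args) {
        System.out.println(matches(finalQuery(), doc("181", "mario", "stillhere"))); // true
        System.out.println(matches(finalQuery(), doc("181", "tim", "stillhere")));   // false
        System.out.println(matches(finalQuery(), doc("138", "mario", "stillhere"))); // false
    }
}
```

The model also shows why a purely negative query returns nothing: with no required or optional clause, there is nothing to select documents in the first place.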

Cheers,
Todd VanderVeen



Re: Optional Terms in a single query

Posted by Luke Shannon <ls...@futurebrand.com>.
The API I'm working with combines a series of queries into one larger one
using a boolean query.

Queries on the same field get OR'd into one big query. All remaining queries
are AND'd with this big one.

Working within this system I have:

arg = (mario luigi bobby joe) //i do have control of how this list is
created

I pass this to the QueryParser:

Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);

This is half the query.

It gets AND'd with the other half, to create what you see below:

+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))

What I am having trouble with is getting the QueryParser to create
this: -name:(tim bill harry)

I feel like this is something simple, but for some reason I can't figure it
out.

Thanks,

Luke

----- Original Message ----- 
From: "Todd VanderVeen" <td...@part.net>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Monday, February 21, 2005 5:33 PM
Subject: Re: Optional Terms in a single query


> Luke Shannon wrote:
>
> >Hi;
> >
> >I'm trying to create a query that looks for a field containing type:181 and
> >name doesn't contain tim, bill or harry.
> >
> >+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
> >+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> >+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
> >+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
> >
> >I would really like to do this all in one Query. Is this even possible?
> >
> >Thanks,
> >
> >Luke
> >
> >
> >
> Are all the queries listed attempts at the same thing?
>
> I'm guessing you want this:
>
> +type:181 -name:(tim bill harry) +oldfaith:stillHere
>
>
>





Re: Optional Terms in a single query

Posted by Todd VanderVeen <td...@part.net>.
Luke Shannon wrote:

>Hi;
>
>I'm trying to create a query that looks for a field containing type:181 and
>name doesn't contain tim, bill or harry.
>
>+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
>+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
>+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
>+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
>
>I would really like to do this all in one Query. Is this even possible?
>
>Thanks,
>
>Luke
>
>
>
Are all the queries listed attempts at the same thing?

I'm guessing you want this:

+type:181 -name:(tim bill harry) +oldfaith:stillHere
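
If the list of names is built in code, a query string of that shape can be assembled with simple string handling. This is only a sketch: `buildQuery` is a hypothetical helper, the field names are the ones used in this thread, and the flag value is lowercased on the assumption that the indexed terms are lowercase. It does not escape query-parser special characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class QueryStringSketch {

    // Hypothetical helper: required type, one prohibited group of names,
    // and one required flag field.
    static String buildQuery(String type, List<String> excludedNames,
                             String flagField, String flagValue) {
        return "+type:" + type
                + " -name:(" + String.join(" ", excludedNames) + ")"
                + " +" + flagField + ":" + flagValue.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("tim", "bill", "harry");
        // Prints: +type:181 -name:(tim bill harry) +oldfaith:stillhere
        System.out.println(buildQuery("181", names, "oldfaith", "stillHere"));
    }
}
```

The resulting string would then be handed to the query parser as a single query rather than being combined from separately parsed pieces.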





Re: Optional Terms in a single query

Posted by Luke Shannon <ls...@futurebrand.com>.
Sorry about the typos.

What I would like is a document with a type field = 181,
olfaithfull=stillHere and a name field not containing tim, bill or harry.

Thanks,

Luke

----- Original Message ----- 
From: "Paul Elschot" <pa...@xs4all.nl>
To: <lu...@jakarta.apache.org>
Sent: Monday, February 21, 2005 5:31 PM
Subject: Re: Optional Terms in a single query


> On Monday 21 February 2005 23:23, Luke Shannon wrote:
> > Hi;
> >
> > I'm trying to create a query that looks for a field containing type:181 and
> > name doesn't contain tim, bill or harry.
>
> type: 181  -(name: tim name:bill name:harry)
>
> > +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
>
> stillHere is normally lowercased before searching. Is that ok?
>
> > +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> > +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
>
> typo? olfaithfull
>
> > +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
>
> typo? (type:1 81)
>
> > I would really like to do this all in one Query. Is this even possible?
>
> How would you want to combine the results?
>
> Regards,
> Paul Elschot
>
>





Re: Optional Terms in a single query

Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 21 February 2005 23:23, Luke Shannon wrote:
> Hi;
> 
> I'm trying to create a query that looks for a field containing type:181 and
> name doesn't contain tim, bill or harry.

type: 181  -(name: tim name:bill name:harry)

> +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))

stillHere is normally lowercased before searching. Is that ok?

> +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))

typo? olfaithfull 

> +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

typo? (type:1 81)
 
> I would really like to do this all in one Query. Is this even possible?

How would you want to combine the results?

Regards,
Paul Elschot




Optional Terms in a single query

Posted by Luke Shannon <ls...@futurebrand.com>.
Hi;

I'm trying to create a query that looks for a field containing type:181 and
name doesn't contain tim, bill or harry.

+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

I would really like to do this all in one Query. Is this even possible?

Thanks,

Luke





Re: More Analyzer Question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
The problem is your KeywordSynonymAnalyzer is not truly a "keyword" 
analyzer in that it is tokenizing the field into parts.  So Document 1 
has [test] and [mario] as tokens that come from the LowerCaseTokenizer.

Look at Lucene's svn repository under contrib/analyzers and you'll see 
a KeywordTokenizer and corresponding KeywordAnalyzer you can use.
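
To make the difference concrete, here is a minimal sketch in plain Java (no Lucene on the classpath). `letterTokens` and `keywordToken` are hypothetical helpers that only imitate what LowerCaseTokenizer and a keyword-style tokenizer would emit; they are not the Lucene classes themselves, and synonyms are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class TokenizerSketch {

    // Imitates LowerCaseTokenizer: split at non-letters, lowercase each piece.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Imitates a keyword-style tokenizer: the whole field value is one token
    // (lowercased here to mirror the intent of the analyzer in question).
    static List<String> keywordToken(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    // A prefix query such as name:mario* matches when any indexed token
    // starts with the prefix.
    static boolean matchesPrefix(List<String> tokens, String prefix) {
        return tokens.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        String doc0 = "mario test";
        String doc1 = "test mario";
        System.out.println(letterTokens(doc1));                         // [test, mario]
        System.out.println(matchesPrefix(letterTokens(doc1), "mario")); // true
        System.out.println(matchesPrefix(keywordToken(doc1), "mario")); // false
        System.out.println(matchesPrefix(keywordToken(doc0), "mario")); // true
    }
}
```

With per-word tokens, Document 1 contributes a token that starts with "mario", so the prefix query matches it; with a single whole-value token, only Document 0 matches.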

	Erik


On Feb 18, 2005, at 5:44 PM, Luke Shannon wrote:
> I have created an Analyzer that I think should just be converting to lower
> case and adding synonyms in the index (it is at the end of the email).
>
> The problem is, after running it I get one more result than I was expecting
> (Document 1 should not be there):
>
> Running testNameCombination1, expecting: 1 result
> The query: +(type:138) +(name:mario*) returned 2
>
> Start Listing documents:
>
> Document: 0 contains:
> Name: Text<name:mario test>
> Desc: Text<desc:this is test from mario>
>
>
> Document: 1 contains:
> Name: Text<name:test mario>
> Desc: Text<desc:retro>
>
> End Listing documents
>
> Those same 2 documents in Luke look like this:
>
> Document 0
> Text<name:mario test>
> Text<desc:this is test from mario>
>
> Document 1
> Text<name:test mario>
> Text<desc:retro>
>
> That looks correct to me. The query shouldn't match Document 1.
>
> The analyzer used on this field is below and is applied like so:
>
> //set the default
> PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new
> SynonymAnalyzer(new FBSynonymEngine()));
>
> //the analyzer for the name field (only converts to lower case and adds synonyms)
> analyzer.addAnalyzer("name", new KeywordSynonymAnalyzer(new
> FBSynonymEngine()));
>
> Any help would be appreciated.
>
> Thanks,
>
> Luke
>
>
> import org.apache.lucene.analysis.*;
> import java.io.Reader;
>
> public class KeywordSynonymAnalyzer extends Analyzer {
>     private SynonymEngine engine;
>
>     public KeywordSynonymAnalyzer(SynonymEngine engine) {
>         this.engine = engine;
>     }
>
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = new SynonymFilter(new
> LowerCaseTokenizer(reader), engine);
>         return result;
>     }
> }
>
>
>
>
>
>
>
> Luke Shannon | Software Developer
> FutureBrand Toronto
>
> 207 Queen's Quay, Suite 400
> Toronto, ON, M5J 1A7
> 416 642 7935 (office)
>
>
>

