You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Les Fletcher <le...@affinitycircles.com> on 2007/05/10 12:35:10 UTC

query syntax question

I have a question about empty fields.  I want to run a query that will 
search against a few particular fields for the query term but then also 
also check to see if a two other fields have any value at all.  i.e., I 
want to search for a set records but don't want to return a record if 
that record has blank first and last name fields.  Any help would be 
greatly appreciated.

Les

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query syntax question

Posted by Erick Erickson <er...@gmail.com>.
I've thought about a flag field, and I see no reason why that wouldn't
work quite well, it all depends, I suppose, upon how ugly it would
eventually get....

But about caching, what does making a filter have to do with Lucene
caching<G>? Sure, there exist Lucene filter caching classes, but
there's no reason you could not implement, say, a singleton class
whose first invocation filled the underlying filter out and just use that.
in other parts of your program.

Anyway, it sounds like you're well on the way to a solution.

Good luck!
Erick

On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>
> Unfortuantely at the moment we don't make good use of lucene caching, so
> the setting up of the filter on startup doesn't really work for us at
> the moment.  Maybe just a general flag field instead of a hasname field
> would work better and be more general.  You could just fill this field
> with any flags that need to be set for the particular document.  Each
> flag has a different unique tag that gets concatinated into the field
> then you can make use of your filters with:
>
> flagfield:(+hasfirstname +haslastname)
>
> and the like.
>
> How does this sound?
>
> Les
>
> Erick Erickson wrote:
>
> > I was going to suggest something about TermEnum/TermDocs, but
> > upon reflection that doesn't work so well because you have to
> > enumerate all the terms over all the docs for a field. Ouch.
> >
> > But one could combine the two approaches. Don't index
> > any "special" values in your firstname or lastname fields.
> > I suspect this will hurt you down the road...
> >
> > Instead, ONLY for those documents that DO have either a first
> > or last name, index an orthogonal field, HASNAMES with
> > a single value of "yes" or something. Now you can construct
> > your filter efficiently by enumerating the HASNAMES terms/docs
> > by only enumerating a single term/value.
> >
> > Depending upon how big your index is, you might be able to
> > get away with the termenum/terndocs approach by constructing
> > your filter at start-up time and caching it away somewhere...
> >
> > You *might* also be able to do something at startup time like
> > for (each document in the index) {
> >   get the firstname and lastname. If both null, set your filter bit
> > }
> >
> > If you use the lazy loading, you may be able to do this without
> > loading all of every document....
> >
> > I'm curious to know what you settle on and how it works....
> >
> > Does this make sense in your application?
> >
> > Erick
> >
> > On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
> >
> >>
> >> I like the idea of the filter since I am making heavy use of filters
> for
> >> this particular query, but how would one go about constructing it
> >> efficiently at query time?  All I can see is hacking around not being
> >> able to use the * as the first character.
> >>
> >> Les
> >>
> >> Erick Erickson wrote:
> >>
> >> > You could create a Lucene Filter that had a bit for each document
> that
> >> > had a first or last name and use that at query time to restrict your
> >> > results appropriately. You could create this at startup time or at
> >> > query time. See CachingWrapperFilter for a way to cache it.
> >> >
> >> >
> >> > Another approach would be to add a dummy field to each document,
> >> > something like HASFIRSTORLASTNAME. At index time, when
> >> > you index a document, if it has a first or last name, put "yes" in
> the
> >> > field. Otherwise, put "no".
> >> >
> >> > Then, at search time, add an +HASFIRSTORLASTNAME:yes to the
> >> > query......
> >> >
> >> > You could add as many states to this field as you want.
> >> >
> >> >
> >> > Erick
> >> >
> >> >
> >> > On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
> >> >
> >> >>
> >> >> I have a question about empty fields.  I want to run a query that
> >> will
> >> >> search against a few particular fields for the query term but then
> >> also
> >> >> also check to see if a two other fields have any value at all.
> >> i.e., I
> >> >> want to search for a set records but don't want to return a record
> if
> >> >> that record has blank first and last name fields.  Any help would be
> >> >> greatly appreciated.
> >> >>
> >> >> Les
> >> >>
> >> >>
> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: query syntax question

Posted by Les Fletcher <le...@affinitycircles.com>.
Unfortuantely at the moment we don't make good use of lucene caching, so 
the setting up of the filter on startup doesn't really work for us at 
the moment.  Maybe just a general flag field instead of a hasname field 
would work better and be more general.  You could just fill this field 
with any flags that need to be set for the particular document.  Each 
flag has a different unique tag that gets concatinated into the field 
then you can make use of your filters with:

flagfield:(+hasfirstname +haslastname)

and the like. 

How does this sound?

Les

Erick Erickson wrote:

> I was going to suggest something about TermEnum/TermDocs, but
> upon reflection that doesn't work so well because you have to
> enumerate all the terms over all the docs for a field. Ouch.
>
> But one could combine the two approaches. Don't index
> any "special" values in your firstname or lastname fields.
> I suspect this will hurt you down the road...
>
> Instead, ONLY for those documents that DO have either a first
> or last name, index an orthogonal field, HASNAMES with
> a single value of "yes" or something. Now you can construct
> your filter efficiently by enumerating the HASNAMES terms/docs
> by only enumerating a single term/value.
>
> Depending upon how big your index is, you might be able to
> get away with the termenum/terndocs approach by constructing
> your filter at start-up time and caching it away somewhere...
>
> You *might* also be able to do something at startup time like
> for (each document in the index) {
>   get the firstname and lastname. If both null, set your filter bit
> }
>
> If you use the lazy loading, you may be able to do this without
> loading all of every document....
>
> I'm curious to know what you settle on and how it works....
>
> Does this make sense in your application?
>
> Erick
>
> On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>
>>
>> I like the idea of the filter since I am making heavy use of filters for
>> this particular query, but how would one go about constructing it
>> efficiently at query time?  All I can see is hacking around not being
>> able to use the * as the first character.
>>
>> Les
>>
>> Erick Erickson wrote:
>>
>> > You could create a Lucene Filter that had a bit for each document that
>> > had a first or last name and use that at query time to restrict your
>> > results appropriately. You could create this at startup time or at
>> > query time. See CachingWrapperFilter for a way to cache it.
>> >
>> >
>> > Another approach would be to add a dummy field to each document,
>> > something like HASFIRSTORLASTNAME. At index time, when
>> > you index a document, if it has a first or last name, put "yes" in the
>> > field. Otherwise, put "no".
>> >
>> > Then, at search time, add an +HASFIRSTORLASTNAME:yes to the
>> > query......
>> >
>> > You could add as many states to this field as you want.
>> >
>> >
>> > Erick
>> >
>> >
>> > On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>> >
>> >>
>> >> I have a question about empty fields.  I want to run a query that 
>> will
>> >> search against a few particular fields for the query term but then 
>> also
>> >> also check to see if a two other fields have any value at all.  
>> i.e., I
>> >> want to search for a set records but don't want to return a record if
>> >> that record has blank first and last name fields.  Any help would be
>> >> greatly appreciated.
>> >>
>> >> Les
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query syntax question

Posted by Erick Erickson <er...@gmail.com>.
I was going to suggest something about TermEnum/TermDocs, but
upon reflection that doesn't work so well because you have to
enumerate all the terms over all the docs for a field. Ouch.

But one could combine the two approaches. Don't index
any "special" values in your firstname or lastname fields.
I suspect this will hurt you down the road...

Instead, ONLY for those documents that DO have either a first
or last name, index an orthogonal field, HASNAMES with
a single value of "yes" or something. Now you can construct
your filter efficiently by enumerating the HASNAMES terms/docs
by only enumerating a single term/value.

Depending upon how big your index is, you might be able to
get away with the termenum/terndocs approach by constructing
your filter at start-up time and caching it away somewhere...

You *might* also be able to do something at startup time like
for (each document in the index) {
   get the firstname and lastname. If both null, set your filter bit
}

If you use the lazy loading, you may be able to do this without
loading all of every document....

I'm curious to know what you settle on and how it works....

Does this make sense in your application?

Erick

On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>
> I like the idea of the filter since I am making heavy use of filters for
> this particular query, but how would one go about constructing it
> efficiently at query time?  All I can see is hacking around not being
> able to use the * as the first character.
>
> Les
>
> Erick Erickson wrote:
>
> > You could create a Lucene Filter that had a bit for each document that
> > had a first or last name and use that at query time to restrict your
> > results appropriately. You could create this at startup time or at
> > query time. See CachingWrapperFilter for a way to cache it.
> >
> >
> > Another approach would be to add a dummy field to each document,
> > something like HASFIRSTORLASTNAME. At index time, when
> > you index a document, if it has a first or last name, put "yes" in the
> > field. Otherwise, put "no".
> >
> > Then, at search time, add an +HASFIRSTORLASTNAME:yes to the
> > query......
> >
> > You could add as many states to this field as you want.
> >
> >
> > Erick
> >
> >
> > On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
> >
> >>
> >> I have a question about empty fields.  I want to run a query that will
> >> search against a few particular fields for the query term but then also
> >> also check to see if a two other fields have any value at all.  i.e., I
> >> want to search for a set records but don't want to return a record if
> >> that record has blank first and last name fields.  Any help would be
> >> greatly appreciated.
> >>
> >> Les
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: query syntax question

Posted by markharw00d <ma...@yahoo.co.uk>.
Here's a way to do it using the XML query parser in contrib....
1) Create this query.xsl file (note use of cached double negative filter)

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Document">
<FilteredQuery>
    <Query>
        <UserQuery><xsl:value-of select="query"/></UserQuery>
    </Query>
    <Filter>
        <CachedFilter>
            <BooleanFilter>
                <Clause occurs="mustNot">
                    <RangeFilter fieldName="surname" lowerTerm="a" 
upperTerm="z"/>
                </Clause>
                <Clause occurs="mustNot">
                    <RangeFilter fieldName="forename" lowerTerm="a" 
upperTerm="z"/>
                </Clause>
            </BooleanFilter>
        </CachedFilter>
    </Filter>
</FilteredQuery>
</xsl:template>
</xsl:stylesheet> 

2) Query as follows:
        //Setup test data
        Analyzer analyzer=new WhitespaceAnalyzer();       
        RAMDirectory rd=new RAMDirectory();
        IndexWriter w=new IndexWriter(rd,new WhitespaceAnalyzer(),true);
        Document d=new Document();
        d.add(new Field("contents","foo 1- must not 
match",Field.Store.YES,Field.Index.TOKENIZED));
        d.add(new 
Field("surname","smith",Field.Store.YES,Field.Index.TOKENIZED));
        w.addDocument(d);
       
        d=new Document();
        d.add(new Field("contents","foo 2- must not 
match",Field.Store.YES,Field.Index.TOKENIZED));
        d.add(new 
Field("forename","fred",Field.Store.YES,Field.Index.TOKENIZED));
        w.addDocument(d);
       
        d=new Document();
        d.add(new Field("contents","foo 3- must not 
match",Field.Store.YES,Field.Index.TOKENIZED));
        d.add(new 
Field("forename","fred",Field.Store.YES,Field.Index.TOKENIZED));
        d.add(new 
Field("surname","smith",Field.Store.YES,Field.Index.TOKENIZED));
        w.addDocument(d);
       
        d=new Document();
        d.add(new Field("contents","foo 4- must 
match",Field.Store.YES,Field.Index.TOKENIZED));
        w.addDocument(d);
        w.close();
       
        IndexSearcher searcher=new IndexSearcher(rd);
       
        //one-off setup - store these
        QueryTemplateManager qtm=new QueryTemplateManager(
                TestXml.class.getResourceAsStream("query.xsl"));
        CorePlusExtensionsParser cp = new CorePlusExtensionsParser(analyzer,
                new QueryParser("contents",analyzer));
       
         //get the user form input
        String queryString="foo";
        Properties userInput=new Properties();
        userInput.setProperty("query",queryString);

//        Transform the user input into a Lucene XML query, then pass to 
parser
        Query 
q=cp.getQuery(qtm.getQueryAsDOM(userInput).getDocumentElement());
       
        Hits h = searcher.search(q);
        for (int i = 0; i < h.length(); i++)
        {
            d=h.doc(i);
            System.out.println(d.get("contents"));
        }

Cheers
Mark




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query syntax question

Posted by Les Fletcher <le...@affinitycircles.com>.
Would a good solution be to insert a secret string into blank fields 
that represents blank.  That way you could search for:

firstname:(-Xd8fgrSjg) lastname:(-Xd8fgrSjg) some query string

Les

Les Fletcher wrote:

> I like the idea of the filter since I am making heavy use of filters 
> for this particular query, but how would one go about constructing it 
> efficiently at query time?  All I can see is hacking around not being 
> able to use the * as the first character.
>
> Les
>
> Erick Erickson wrote:
>
>> You could create a Lucene Filter that had a bit for each document that
>> had a first or last name and use that at query time to restrict your
>> results appropriately. You could create this at startup time or at
>> query time. See CachingWrapperFilter for a way to cache it.
>>
>>
>> Another approach would be to add a dummy field to each document,
>> something like HASFIRSTORLASTNAME. At index time, when
>> you index a document, if it has a first or last name, put "yes" in the
>> field. Otherwise, put "no".
>>
>> Then, at search time, add an +HASFIRSTORLASTNAME:yes to the
>> query......
>>
>> You could add as many states to this field as you want.
>>
>>
>> Erick
>>
>>
>> On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>>
>>>
>>> I have a question about empty fields.  I want to run a query that will
>>> search against a few particular fields for the query term but then also
>>> also check to see if a two other fields have any value at all.  i.e., I
>>> want to search for a set records but don't want to return a record if
>>> that record has blank first and last name fields.  Any help would be
>>> greatly appreciated.
>>>
>>> Les
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query syntax question

Posted by Les Fletcher <le...@affinitycircles.com>.
I like the idea of the filter since I am making heavy use of filters for 
this particular query, but how would one go about constructing it 
efficiently at query time?  All I can see is hacking around not being 
able to use the * as the first character.

Les

Erick Erickson wrote:

> You could create a Lucene Filter that had a bit for each document that
> had a first or last name and use that at query time to restrict your
> results appropriately. You could create this at startup time or at
> query time. See CachingWrapperFilter for a way to cache it.
>
>
> Another approach would be to add a dummy field to each document,
> something like HASFIRSTORLASTNAME. At index time, when
> you index a document, if it has a first or last name, put "yes" in the
> field. Otherwise, put "no".
>
> Then, at search time, add an +HASFIRSTORLASTNAME:yes to the
> query......
>
> You could add as many states to this field as you want.
>
>
> Erick
>
>
> On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>
>>
>> I have a question about empty fields.  I want to run a query that will
>> search against a few particular fields for the query term but then also
>> also check to see if a two other fields have any value at all.  i.e., I
>> want to search for a set records but don't want to return a record if
>> that record has blank first and last name fields.  Any help would be
>> greatly appreciated.
>>
>> Les
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query syntax question

Posted by Erick Erickson <er...@gmail.com>.
You could create a Lucene Filter that had a bit for each document that
had a first or last name and use that at query time to restrict your
results appropriately. You could create this at startup time or at
query time. See CachingWrapperFilter for a way to cache it.


Another approach would be to add a dummy field to each document,
something like HASFIRSTORLASTNAME. At index time, when
you index a document, if it has a first or last name, put "yes" in the
field. Otherwise, put "no".

Then, at search time, add an +HASFIRSTORLASTNAME:yes to the
query......

You could add as many states to this field as you want.


Erick


On 5/10/07, Les Fletcher <le...@affinitycircles.com> wrote:
>
> I have a question about empty fields.  I want to run a query that will
> search against a few particular fields for the query term but then also
> also check to see if a two other fields have any value at all.  i.e., I
> want to search for a set records but don't want to return a record if
> that record has blank first and last name fields.  Any help would be
> greatly appreciated.
>
> Les
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>