Posted to solr-user@lucene.apache.org by "Manepalli, Kalyan" <Ka...@orbitz.com> on 2008/11/20 21:23:20 UTC

Filtering on blank fields

Hi,

I want to fetch only the documents that have a value in a certain field.
For this I am using a filter query (fq) like this:

fq=rev.comments:[* TO *]

The rev.comments field is of type string.

The filter works correctly, but I am seeing a performance degradation:
without the fq above, QTime is around 300 ms; with it, QTime jumps to 850 ms.

Is there a known issue with range queries on string fields? Is there a more
efficient way to do this?

Any suggestions would be very helpful.

Thanks,

Kalyan Manepalli

RE: Filtering on blank fields

Posted by Lance Norskog <go...@gmail.com>.

The problem with using a zero-length string "" as the marker is that documents
holding it are also returned by field:[* TO *], so you cannot tell whether you
are doing this right. For those of us who cannot reindex at the drop of a hat,
this is a big deal. We went with -1 instead.

Lance

RE: Filtering on blank fields

Posted by "Manepalli, Kalyan" <Ka...@orbitz.com>.

Hi Mike,

Thanks for the suggestion. I will test it out and post the results.

Thanks,
Kalyan Manepalli
If you are not the intended recipient of this e-mail message, please notify the sender 
and delete all copies immediately. The sender believes this message and any attachments 
were sent free of any virus, worm, Trojan horse, and other forms of malicious code. 
This message and its attachments could have been infected during transmission. The 
recipient opens any attachments at the recipient's own risk, and in so doing, the 
recipient accepts full responsibility for such actions and agrees to take protective 
and remedial action relating to any malicious code. Travelport is not liable for any 
loss or damage arising from this message or its attachments.

Re: Filtering on blank fields

Posted by Mike Klaas <mi...@gmail.com>.

On 20-Nov-08, at 12:23 PM, Manepalli, Kalyan wrote:

> Hi,
>
> I want to fetch only the documents that have a value in a certain field.
> For this I am using a filter query (fq) like this:
>
> fq=rev.comments:[* TO *]
>
> The rev.comments field is of type string.
>
> The filter works correctly, but I am seeing a performance degradation:
> without the fq above, QTime is around 300 ms; with it, QTime jumps to 850 ms.
>
> Is there a known issue with range queries on string fields? Is there a more
> efficient way to do this?

Unfortunately, this is an inverted index at its worst: to find the docs that
have any value in a field, the range query has to enumerate every term
indexed for that field and collect the docs containing each one.
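
To illustrate the cost described above, here is a toy sketch in Python. It is
not Solr or Lucene code: the index layout, the sample documents, and the
function names build_index, open_range, and term_lookup are all made up for
this example. It models an inverted index as a map from each term to the set
of documents containing it, and counts how many terms each kind of query has
to visit.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term of a (hypothetical) field to the doc ids containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        if value is not None:  # docs without the field contribute no term
            index[value].add(doc_id)
    return index


def open_range(index):
    """Emulate field:[* TO *]: visit every term and union the postings."""
    matched, visited = set(), 0
    for postings in index.values():
        visited += 1
        matched |= postings
    return matched, visited


def term_lookup(index, term):
    """Emulate field:"term": a single dictionary lookup."""
    return set(index.get(term, set())), 1


# Hypothetical sample data: docs 2 and 4 have no value for the field.
docs = {1: "great hotel", 2: None, 3: "noisy room", 4: None, 5: "ok"}
index = build_index(docs)

with_value, range_cost = open_range(index)
print(sorted(with_value), range_cost)  # [1, 3, 5] 3
```

In this toy model the open range visits one entry per distinct term, so its
cost grows with the number of distinct values in the field, while a lookup of
one known term stays a single step.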

The solution is to index a placeholder token in the documents where the field
is empty, such as "<nocomment>" (I think that "" works too). Then change your
fq to:

fq=-rev.comments:"<nocomment>"

It should be much faster.

-Mike
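
A minimal sketch of the approach suggested in this thread, in Python. It
assumes documents are plain dicts prepared on the client before being sent to
Solr. The helper names add_placeholder and has_comment_filter and the sample
documents are hypothetical; the field name rev.comments comes from the
original question and the "<nocomment>" token from Mike's reply. Given Lance's
caveat about "", a non-empty placeholder is used here.

```python
from urllib.parse import urlencode

PLACEHOLDER = "<nocomment>"  # marker token suggested in the thread
FIELD = "rev.comments"       # field name from the original question


def add_placeholder(doc, field=FIELD, placeholder=PLACEHOLDER):
    """Return a copy of doc where a missing or blank field holds the placeholder."""
    out = dict(doc)
    value = out.get(field)
    if value is None or not str(value).strip():
        out[field] = placeholder
    return out


def has_comment_filter(field=FIELD, placeholder=PLACEHOLDER):
    """Build the fq value that excludes documents carrying the placeholder."""
    return '-{}:"{}"'.format(field, placeholder)


# Hypothetical documents: one with a comment, one without the field, one blank.
docs = [
    {"id": "1", "rev.comments": "great hotel"},
    {"id": "2"},
    {"id": "3", "rev.comments": ""},
]
prepared = [add_placeholder(d) for d in docs]

# URL-encoded query string for a select request (q=*:* matches all documents).
params = urlencode({"q": "*:*", "fq": has_comment_filter()})

print(prepared[1]["rev.comments"])  # <nocomment>
print(has_comment_filter())         # -rev.comments:"<nocomment>"
print(params)
```

The placeholder has to be written at index time, so existing documents need
to be reindexed before the new fq returns the same results as the range query.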