Posted to solr-user@lucene.apache.org by Jim Adams <ja...@gmail.com> on 2009/01/31 02:53:53 UTC

Range search question

I have a string field in my schema that actually holds numeric data.  If I try a
range search:

fieldInQuestion:[ 100 TO 150 ]

I fetch back a lot of data that is NOT in this range, such as 11, etc.

Any idea why this happens?  Is it because this is a string?

Thanks.

Re: Range search question

Posted by Lance Norskog <go...@gmail.com>.
A bit of Solr Kung Fu on this topic:

Let us suppose that your data source cannot be changed to use leading
zeroes. Also suppose that the field is required in every record.

The copyField directive automatically populates other fields with your input
data. If you do this:

      fieldQuestion type="string" stored="true" indexed="false" multiValued="false"
      fieldQuestionSint type="sint" stored="false" indexed="true" multiValued="false"

      copyField "fieldQuestion" "fieldQuestionSint"

You will get an automatic copy of your non-leading-zero number into a field
that can do range queries correctly. You will only store one copy of the raw
text. Both range queries and queries for particular integer values will work
correctly against "fieldQuestionSint".

Why multiValued="false"? This enforces the contents of "fieldQuestionSint".
A "copyField" directive adds new data to existing data, so if you
accidentally supply a "fieldQuestionSint" along with "fieldQuestion",
indexing will fail because two values have been presented for
"fieldQuestionSint". This trick will still fail if you give a value for
"fieldQuestionSint" without a value for "fieldQuestion". I acquired this
trick because I have had problems in the past with controlling the exact
data that is sent to the indexer.
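
The append-then-reject behaviour described above can be modelled with a small Python sketch. This is a toy model of the idea only, not Solr's implementation: the helper names `apply_copy_field` and `check_single_valued` are hypothetical, and a document is represented here as a plain dict of field name to list of values.

```python
def apply_copy_field(doc, source, dest):
    """Append the source value(s) to dest, keeping anything already in dest."""
    result = {name: list(vals) for name, vals in doc.items()}
    if source in result:
        result.setdefault(dest, []).extend(result[source])
    return result


def check_single_valued(doc, field):
    """Raise if a single-valued field ended up with more than one value."""
    if len(doc.get(field, [])) > 1:
        raise ValueError(f"multiple values for single-valued field {field!r}")


# Normal case: only the raw field is supplied, the copy fills in the other.
ok = apply_copy_field({"fieldQuestion": ["11"]},
                      "fieldQuestion", "fieldQuestionSint")
check_single_valued(ok, "fieldQuestionSint")  # passes: exactly one value

# Accidental case: both fields supplied, so the copy produces a second value.
bad = apply_copy_field(
    {"fieldQuestion": ["11"], "fieldQuestionSint": ["11"]},
    "fieldQuestion", "fieldQuestionSint",
)
try:
    check_single_valued(bad, "fieldQuestionSint")
    rejected = False
except ValueError:
    rejected = True
```

In this model the second document is rejected because the copy is added on top of the value that was already supplied, which is the situation multiValued="false" is meant to catch.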

Sorting is a parallel problem. If you want to sort a query set by
"fieldQuestion", you need a field of type "integer". "sint" will generally
not work. These additions take care of that problem:

      fieldQuestionInteger type="integer" stored="false" indexed="true" multiValued="false"
      copyField "fieldQuestion" "fieldQuestionInteger"

Syntax stripped for clarity.

Lance

On Sat, Jan 31, 2009 at 11:53 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Because Lucene orders terms lexicographically,
> if you index the strings "11", "100", and "150",
> the terms appear in the index in the order "100", "11", "150",
> so "11" falls between "100" and "150" and matches the range.
>
> Koji
>
>
>
> Jim Adams wrote:
>
>> Why is this?
>>
>> Thanks.
>>
>> On Sat, Jan 31, 2009 at 3:50 AM, Koji Sekiguchi <ko...@r.email.ne.jp>
>> wrote:
>>
>>
>>
>>> Jim Adams wrote:
>>>
>>>
>>>
>>>> True, which is what I'll probably do, but is there any way to do this
>>>> using 'string'?  Actually I have even seen this with date fields, which
>>>> seems very odd (more data being returned than I expected).
>>>>
>>>>
>>>>
>>>>
>>>>
>>> If you want to stick with string, index "011" instead of "11".
>>>
>>> Koji
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>



Re: Range search question

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Because Lucene orders terms lexicographically,
if you index the strings "11", "100", and "150",
the terms appear in the index in the order "100", "11", "150",
so "11" falls between "100" and "150" and matches the range.
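
To see the effect outside Solr, here is a minimal Python sketch that compares strings the way a lexicographically ordered term index would. It only illustrates string ordering and is not Lucene's actual code; the sample values and the helper name `in_string_range` are made up for the example.

```python
# Sample values, indexed as strings (hypothetical data).
values = ["11", "100", "150", "125", "2"]

# Lexicographic (character-by-character) ordering, as for string terms.
lexicographic = sorted(values)

# Numeric ordering, which is what the range query was meant to use.
numeric = sorted(values, key=int)


def in_string_range(term, low, high):
    """Inclusive range test using plain string comparison."""
    return low <= term <= high


# Equivalent of fieldInQuestion:[100 TO 150] on a string field.
matches = [v for v in values if in_string_range(v, "100", "150")]

print(lexicographic)  # ['100', '11', '125', '150', '2']
print(numeric)        # ['2', '11', '100', '125', '150']
print(matches)        # ['11', '100', '150', '125']
```

Note that "11" is reported as a match while "2" is not: the comparison goes character by character, so the length of the number plays no part.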

Koji


Jim Adams wrote:
> Why is this?
>
> Thanks.
>
> On Sat, Jan 31, 2009 at 3:50 AM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
>
>   
>> Jim Adams wrote:
>>
>>     
>>> True, which is what I'll probably do, but is there any way to do this
>>> using 'string'?  Actually I have even seen this with date fields, which
>>> seems very odd (more data being returned than I expected).
>>>
>>>
>>>
>>>       
>> If you want to stick with string, index "011" instead of "11".
>>
>> Koji
>>
>>
>>
>>     
>
>   


Re: Range search question

Posted by Jim Adams <ja...@gmail.com>.
Why is this?

Thanks.

On Sat, Jan 31, 2009 at 3:50 AM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Jim Adams wrote:
>
>> True, which is what I'll probably do, but is there any way to do this
>> using 'string'?  Actually I have even seen this with date fields, which
>> seems very odd (more data being returned than I expected).
>>
>>
>>
> If you want to stick with string, index "011" instead of "11".
>
> Koji
>
>
>

Re: Range search question

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Jim Adams wrote:
> True, which is what I'll probably do, but is there any way to do this using
> 'string'?  Actually I have even seen this with date fields, which seems very
> odd (more data being returned than I expected).
>
>   
If you want to stick with string, index "011" instead of "11".
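
A rough Python sketch of the padding idea, assuming non-negative integers and a fixed maximum width (the `WIDTH` constant and the `pad` helper are illustrative names, not anything from Solr). The same padding has to be applied to the values at index time and to the range endpoints at query time.

```python
WIDTH = 3  # assumed maximum number of digits; pick one that fits your data


def pad(number_text, width=WIDTH):
    """Left-pad a non-negative integer string with zeroes."""
    return number_text.zfill(width)


raw = ["11", "100", "150", "125", "2"]
padded = [pad(v) for v in raw]

# Pad the range endpoints the same way before comparing as strings.
low, high = pad("100"), pad("150")
matches = [v for v in padded if low <= v <= high]

print(sorted(padded))  # ['002', '011', '100', '125', '150']
print(matches)         # ['100', '150', '125']
```

With equal-width strings, lexicographic order and numeric order agree, so "011" no longer lands inside the range. The approach breaks down for negative numbers or for values with more digits than the chosen width.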

Koji



Re: Range search question

Posted by Jim Adams <ja...@gmail.com>.
True, which is what I'll probably do, but is there any way to do this using
'string'?  Actually I have even seen this with date fields, which seems very
odd (more data being returned than I expected).

On Fri, Jan 30, 2009 at 7:04 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> Jim Adams wrote:
>
>> I have a string field in my schema that actually holds numeric data.  If I try a
>> range search:
>>
>> fieldInQuestion:[ 100 TO 150 ]
>>
>> I fetch back a lot of data that is NOT in this range, such as 11, etc.
>>
>> Any idea why this happens?  Is it because this is a string?
>>
>> Thanks.
>>
>>
>>
>
> Yep, try the sint field type instead.
>
> Koji
>
>

Re: Range search question

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Jim Adams wrote:
> I have a string field in my schema that actually holds numeric data.  If I try a
> range search:
>
> fieldInQuestion:[ 100 TO 150 ]
>
> I fetch back a lot of data that is NOT in this range, such as 11, etc.
>
> Any idea why this happens?  Is it because this is a string?
>
> Thanks.
>
>   

Yep, try the sint field type instead.

Koji