Posted to solr-user@lucene.apache.org by Aaron Daubman <da...@gmail.com> on 2012/12/04 15:30:07 UTC

Range Queries performing differently on SortableIntField vs TrieField of type integer

Greetings,

I'm finally updating an old instance and in testing, discovered that using
the recommended TrieField instead of SortableIntField for range queries
returns unexpected and seemingly incorrect results.

A query with:

q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}

Should, and does under 1.4.1 with SortableIntField, only return docs that
have some i_yearStopSort value and have an i_yearStartSort value less than
1995.
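
The expected behavior described above can be written down as a small toy model. This is only a sketch of the intended semantics, not of how Solr evaluates range filters; the sample documents and the helper names (in_open_range, expected_matches) are invented for illustration, and each field is treated as single-valued for simplicity.

```python
# Toy model of the two fq clauses; this is NOT Solr's implementation.
# A document is a dict; a missing key means the field has no value.

def in_open_range(value, low=None, high=None):
    """Exclusive bounds, like {low TO high}; None stands for '*'."""
    if value is None:
        return False  # a doc with no value is expected never to match
    if low is not None and value <= low:
        return False
    if high is not None and value >= high:
        return False
    return True


def expected_matches(docs):
    """Ids of docs that pass both filters from the query above."""
    return [
        d["id"]
        for d in docs
        if in_open_range(d.get("i_yearStartSort"), high=1995)
        and in_open_range(d.get("i_yearStopSort"))
    ]


# Hypothetical sample documents, invented for illustration.
docs = [
    {"id": "a", "i_yearStartSort": 1990, "i_yearStopSort": 2000},
    {"id": "b", "i_yearStartSort": 1995, "i_yearStopSort": 2000},
    {"id": "c", "i_yearStartSort": 1990},
    {"id": "d"},
]
print(expected_matches(docs))
```

Under this model only doc "a" passes: "b" fails the exclusive upper bound, and "c" and "d" lack a required value.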

Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
query is returning docs that have neither an i_yearStopSort nor an
i_yearStartSort value.


Here are the two schemas:

Solr 1.4.1 Relevant Schema Parts - Working as desired:
---------------------------------------------------------------------------------
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true"
omitNorms="true"/>
...
<field name="i_yearStartSort" type="sint" indexed="true" stored="false"
required="false" multiValued="true"/>
<field name="i_yearStopSort" type="sint" indexed="true"  stored="false"
required="false" multiValued="true"/>


Solr 3.6.1 Relevant Schema Parts - Not working as expected:
-----------------------------------------------------------------------------------------
<fieldType name="tint" class="solr.TrieField" type="integer"
precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
omitNorms="true"/>
...
<field name="i_yearStartSort" type="tint" indexed="true" stored="false"
required="false" multiValued="false"/>
<field name="i_yearStopSort" type="tint" indexed="true" stored="false"
required="false" multiValued="false"/>


1) What is the best way to return to the desired/expected behavior?
2) Can you explain to me why this happens?
3) I have a sneaking suspicion (but could be totally wrong) that this
relates to sortMissingLast="true" - if it does, can you explain the seeming
discrepancies in:
SOLR-2881 and SOLR-2134? If I am reading these correctly, SOLR-2134 says
this was fixed for Trie in 4.0, but not in 3.x... SOLR-2881 has a fix
version of 3.5 listed, but some of the comments also seem to indicate this
was not actually fixed in 3.5+

Thanks,
     Aaron

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Posted by Chris Hostetter <ho...@fucit.org>.
: q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}
	...
: Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
: query is returning docs that have neither an i_yearStopSort nor an
: i_yearStartSort value.

Hmmmm... I can't seem to reproduce this.

Here's what i tried...

1) start up the Solr 3.6.1 example

2) index the 3.6.1 example docs...
java -jar post.jar *.xml

3) index a single doc using some "*_ti" dynamic fields (which use 
"tint")...
java -Ddata=args -jar post.jar '<add><doc><field name="id">HOSS</field><field name="start_ti">45</field><field name="end_ti">100</field></doc></add>'

If i do some open ended range queries on the *_ti fields, i get the 
results i expect (either only my HOSS doc if it's in the ranges, or no 
docs if HOSS is out of range)...

Matches HOSS...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

Matches nothing...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

I repeated the test after deleting all data, and adding 
sortMissingLast="true" to the example "tint" fieldType, and got the same 
results.
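
For anyone repeating these checks from a script, here is a minimal sketch that builds the same kind of select URL with each parameter encoded separately. The helper name select_url is made up for illustration; the host, port and field names are the ones used in the steps above. Building the query string this way keeps every fq and the fl as separate parameters, and it also preserves a leading "+" in a filter, which would otherwise decode as a space in a raw query string.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE = "http://localhost:8983/solr/select"  # example server from this thread


def select_url(filters, fields=("start_ti", "id", "end_ti")):
    """Build a select URL with one fq parameter per filter (hypothetical helper)."""
    params = [("q", "*:*")]
    params += [("fq", f) for f in filters]
    params.append(("fl", ",".join(fields)))
    return BASE + "?" + urlencode(params)


url = select_url(["start_ti:{* TO 50}", "end_ti:{* TO *}"])
print(url)

# Decoding the query string again shows what the server would receive.
print(parse_qs(urlsplit(url).query))

# A filter with a leading "+" survives the round trip when encoded this way.
plus_url = select_url(["+start_ti:{* TO 50}"])
print(parse_qs(urlsplit(plus_url).query)["fq"])
```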

: Solr 3.6.1 Relevant Schema Parts - Not working as expected:
: -----------------------------------------------------------------------------------------
: <fieldType name="tint" class="solr.TrieField" type="integer"
: precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
: omitNorms="true"/>

FYI: you have some wackiness there: 'type="integer"' inside the 
'<fieldType name="tint" .../>' ... that shouldn't have caused any problems 
though, but it doesn't make any sense. 

: <field name="i_yearStartSort" type="tint" indexed="true" stored="false"
: required="false" multiValued="false"/>
: <field name="i_yearStopSort" type="tint" indexed="true" stored="false"
: required="false" multiValued="false"/>

can you try changing those to stored="true" and re-indexing as a sanity 
check? perhaps your indexing code is putting a default value in that 
you aren't realizing?
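
Once the fields are stored, a short script can make an unexpected default value easy to spot. The following is only a sketch: the response body is a hypothetical sample shaped like Solr's XML response format, and stored_values is an invented helper name. It assumes single-valued fields, as in the 3.6.1 schema above.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, invented for illustration.
SAMPLE = """<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">doc1</str><int name="i_yearStartSort">0</int></doc>
    <doc><str name="id">doc2</str></doc>
  </result>
</response>"""


def stored_values(xml_text, field):
    """Map each doc id to its stored values for `field` ([] if none)."""
    out = {}
    for doc in ET.fromstring(xml_text).iter("doc"):
        doc_id = doc.find("str[@name='id']").text
        # Single-valued fields appear as direct children of <doc>.
        out[doc_id] = [e.text for e in doc if e.get("name") == field]
    return out


print(stored_values(SAMPLE, "i_yearStartSort"))
```

In this made-up sample, doc1 carries a value of 0, the kind of default that would match an open-ended range even though no year was intended, while doc2 has no value at all.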

w/o more specifics (ie: sample docs to index) on how to reproduce, i can't 
seem to find any problem.


-Hoss

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Posted by Aaron Daubman <da...@gmail.com>.
Hi Upayavira,

> One small question - did you re-index in-between? The index structure
> will be different for each.

Yes, the Solr 1.4.1 (working) instance was built using the original schema
and that solr version.
The Solr 3.6.1 (not working) instance was re-built using the new schema and
Solr 3.6.1...

Thanks,
      Aaron

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Posted by Aaron Daubman <da...@gmail.com>.
I forgot a possibly important piece... Given the different Solr versions,
the schema version (and its related, different defaults) is also a change:

Solr 1.4.1 Has:
<schema name="ourSchema" version="1.1">

Solr 3.6.1 Has:
<schema name="ourSchema" version="1.5">


> Solr 1.4.1 Relevant Schema Parts - Working as desired:
> ---------------------------------------------------------------------------------
> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true"
> omitNorms="true"/>
> ...
> <field name="i_yearStartSort" type="sint" indexed="true" stored="false"
> required="false" multiValued="true"/>
> <field name="i_yearStopSort" type="sint" indexed="true"  stored="false"
> required="false" multiValued="true"/>
>
>
> Solr 3.6.1 Relevant Schema Parts - Not working as expected:
> -----------------------------------------------------------------------------------------
> <fieldType name="tint" class="solr.TrieField" type="integer"
> precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
> omitNorms="true"/>
> ...
> <field name="i_yearStartSort" type="tint" indexed="true" stored="false"
> required="false" multiValued="false"/>
> <field name="i_yearStopSort" type="tint" indexed="true" stored="false"
> required="false" multiValued="false"/>

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Posted by Upayavira <uv...@odoko.co.uk>.
One small question - did you re-index in-between? The index structure
will be different for each.

Upayavira


Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Posted by Jack Krupansky <ja...@basetechnology.com>.
Could you show us some input data, both WITH an i_yearStopSort value and 
WITHOUT the value?

I tried a quick test using the stock Solr 3.6.1 example schema and a dynamic 
integer field and the filter query did in fact filter out all documents that 
did not have a value in that field:

http://localhost:8983/solr/select?q=*:*&fq=%2bx_i:{*+TO+*}

Maybe you could come up with a simple sample Solr XML document that can be 
added to the stock 3.6.1 example schema that shows the problem.
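
A pair of such documents could be generated along these lines. This is a rough sketch, not a tested reproduction: add_doc_xml is an invented helper, the ids "WITH" and "WITHOUT" are made up, and the "*_ti" field names are borrowed from the dynamic fields used earlier in this thread.

```python
import xml.etree.ElementTree as ET


def add_doc_xml(fields):
    """Serialize one document as an <add><doc> update message (hypothetical helper)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# One document with values in both range fields, one with neither.
with_value = add_doc_xml({"id": "WITH", "start_ti": 1990, "end_ti": 2000})
without_value = add_doc_xml({"id": "WITHOUT"})
print(with_value)
print(without_value)
```

Each string can then be posted to the example instance, and the range filter should match only the first document if things behave as expected.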

-- Jack Krupansky
