Posted to solr-user@lucene.apache.org by Jason Toy <ja...@gmail.com> on 2011/07/18 14:10:06 UTC

I found a sorting bug in solr/lucene

Hi all, I found a bug that exists in 3.1 and in trunk, but not in 1.4.1.

When I try to sort by a column with a colon in its name, like
"scores:rails_f", Solr cuts off the column name at the colon, so
"scores:rails_f" becomes "scores".

To test, I inserted this doc in 1.4.1:
<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="id">User 14914457</field>
    <field name="type">User</field>
    <field name="city_s">San Francisco</field>
    <field name="name_text">jtoy</field>
    <field name="login_text">jtoy</field>
    <field name="description_text">life hacker</field>
    <field name="scores:rails_f">0.05</field>
  </doc>
</add>


Then I ran this query:

http://localhost:8983/solr/select?q=life&qf=description_text&defType=dismax&sort=scores:rails_f+desc

On 1.4.1 the query runs fine and returns the expected results.

If I insert the same document into solr 3.1 or trunk and run the same query
I get the error:

Problem accessing /solr/select. Reason:

    undefined field scores

I can see in the Lucene index that the data for scores:rails_f is in the
document. So Solr/Lucene allows me to store docs with fields that have
colons in their names, but then I am not able to sort on them.

Can anyone else confirm this is a bug? Is it in Lucene or Solr? I believe
the issue resides in Solr.




-- 
- sent from my mobile
6176064373

Re: I found a sorting bug in solr/lucene

Posted by Chris Hostetter <ho...@fucit.org>.
: According to that bug list, there are other characters that break the
: sorting function.  Is there a list of safe characters I can use as a
: delimiter?

The safest field names to use (and the most efficient to parse when sorting)
are those that follow Java identifier ("id") semantics, not including the
"$" character at the beginning...

http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html#isJavaIdentifierStart%28char%29
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html#isJavaIdentifierPart%28char%29

So sorts like "foo_bar_baz asc" will definitely work, and are heavily
tested.
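The rule above can be checked mechanically with the two java.lang.Character methods linked. A minimal sketch, assuming the reading that only a leading "$" is excluded; the class and method names (SafeFieldName, isSafeSortFieldName) are hypothetical and not part of Solr:

```java
public class SafeFieldName {

    // True if name follows Java identifier rules (checked per UTF-16 char, like the
    // char-based Character methods linked above) and does not start with '$'.
    public static boolean isSafeSortFieldName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        // '$' is a legal Java identifier start, but is excluded here per the advice.
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false; // e.g. ':' or '-' makes the name unsafe
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"foo_bar_baz", "scores_rails_f", "scores:rails_f", "foo-bar", "$price", "2fast"};
        for (String n : names) {
            System.out.println(n + " -> " + isSafeSortFieldName(n));
        }
    }
}
```

Rejecting "$" anywhere in the name, not just at the start, would be a stricter alternative if you want extra margin.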

I've just posted a patch to SOLR-2606 that should fix the "foo:bar asc"
and "foo-bar asc" situations, but because the function query sort parsing
happens first, such names will always be slightly slower to parse.


-Hoss

Re: I found a sorting bug in solr/lucene

Posted by Jason Toy <ja...@gmail.com>.
According to that bug list, there are other characters that break the
sorting function.  Is there a list of safe characters I can use as a
delimiter?

On Mon, Jul 18, 2011 at 1:31 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : When I try to sort by a column with a colon in it like
> : "scores:rails_f",  solr has cutoff the column name from the colon
> : forward so "scores:rails_f" becomes "scores"
>
> Yes, this bug was recently reported against the 3.x line, but no fix has
> yet been identified...
>
> https://issues.apache.org/jira/browse/SOLR-2606
>
> : Can anyone else confirm this is a bug. Is this in lucene or solr? I believe
> : the issue resides in solr.
>
> it's specific to the param parsing, likely due to the addition of
> supporting functions in the sort param.
>
>
> -Hoss
>




Re: I found a sorting bug in solr/lucene

Posted by Chris Hostetter <ho...@fucit.org>.
: When I try to sort by a column with a colon in it like
: "scores:rails_f",  solr has cutoff the column name from the colon
: forward so "scores:rails_f" becomes "scores"

Yes, this bug was recently reported against the 3.x line, but no fix has 
yet been identified...

https://issues.apache.org/jira/browse/SOLR-2606

: Can anyone else confirm this is a bug. Is this in lucene or solr? I believe
: the issue resides in solr.

It's specific to the param parsing, likely due to the addition of
support for functions in the sort param.
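One way to picture the symptom: if the sort param is first scanned for a leading identifier-like token (as a function or field name would be), the scan stops at the first ":" and only "scores" is looked up, which matches the "undefined field scores" error. A hypothetical Java illustration of that failure mode, not Solr's actual parsing code; the class and method names are made up:

```java
public class TruncatingSortParser {

    // Hypothetical: returns {leading Java-identifier token, unconsumed remainder}.
    // Scanning stops at the first char that cannot be part of an identifier.
    public static String[] splitLeadingIdentifier(String sortSpec) {
        int i = 0;
        if (!sortSpec.isEmpty() && Character.isJavaIdentifierStart(sortSpec.charAt(0))) {
            i = 1;
            while (i < sortSpec.length() && Character.isJavaIdentifierPart(sortSpec.charAt(i))) {
                i++;
            }
        }
        return new String[] {sortSpec.substring(0, i), sortSpec.substring(i)};
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"foo_bar_baz asc", "scores:rails_f desc"}) {
            String[] parts = splitLeadingIdentifier(spec);
            // For the colon case, the "name" a parser like this would look up is just "scores".
            System.out.println("name=\"" + parts[0] + "\" rest=\"" + parts[1] + "\"");
        }
    }
}
```

With an identifier-only name such as "foo_bar_baz", the whole name is consumed and only the direction remains, which is why those sorts are unaffected.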


-Hoss

Re: I found a sorting bug in solr/lucene

Posted by Jason Toy <ja...@gmail.com>.
I am using a fairly popular library (sunspot-solr for Ruby) on top of Solr
that introduces colons into field names, so I will modify the library, but I
think there is still a bug, since this stopped working in recent versions of
Solr. Solr also should not allow the data into the doc in the first place if
it can't sort by that column name.
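On the library side, the change amounts to mapping generated names onto identifier-safe ones. A minimal Java sketch, using a hypothetical toSafeFieldName helper (sunspot itself is Ruby, and this is not its code):

```java
public class FieldNameSanitizer {

    // Hypothetical helper: replaces every char that cannot appear in a Java
    // identifier with '_', so "scores:rails_f" becomes "scores_rails_f".
    public static String toSafeFieldName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("empty field name");
        }
        StringBuilder sb = new StringBuilder(name.length() + 1);
        char first = name.charAt(0);
        // A leading '$', digit, or other non-start char gets an underscore prefix.
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            sb.append('_');
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"scores:rails_f", "foo-bar", "2fast", "$price"}) {
            System.out.println(n + " -> " + toSafeFieldName(n));
        }
    }
}
```

Different inputs can map to the same output ("scores:rails_f" and "scores_rails_f" both become "scores_rails_f"), so a real library would need to avoid or detect such collisions.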

On Mon, Jul 18, 2011 at 9:47 AM, Nicholas Chase <nc...@earthlink.net> wrote:

> Seems to me that you wouldn't want to use a colon in a field name, since
> the search syntax uses it (ie, to find a document with foo = bar, you use
> foo:bar).  I don't know whether that's actually prohibited, but that could
> be your problem.
>
> ----  Nick
>
>
> On 7/18/2011 8:10 AM, Jason Toy wrote:
>
>> Hi all,  I found a bug that exists in the 3.1 and in trunk, but not in 1.4.1
>>
>> When I try to sort by a column with a colon in it like
>> "scores:rails_f",  solr has cutoff the column name from the colon
>> forward so "scores:rails_f" becomes "scores"
>>
>



Re: I found a sorting bug in solr/lucene

Posted by Nicholas Chase <nc...@earthlink.net>.
Seems to me that you wouldn't want to use a colon in a field name, since
the search syntax uses it (i.e., to find a document whose field foo has the
value bar, you use foo:bar). I don't know whether that's actually prohibited,
but that could be your problem.

----  Nick

On 7/18/2011 8:10 AM, Jason Toy wrote:
> Hi all,  I found a bug that exists in the 3.1 and in trunk, but not in 1.4.1
>
> When I try to sort by a column with a colon in it like
> "scores:rails_f",  solr has cutoff the column name from the colon
> forward so "scores:rails_f" becomes "scores"