Posted to solr-user@lucene.apache.org by Jason Toy <ja...@gmail.com> on 2011/07/18 14:10:06 UTC
I found a sorting bug in solr/lucene
Hi all, I found a bug that exists in 3.1 and in trunk, but not in 1.4.1.
When I try to sort by a column with a colon in its name, like
"scores:rails_f", solr cuts off the column name from the colon
forward, so "scores:rails_f" becomes "scores".
To test, I inserted this doc into 1.4.1:
<?xml version="1.0" encoding="UTF-8"?><add><doc><field name="id">User
14914457</field><field name="type">User</field><field name="city_s">San
Francisco</field><field name="name_text">jtoy</field><field
name="login_text">jtoy</field><field name="description_text">life
hacker</field><field name="scores:rails_f">0.05</field></doc></add>
And then I can run the query:
http://localhost:8983/solr/select?q=life&qf=description_text&defType=dismax&sort=scores:rails_f+desc
On 1.4.1 the query runs fine and returns the expected results.
If I insert the same document into solr 3.1 or trunk and run the same query
I get the error:
Problem accessing /solr/select. Reason:
undefined field scores
I can see in the lucene index that the data for scores:rails_f is in the
document. So solr/lucene allows me to store docs with fields that have
colons in their names, but then I am not able to sort on them.
Can anyone else confirm this is a bug? Is this in lucene or solr? I believe
the issue resides in solr.
--
- sent from my mobile
6176064373
Re: I found a sorting bug in solr/lucene
Posted by Chris Hostetter <ho...@fucit.org>.
: According to that bug list, there are other characters that break the
: sorting function. Is there a list of safe characters I can use as a
: delimiter?
the safest field names to use (and most efficient to parse when sorting)
are things that follow the identifier semantics in java (not including the
"$" character at the beginning) ...
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html#isJavaIdentifierStart%28char%29
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html#isJavaIdentifierPart%28char%29
So sorts like "foo_bar_baz asc" will definitely work, and are heavily
tested.
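As a rough illustration of that rule, a field name can be checked against the two Character methods linked above. This is only a sketch: the class and method names (SafeFieldName, isSafe) are made up for the example and are not part of solr, and it reads the "$" exclusion as applying to the first character only.

```java
// Hypothetical helper, not part of Solr: checks a field name against the
// Java-identifier rule described in this message.
final class SafeFieldName {

    // Returns true if the name starts with a Java identifier start character
    // (other than '$') and every remaining character is a Java identifier part.
    static boolean isSafe(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        // '$' is a legal Java identifier start, but is excluded here on purpose.
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"foo_bar_baz", "scores:rails_f", "foo-bar", "$foo"};
        for (String n : names) {
            System.out.println(n + " -> " + isSafe(n));
        }
    }
}
```

With this check, "foo_bar_baz" passes, while "scores:rails_f", "foo-bar" and "$foo" are rejected, so an underscore is the obvious delimiter to use.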
I've just posted a patch to SOLR-2606 that should fix the "foo:bar asc"
and "foo-bar asc" situations, but because of the function query sort
parsing that happens first, they will always be slightly slower to parse.
-Hoss
Re: I found a sorting bug in solr/lucene
Posted by Jason Toy <ja...@gmail.com>.
According to that bug list, there are other characters that break the
sorting function. Is there a list of safe characters I can use as a
delimiter?
On Mon, Jul 18, 2011 at 1:31 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : When I try to sort by a column with a colon in it like
> : "scores:rails_f", solr has cutoff the column name from the colon
> : forward so "scores:rails_f" becomes "scores"
>
> Yes, this bug was recently reported against the 3.x line, but no fix has
> yet been identified...
>
> https://issues.apache.org/jira/browse/SOLR-2606
>
> : Can anyone else confirm this is a bug. Is this in lucene or solr? I believe
> : the issue resides in solr.
>
> it's specific to the param parsing, likely due to the addition of
> supporting functions in the sort param.
>
>
> -Hoss
>
Re: I found a sorting bug in solr/lucene
Posted by Chris Hostetter <ho...@fucit.org>.
: When I try to sort by a column with a colon in it like
: "scores:rails_f", solr has cutoff the column name from the colon
: forward so "scores:rails_f" becomes "scores"
Yes, this bug was recently reported against the 3.x line, but no fix has
yet been identified...
https://issues.apache.org/jira/browse/SOLR-2606
: Can anyone else confirm this is a bug. Is this in lucene or solr? I believe
: the issue resides in solr.
it's specific to the param parsing, likely due to the addition of
supporting functions in the sort param.
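To make the symptom concrete, here is a purely illustrative sketch. It is NOT solr's actual sort-parsing code (that code is not shown in this thread), and the class and method names are invented. It only shows how a parser that reads a leading identifier and stops at the first non-identifier character would turn "scores:rails_f" into "scores".

```java
// Illustrative only: NOT Solr's actual sort-parsing code.
final class SortParseSketch {

    // Reads the longest leading run of Java-identifier characters and
    // ignores everything after it.
    static String leadingIdentifier(String sortSpec) {
        int i = 0;
        while (i < sortSpec.length()
                && Character.isJavaIdentifierPart(sortSpec.charAt(i))) {
            i++;
        }
        return sortSpec.substring(0, i);
    }

    public static void main(String[] args) {
        // The colon ends the identifier, so only "scores" survives.
        System.out.println(leadingIdentifier("scores:rails_f desc")); // prints "scores"
        // Underscores are identifier characters, so the whole name is kept.
        System.out.println(leadingIdentifier("foo_bar_baz asc"));     // prints "foo_bar_baz"
    }
}
```

A lookup of the truncated name would then fail in the way the "undefined field scores" error describes.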
-Hoss
Re: I found a sorting bug in solr/lucene
Posted by Jason Toy <ja...@gmail.com>.
I am using a fairly popular library (sunspot-solr for Ruby) on top of solr
that introduces the use of a colon, so I will modify the library, but I
think there is still a bug, as this stopped working in recent versions of
solr. Solr should also not allow the data into the doc in the first place if
it can't sort by that column name.
On Mon, Jul 18, 2011 at 9:47 AM, Nicholas Chase <nc...@earthlink.net> wrote:
> Seems to me that you wouldn't want to use a colon in a field name, since
> the search syntax uses it (ie, to find a document with foo = bar, you use
> foo:bar). I don't know whether that's actually prohibited, but that could
> be your problem.
>
> ---- Nick
>
>
> On 7/18/2011 8:10 AM, Jason Toy wrote:
>
>> Hi all, I found a bug that exists in the 3.1 and in trunk, but not in
>> 1.4.1
>>
>> When I try to sort by a column with a colon in it like
>> "scores:rails_f", solr has cutoff the column name from the colon
>> forward so "scores:rails_f" becomes "scores"
>>
>
Re: I found a sorting bug in solr/lucene
Posted by Nicholas Chase <nc...@earthlink.net>.
Seems to me that you wouldn't want to use a colon in a field name, since
the search syntax uses it (ie, to find a document with foo = bar, you
use foo:bar). I don't know whether that's actually prohibited, but that
could be your problem.
---- Nick
On 7/18/2011 8:10 AM, Jason Toy wrote:
> Hi all, I found a bug that exists in the 3.1 and in trunk, but not in 1.4.1
>
> When I try to sort by a column with a colon in it like
> "scores:rails_f", solr has cutoff the column name from the colon
> forward so "scores:rails_f" becomes "scores"