Posted to solr-user@lucene.apache.org by Peter Spam <ps...@mac.com> on 2011/10/21 21:57:03 UTC

Sorting fields with letters?

Hi everyone,

I have a field that has a letter in it (for example, 1A1, 2A1, 11C15, etc.).  Sorting it seems to work most of the time, except for a few cases: 10A1 sorts lower than 8A100, and 10A100 sorts lower than 10A99.  Any ideas?  I bet if my data had leading zeros (i.e., 10A099), it would behave better?  (But I can't really change my data now, as it would take a few days to re-inject, which is possible but a hassle.)


Thanks!
Pete

Re: Usage of Double quotes for single terms (camelcase) while querying

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Usage of  Double quotes for single terms (camelcase) while querying
: References: <A5...@mac.com>
:  <CA...@mail.gmail.com>
:  <66...@mac.com>
:  <CA...@mail.gmail.com>
:  <51...@mac.com>
: In-Reply-To: <51...@mac.com>

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss

Usage of Double quotes for single terms (camelcase) while querying

Posted by Nasima Banu <na...@position2.com>.
Hello Solr, 

Do we have to put double quotes around a single term when querying, if the term is camelcase (e.g., OrientalTradingCo)? I am using apache-solr-3.3.0.
For example, the query
q=OrientalTradingCo&debugQuery=true gives this debug response ---

<lst name="debug">
<str name="rawquerystring">OrientalTradingCo</str>
<str name="querystring">OrientalTradingCo</str>
<str name="parsedquery">title:orient title:trade title:co title:orientaltradingco</str>
<str name="parsedquery_toString">title:orient title:trade title:co title:orientaltradingco</str>
</lst> 


I see a major change compared with the debug output under Solr 1.4, where the same query gives ---

q=OrientalTradingCo&debugQuery=true


<lst name="debug">
<str name="rawquerystring">OrientalTradingCo</str>
<str name="querystring">OrientalTradingCo</str>
<str name="parsedquery">MultiPhraseQuery(defaultquery:"orient trade (co orientaltradingco)")</str>
<str name="parsedquery_toString">defaultquery:"orient trade (co orientaltradingco)"</str>
</lst>

The schema used for both versions is the same ---


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.KeywordTokenizerFactory"/>
		<filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
		<!--<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> -->
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
		<!-- <filter class="solr.PorterStemFilterFactory" language="English" protected="protwords.txt"/> -->
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.KeywordTokenizerFactory"/>
		<filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
		<!--<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> -->
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
		<!--<filter class="solr.PorterStemFilterFactory" language="English" protected="protwords.txt"/> -->
	</analyzer>
</fieldType>


Has there been a change in how queries for single camelcase terms must be specified?


Thanks,
Nasima

Re: Sorting fields with letters?

Posted by Peter Spam <ps...@mac.com>.
I tried using the ord() function, but the result was the same as the standard sort.

Do I just need to bite the bullet and reindex everything?


Thanks!
Pete

On Oct 21, 2011, at 5:26 PM, Tomás Fernández Löbbe wrote:

> I don't know if you'll find exactly what you need, but you can sort by any
> field or FunctionQuery. See http://wiki.apache.org/solr/FunctionQuery

Re: Sorting fields with letters?

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I don't know if you'll find exactly what you need, but you can sort by any
field or FunctionQuery. See http://wiki.apache.org/solr/FunctionQuery
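
As a minimal sketch of sorting on several fields in one request, the Python snippet below builds a select URL. The endpoint in BASE_URL, the helper name build_sorted_query, and the field names field1/field2/field3 (borrowed from the split-field suggestion elsewhere in this thread) are assumptions for illustration, not part of anyone's actual setup. Note that each sort clause carries an explicit direction.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed Solr endpoint; adjust to your own host and path.
BASE_URL = "http://localhost:8983/solr/select"


def build_sorted_query(query, sort_fields, direction="asc"):
    """Return a select URL that sorts on each field, in the given order."""
    # Solr expects "field direction" pairs separated by commas.
    sort_clause = ",".join(f"{field} {direction}" for field in sort_fields)
    return f"{BASE_URL}?{urlencode({'q': query, 'sort': sort_clause})}"


# Hypothetical field names, as in the 11C15 -> field1/field2/field3 split.
url = build_sorted_query("*:*", ["field1", "field2", "field3"])

# Decode the query string again to show what the server would receive.
params = parse_qs(urlsplit(url).query)
print(params["sort"][0])
```

The same sort parameter also accepts a function query in place of a field name, which is what the wiki page above describes.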

On Fri, Oct 21, 2011 at 7:03 PM, Peter Spam <ps...@mac.com> wrote:

> Is there a way to use a custom sorter, to avoid re-indexing?
>
>
> Thanks!
> Pete

Re: Sorting fields with letters?

Posted by Peter Spam <ps...@mac.com>.
Is there a way to use a custom sorter, to avoid re-indexing?


Thanks!
Pete

On Oct 21, 2011, at 2:13 PM, Tomás Fernández Löbbe wrote:

> Well, yes. You probably have a string field for that content, right? So the
> content is being compared as strings, not as numbers; that's why something
> like 1000 is lower than 2. Leading zeros would be an option. Another option
> is to split the value into separate fields (numeric ones for the number
> parts) and sort by those (this last option is only recommended if your data
> always looks similar). Something like 11C15 becomes field1: 11, field2: C,
> field3: 15. Then use "sort=field1 asc,field2 asc,field3 asc".
> 
> Anyway, both of these options require reindexing.
> 
> Regards,
> 
> Tomás


Re: Sorting fields with letters?

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Well, yes. You probably have a string field for that content, right? So the
content is being compared as strings, not as numbers; that's why something
like 1000 is lower than 2. Leading zeros would be an option. Another option
is to split the value into separate fields (numeric ones for the number
parts) and sort by those (this last option is only recommended if your data
always looks similar). Something like 11C15 becomes field1: 11, field2: C,
field3: 15. Then use "sort=field1 asc,field2 asc,field3 asc".

Anyway, both of these options require reindexing.
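
To illustrate the two options, here is a minimal Python sketch (not Solr code). It uses the sample values from the question to show how plain string comparison orders them, and how a zero-padded key or a three-part split would order them instead. The helper names padded_key and split_fields, the digits-letters-digits pattern, and the pad width of 5 are assumptions chosen for illustration; in practice the transformed values would be computed when the documents are indexed.

```python
import re

# Sample values taken from the question in this thread.
values = ["10A100", "8A100", "10A99", "10A1", "2A1", "1A1", "11C15"]

# Assumed format: digits, one or more letters, digits (e.g. 11C15).
PATTERN = re.compile(r"^(\d+)([A-Za-z]+)(\d+)$")


def split_fields(value):
    """Split '11C15' into (11, 'C', 15), mirroring field1/field2/field3."""
    match = PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected format: {value!r}")
    number, letters, suffix = match.groups()
    return int(number), letters, int(suffix)


def padded_key(value, width=5):
    """Zero-pad both numeric parts so string order matches numeric order."""
    number, letters, suffix = split_fields(value)
    return f"{number:0{width}d}{letters}{suffix:0{width}d}"


# Plain string comparison, as a string field would sort.
string_order = sorted(values)
# Sorting on a zero-padded copy of the value.
padded_order = sorted(values, key=padded_key)
# Sorting on the three separate parts.
field_order = sorted(values, key=split_fields)

print(string_order)
print(padded_order)
print(field_order)
```

The pad width has to be at least as long as the longest number in the data, otherwise the padded strings fall back to the same misordering.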

Regards,

Tomás

On Fri, Oct 21, 2011 at 4:57 PM, Peter Spam <ps...@mac.com> wrote:

> Hi everyone,
>
> I have a field that has a letter in it (for example, 1A1, 2A1, 11C15,
> etc.).  Sorting it seems to work most of the time, except for a few cases:
> 10A1 sorts lower than 8A100, and 10A100 sorts lower than 10A99.  Any ideas?
> I bet if my data had leading zeros (i.e., 10A099), it would behave better?
> (But I can't really change my data now, as it would take a few days to
> re-inject, which is possible but a hassle.)
>
>
> Thanks!
> Pete
>