Posted to solr-user@lucene.apache.org by Anupam Bhattacharya <an...@gmail.com> on 2012/08/02 19:06:13 UTC

Re: Sorting fields of text_general fieldType

The approach used to work perfectly.

But recently I realized that it is not working with more than 300000
indexed records.
I am using Solr 3.5.

Is there another approach to sort a title field in proper alphabetical
order, irrespective of lower case and upper case?

Regards
Anupam

On Thu, May 17, 2012 at 4:32 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> > The title sort behaves strangely because the Solr server sorts title
> > strings case-sensitively. When we sort in ascending order, titles that
> > start with a digit come first, then titles that start with an upper-case
> > letter in alphabetical order, and after those the titles that start
> > with a lower-case letter.
> >
> > The title field is indexed as text_general fieldtype.
> >
> > <field name="title" type="text_general" indexed="true" stored="true"/>
>
> Please see Otis' response http://search-lucene.com/m/uDxTF1scW0d2
>
> Simply create an additional field named title_sortable with the following
> type
>
>  <!-- lowercases the entire field value, keeping it as a single token.  -->
>     <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.TrimFilterFactory" />
>       </analyzer>
>     </fieldType>
>
> Populate it via a copyField directive:
>
>   <copyField source="title" dest="title_sortable" maxChars="N"/>
>
> then &sort=title_sortable asc
>
>
>
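
The quoted snippet defines the field type and the copyField, but not the
title_sortable field itself. A minimal sketch of that missing declaration,
assuming the "lowercase" type is used as quoted and that the sort field does
not need to be returned in results (stored="false" is an assumption, not
something stated in the thread):

```xml
<!-- Sort-only copy of title; sorting requires a single-valued field. -->
<field name="title_sortable" type="lowercase" indexed="true" stored="false"/>

<!-- Copy the raw title into the sort field at index time. -->
<copyField source="title" dest="title_sortable"/>
```

Because KeywordTokenizerFactory keeps the whole title as one token, the
lowercased and trimmed title becomes the sort key.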

Re: Sorting fields of text_general fieldType

Posted by Erick Erickson <er...@gmail.com>.
Did you re-index everything after the change you made? Your old docs
have no value in the title_sortable field, so they'd all come out first
or last depending on the sort settings, then be sub-sorted by internal
Lucene doc ID.
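
Where documents without a value land can be pinned down on the field type
with the sortMissingLast (or sortMissingFirst) attribute. A sketch, assuming
the "lowercase" type quoted elsewhere in this thread; adding sortMissingLast
is an illustration and was not part of the original suggestion:

```xml
<!-- Same analyzer chain as the quoted type. sortMissingLast="true" places
     documents with no title_sortable value after all others, whether the
     requested order is asc or desc. -->
<fieldType name="lowercase" class="solr.TextField"
           sortMissingLast="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, a block of unsorted titles at the end of the results
points to documents that were never re-indexed.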

If you have, can you just create an index with, say, 6 titles that sorts
improperly and give us the output from your app?

I find it very unlikely that this is really broken. Lots and lots of
people are using this all the time, so my first guess is that it's
something you're doing that _seems_ harmless. Don't get me wrong, there
could indeed be a bug here; it just seems unlikely.

To be really safe, I'd stop my Solr server and blow away the
<solr_home>/data/index directory. Remove the directory itself,
not just the contents, and start indexing over again.

Best
Erick

On Fri, Aug 3, 2012 at 4:30 AM, Anupam Bhattacharya <an...@gmail.com> wrote:
> A few titles are as follows:
>
> Embattled JPMorgan boss survives power challenge - Jakarta Globe
>
> Kitten Survives 6500-Mile Trip in China-US Container - Jakarta Globe
>
> Guard survives hail of bullets - Jakarta Post

Re: Sorting fields of text_general fieldType

Posted by Anupam Bhattacharya <an...@gmail.com>.
A few titles are as follows:

Embattled JPMorgan boss survives power challenge - Jakarta Globe

Kitten Survives 6500-Mile Trip in China-US Container - Jakarta Globe

Guard survives hail of bullets - Jakarta Post




-- 
Thanks & Regards
Anupam Bhattacharya

Re: Sorting fields of text_general fieldType

Posted by Lance Norskog <go...@gmail.com>.
Give us some pairs of titles which sort the wrong way.




-- 
Lance Norskog
goksron@gmail.com