Posted to solr-user@lucene.apache.org by themanwho <th...@mac.com> on 2011/10/05 16:53:49 UTC

Sorting by article title

Hi all!

I have documents, all of which have a title, and I would like to sort by
that title.  The catch is, I wish to sort ignoring any "A" or "The" at the
beginning of the title.  

My first (and only) attempt was to create a field type that looks like:

        <fieldType name="titleSort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
          <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.TrimFilterFactory"/>
            <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all" />
            <filter class="solr.PatternReplaceFilterFactory"
                pattern="^the\s" replacement="" replace="first" />
            <filter class="solr.PatternReplaceFilterFactory"
                pattern="^a\s" replacement="" replace="first" />
            <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
          </analyzer>
        </fieldType>

I think the StopFilter should do the same thing as the two article
patterns, so there is some redundancy here too, right?

The field looks like:

        <field name="title.main" type="stringSort" indexed="true"
           maxChars="32" stored="true" multiValued="false"/>

I copyField my original title to this field at index time.
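
For anyone following along, that setup can be sketched roughly as below.
This is only a sketch under a few assumptions: the source field is taken to
be called "title" (the name is not given above), the sort field points at
the titleSort type defined earlier, and maxChars sits on the copyField,
which as far as I know is where Solr reads that attribute.

        <!-- assumption: the original title lives in a field named "title" -->
        <field name="title.main" type="titleSort" indexed="true"
           stored="true" multiValued="false"/>

        <!-- copy at most the first 32 characters into the sort field -->
        <copyField source="title" dest="title.main" maxChars="32"/>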

However, when I add "sort=title.main asc" to my query, the results still
come back in the original order.

Clearly, I'm either doing something wrong, or I am misunderstanding
something.  Can anybody explain what's up and suggest a way to accomplish
what I need to do?

Thanks in Advance!!

--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-by-article-title-tp3396743p3396743.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Sorting by article title

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi,

You can also check out LUCENE-3413 [1], the CombiningFilter that
I wrote, and the associated example. This lets you:

1. perform normal tokenization and analysis in your analysis chain
2. recombine the tokens at the end for sorting purposes

HTH,
Chris

[1] https://issues.apache.org/jira/browse/LUCENE-3413

On Oct 5, 2011, at 12:47 PM, themanwho wrote:

> OK, I'm going to answer my own question -- it was probably so obvious that
> nobody else wanted to answer such an easy one!
> 
> I simply needed to apply
> 
>    <filter class="solr.PatternReplaceFilterFactory" 
>        pattern="([^a-z])" replacement="" replace="all" />
> 
> after
> 
>    <filter class="solr.PatternReplaceFilterFactory" 
>        pattern="^the\s" replacement="" replace="first" />
>    <filter class="solr.PatternReplaceFilterFactory" 
>        pattern="^a\s" replacement="" replace="first" />
> 
> instead of before, as I had it originally.  Otherwise the first filter has
> already stripped the whitespace, so "^the\s" and "^a\s" are never matched!
> 
> Hope this maybe helps somebody else...


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Sorting by article title

Posted by themanwho <th...@mac.com>.
OK, I'm going to answer my own question -- it was probably so obvious that
nobody else wanted to answer such an easy one!

I simply needed to apply

    <filter class="solr.PatternReplaceFilterFactory" 
        pattern="([^a-z])" replacement="" replace="all" />

after

    <filter class="solr.PatternReplaceFilterFactory" 
        pattern="^the\s" replacement="" replace="first" />
    <filter class="solr.PatternReplaceFilterFactory" 
        pattern="^a\s" replacement="" replace="first" />

instead of before, as I had it originally.  Otherwise the first filter has
already stripped the whitespace, so "^the\s" and "^a\s" are never matched!
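
Put together, the reordered type might look like the sketch below. It is
not a tested config, just the filters from my first post in the new order.
One assumption on my part: I have left the StopFilter out, since with the
KeywordTokenizer the whole title is a single token and a stop filter would
only ever drop a title that is itself a stop word.

        <fieldType name="titleSort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
          <analyzer>
            <!-- keep the whole title as one token -->
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.TrimFilterFactory"/>
            <!-- strip a leading article while the whitespace is still there -->
            <filter class="solr.PatternReplaceFilterFactory"
                pattern="^the\s" replacement="" replace="first" />
            <filter class="solr.PatternReplaceFilterFactory"
                pattern="^a\s" replacement="" replace="first" />
            <!-- only now remove everything that is not a-z -->
            <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all" />
          </analyzer>
        </fieldType>

With this order a title such as "The Quick Fox" should end up as the sort
key "quickfox" rather than "thequickfox".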

Hope this maybe helps somebody else...

--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-by-article-title-tp3396743p3397694.html