Posted to solr-user@lucene.apache.org by themanwho <th...@mac.com> on 2011/10/05 16:53:49 UTC
Sorting by article title
Hi all!
I have documents, all of which have a title, and I would like to sort by
that title. The catch is, I wish to sort ignoring any "A" or "The" at the
beginning of the title.
My first (and only) attempt was to create a field type that looks like:
<fieldType name="titleSort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^the\s" replacement="" replace="first" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^a\s" replacement="" replace="first" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
  </analyzer>
</fieldType>
Also, I think the StopFilter should do the same thing, so there is some
redundancy here too, right?
And a field that looks like:
<field name="title.main" type="stringSort" indexed="true"
       maxChars="32" stored="true" multiValued="false"/>
I copyField my original title to this field at index time.
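The copy step mentioned here is declared in schema.xml with a copyField
element. A minimal sketch follows; the source field name "title" is an
assumption, since the post does not name the original field. In Solr,
maxChars is an attribute of copyField (it caps how many characters are
copied into the destination), so this is where the 32-character limit
would go.

```xml
<!-- "title" is an assumed name for the original title field -->
<!-- Copy at most the first 32 characters into the sort field -->
<copyField source="title" dest="title.main" maxChars="32"/>
```

Note that the cap applies to the raw title before analysis, so a leading
"The " counts toward the 32 characters.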
However, when I add "sort=title.main asc" to my query, I still see the
original sort order.
Clearly, I'm either doing something wrong, or I am misunderstanding
something. Can anybody explain what's up and suggest a way to accomplish
what I need to do?
Thanks in Advance!!
--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-by-article-title-tp3396743p3396743.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting by article title
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi,
You can also check out LUCENE-3413 [1] and the CombiningFilter that
I wrote, along with the associated example. This lets you:
1. perform normal tokenization and analysis in your analysis chain
2. recombine the tokens at the end for sorting purposes
HTH,
Chris
[1] https://issues.apache.org/jira/browse/LUCENE-3413
On Oct 5, 2011, at 12:47 PM, themanwho wrote:
> [...]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Sorting by article title
Posted by themanwho <th...@mac.com>.
OK, I'm going to answer my own question -- it was probably so obvious that
nobody else wanted to answer such an easy one!
I simply needed to apply

  <filter class="solr.PatternReplaceFilterFactory"
          pattern="([^a-z])" replacement="" replace="all" />

after

  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^the\s" replacement="" replace="first" />
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^a\s" replacement="" replace="first" />

instead of before, as I had it originally. Otherwise the ([^a-z]) filter has
already stripped the spaces, so "^the\s" and "^a\s" never match!
Hope this helps somebody else...
--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-by-article-title-tp3396743p3397694.html
Sent from the Solr - User mailing list archive at Nabble.com.
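For reference, the reordered chain from the self-answer can be assembled
into a complete field type. This is a sketch, not a tested schema. It
assumes the field declaration was meant to use the titleSort type (the
original post declares type="stringSort", which does not match the
fieldType name titleSort). It also leaves out the StopFilterFactory:
with KeywordTokenizerFactory the whole title is a single token, so a stop
filter would only drop a title consisting entirely of a stopword rather
than strip a leading article. maxChars is left off the field because Solr
accepts it on copyField instead.

```xml
<fieldType name="titleSort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Emit the whole title as one token so it can serve as a sort key -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Strip a leading article while the space after it still exists -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^the\s" replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^a\s" replacement="" replace="first"/>
    <!-- Only now remove every character that is not a lowercase letter -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<!-- Sort field; assumes it was meant to reference titleSort -->
<field name="title.main" type="titleSort" indexed="true"
       stored="true" multiValued="false"/>
```

With this order, a title such as "The Cat's Cradle" is indexed for sorting
as "catscradle", while "A-Team" (no space after the "a") is left as
"ateam" because "^a\s" requires whitespace after the article.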