Posted to solr-user@lucene.apache.org by jo...@aol.com on 2010/10/28 22:44:53 UTC

Upgrading from Solr 1.2 to 1.4.1

I'm using Solr 1.2.  If I upgrade to 1.4.1, must I re-index because of LUCENE-1142?  If so, how will this affect me if I don’t re-index (I'm using EnglishPorterFilterFactory)?  What about when I’m using non-English stemmers from Snowball?
 
Besides the brief "IMPORTANT UPGRADE NOTE" about this in CHANGES.txt, where can I read more about it?  I looked at LUCENE-1142 in JIRA, but there isn't much there.
 
-M

Re: Upgrading from Solr 1.2 to 1.4.1

Posted by Lance Norskog <go...@gmail.com>.
Yes, from Solr 1.2 to 1.3/Lucene 2.4.1 to 2.9 there was a change in
the Porter stemmer for English. I don't know what it was. It may also
affect the other language variants of the stemmer.

If stemming is important for your users, you might want to try the
Solr 3.x branch instead, or find Lucid's KStem implementation for
1.4.1. 3.x has a lot of work on better stemmers for many languages.

On Thu, Oct 28, 2010 at 2:23 PM, Robert Muir <rc...@gmail.com> wrote:
> On Thu, Oct 28, 2010 at 4:44 PM,  <jo...@aol.com> wrote:
>>
>> I'm using Solr 1.2.  If I upgrade to 1.4.1, must I re-index because of LUCENE-1142?  If so, how will this affect me if I don’t re-index (I'm using EnglishPorterFilterFactory)?  What about when I’m using non-English stemmers from Snowball?
>>
>> Besides the brief "IMPORTANT UPGRADE NOTE" about this in CHANGES.txt, where can I read more about it?  I looked at LUCENE-1142 in JIRA, but there isn't much there.
>
> I haven't looked at these changes in detail, but Snowball was
> upgraded to revision 500 in LUCENE-1142.
> You can see the revisions/logs of the various algorithms here:
> http://svn.tartarus.org/snowball/trunk/snowball/algorithms/?pathrev=500
>
> One problem is that I don't know which revision you were using
> previously, but since it had no Hungarian before LUCENE-1142, it
> couldn't possibly have been any *later* than revision 385:
>
>    Revision 385 - Directory Listing
>    Added Mon Sep 4 14:06:56 2006 UTC (4 years, 1 month ago) by martin
>    New Hungarian stemmer
>
> This means, for example, that you would certainly be affected by
> changes in the English stemmer such as revision 414, among others:
>
>    Revision 414 - Directory Listing
>    Modified Mon Nov 20 10:49:29 2006 UTC (3 years, 11 months ago) by martin
>    'arsen' as exceptional p1 position, to prevent 'arsenic' and
>    'arsenal' conflating
>
> In my opinion, it would be best to re-index.
>



-- 
Lance Norskog
goksron@gmail.com

Re: Upgrading from Solr 1.2 to 1.4.1

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Oct 28, 2010 at 4:44 PM,  <jo...@aol.com> wrote:
>
> I'm using Solr 1.2.  If I upgrade to 1.4.1, must I re-index because of LUCENE-1142?  If so, how will this affect me if I don’t re-index (I'm using EnglishPorterFilterFactory)?  What about when I’m using non-English stemmers from Snowball?
>
> Besides the brief "IMPORTANT UPGRADE NOTE" about this in CHANGES.txt, where can I read more about it?  I looked at LUCENE-1142 in JIRA, but there isn't much there.

I haven't looked at these changes in detail, but Snowball was
upgraded to revision 500 in LUCENE-1142.
You can see the revisions/logs of the various algorithms here:
http://svn.tartarus.org/snowball/trunk/snowball/algorithms/?pathrev=500

One problem is that I don't know which revision you were using
previously, but since it had no Hungarian before LUCENE-1142, it
couldn't possibly have been any *later* than revision 385:

    Revision 385 - Directory Listing
    Added Mon Sep 4 14:06:56 2006 UTC (4 years, 1 month ago) by martin
    New Hungarian stemmer

This means, for example, that you would certainly be affected by
changes in the English stemmer such as revision 414, among others:

    Revision 414 - Directory Listing
    Modified Mon Nov 20 10:49:29 2006 UTC (3 years, 11 months ago) by martin
    'arsen' as exceptional p1 position, to prevent 'arsenic' and
    'arsenal' conflating

In my opinion, it would be best to re-index.
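
To illustrate why a stale index is a problem, here is a minimal Python
sketch (not Solr code). The two lookup tables are hypothetical stand-ins
for the English stemmer before and after revision 414; the mappings are
assumptions inferred from the revision log above, not output captured
from the real stemmer. The index is built with the old table and then
queried with the new one, which is roughly what happens if you upgrade
without re-indexing.

```python
# Hypothetical stand-ins for the English stemmer before and after
# Snowball revision 414. These mappings are assumptions based on the
# revision log ("prevent 'arsenic' and 'arsenal' conflating").
OLD_STEMS = {"arsenic": "arsen", "arsenal": "arsen"}
NEW_STEMS = {"arsenic": "arsenic", "arsenal": "arsenal"}


def stem(word, table):
    """Return the stem from the table, or the word itself if not listed."""
    return table.get(word, word)


def build_index(docs, table):
    """Build a tiny inverted index: stemmed term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(stem(word, table), set()).add(doc_id)
    return index


def search(index, query, table):
    """Stem the query term and look it up in the index."""
    return sorted(index.get(stem(query.lower(), table), set()))


docs = {1: "arsenic poisoning", 2: "naval arsenal"}

stale_index = build_index(docs, OLD_STEMS)  # indexed before the upgrade
fresh_index = build_index(docs, NEW_STEMS)  # re-indexed after the upgrade

# Query-time analysis uses the new stemmer in both cases.
stale_hits = search(stale_index, "arsenic", NEW_STEMS)
fresh_hits = search(fresh_index, "arsenic", NEW_STEMS)

print("stale index:", stale_hits)
print("fresh index:", fresh_hits)
```

Under those assumptions the stale index stores only the term "arsen", so
the query term "arsenic" finds nothing until the documents are
re-indexed. Any other word whose stem changed between the two revisions
would fail to match in the same way.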