Posted to solr-user@lucene.apache.org by dhastings <dh...@wshein.com> on 2011/07/01 15:42:58 UTC

upgraded from 2.9 to 3.x, problems. help?

i recently upgraded all systems for indexing and searching to lucene/solr 3.1,
and unfortunately it seems there are a lot more changes under the hood than
there used to be.

i have a java based indexer and a solr based searcher, on the java end for
the indexing this is what i have:

    Set nostopwords = new HashSet();
    nostopwords.add("needtoindexstopwords");
    Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
    writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
    ....
    doc.add(new Field("text", contents, Field.Store.NO, Field.Index.ANALYZED));

and for the solr end i have:
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"
ignoreCase="true" />
    </fieldType>
    

and it seems to be working well enough, EXCEPT i somehow lost matching
against strings like:
"97 Yale L.J. 1493" 
which with 2.9 would give me 753 results in my data, and now 3.1 gives me
105

is there something i can change to the indexer to be able to understand what
used to be default behavior with the standard analyzer, or is this something
with my solr schema against the data?



--
View this message in context: http://lucene.472066.n3.nabble.com/upgraded-from-2-9-to-3-x-problems-help-tp3129348p3129348.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: upgraded from 2.9 to 3.x, problems. help?

Posted by Erick Erickson <er...@gmail.com>.

Can you post the results of adding &debugQuery=on to your two versions? And
have you re-indexed or not?

Best
Erick
On Jul 1, 2011 12:31 PM, "dhastings" <dh...@wshein.com> wrote:
> i guess what i'm asking is how to set up solr/lucene to find
> yale l.j.
> yale l. j.
> yale l j
> as all the same thing.

Re: upgraded from 2.9 to 3.x, problems. help?

Posted by dhastings <dh...@wshein.com>.

i guess what i'm asking is how to set up solr/lucene to find 
yale l.j.
yale l. j.
yale l j
as all the same thing.
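
To make the goal concrete, here is a minimal sketch in plain Java with no Lucene classes involved. CitationTokens and tokenize are made-up names used only for illustration; the assumption is that lowercasing and treating periods and spaces alike as separators is enough to make the three spellings identical. In a real setup the same normalization would have to come from the analyzer, applied the same way at index time and at query time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or Solr: it only illustrates the
// normalization the three spellings would need in order to match.
final class CitationTokens {

    // Lowercase the input, then split on any run of characters that are
    // not letters or digits, so periods and spaces both act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // All three spellings print the same token list.
        System.out.println(tokenize("yale l.j."));
        System.out.println(tokenize("yale l. j."));
        System.out.println(tokenize("yale l j"));
    }
}
```

Each of the three calls yields the tokens yale, l, j, which is the behavior being asked for.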

--
View this message in context: http://lucene.472066.n3.nabble.com/upgraded-from-2-9-to-3-x-problems-help-tp3129348p3129520.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: upgraded from 2.9 to 3.x, problems. help?

Posted by Chris Hostetter <ho...@fucit.org>.

: i recently upgraded all systems for indexing and searching to lucene/solr 3.1,
: and unfortunately it seems there are a lot more changes under the hood than
: there used to be.

it sounds like you are saying you had a system that was working fine for 
you, but when you tried to upgrade it stopped working.

: i have a java based indexer and a solr based searcher, on the java end for
	...
: Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);

right off the bat, that line of code couldn't possibly have been in your 
existing 2.9 code (Version.LUCENE_31 didn't exist in 2.9) and it 
instructs StandardAnalyzer to do some very basic things very 
differently than they were done in 2.9...

http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/analysis/standard/StandardAnalyzer.html

I would start by setting that to Version.LUCENE_29 to tell 
StandardAnalyzer that you want the same behavior as before.

Having said all of that -- the LUCENE_31 behavior is considered better than the 
LUCENE_29 behavior, so you should consider switching to it to get the benefits 
-- but you need to understand your full analysis stack to do that.

: and for the solr end i have:

...you should also check if you added a <luceneMatchVersion/> of LUCENE_31 
to your solrconfig.xml -- if not, do so, so that it's consistent with your 
external java code.
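
for reference, a sketch of what that setting looks like, assuming a stock Solr 3.1 solrconfig.xml where the element sits directly under the top-level <config> element (the surrounding lines are placeholders, not your actual config):

```xml
<config>
  <!-- should match the Version constant passed to StandardAnalyzer
       in the external java indexer -->
  <luceneMatchVersion>LUCENE_31</luceneMatchVersion>
  <!-- ... rest of solrconfig.xml unchanged ... -->
</config>
```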

generally speaking, just having your indexer use an off the shelf 
analyzer while your solr instance uses something like WordDelimiterFilter 
isn't going to work well: you need to think about index time analysis and 
query time analysis in conjunction with each other.

hang on, scratch that -- you may think you are using 
WordDelimiterFilterFactory, but you are not...

:  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
:      
:         <filter class="solr.WordDelimiterFilterFactory"
: generateWordParts="1" generateNumberParts="1" catenateWords="1"
: catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
:       <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"
: ignoreCase="true" />
:     </fieldType>

...you can't just plop a <filter/> tag in a <fieldType/> like that and 
have it mean something.  <filter/> can be used when you are declaring a 
custom analyzer chain in the schema.xml; if you use <analyzer class="..." 
/> you get a concrete analyzer that has hardcoded behavior.
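
purely to illustrate the syntax, a custom chain is declared with the <tokenizer/> and <filter/> tags nested inside an <analyzer> element that has no class attribute. the tokenizer and the lowercase filter below are assumptions picked for the example, not a recommendation, and a chain like this would not produce the same tokens as an index built externally with StandardAnalyzer:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- no class attribute here: the chain is built from the child elements -->
  <analyzer>
    <!-- assumed tokenizer; exactly one tokenizer comes first -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- filters run in the order listed -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```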

so if you aren't getting matches, it's a straight up discrepancy between 
the LUCENE_31 in your indexer and whatever setting you have in 
solrconfig.xml (which, if you didn't add one to your existing config, is 
going to be a legacy default ... 2.4 or 2.9 ... i can't remember)


-Hoss