Posted to solr-user@lucene.apache.org by Jakub Godawa <ja...@gmail.com> on 2010/11/02 13:07:50 UTC

Re: How to use polish stemmer - Stempel - in schema.xml?

Thank you Bernd! I couldn't make it run though. Here is my problem:

1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
directive: <lib path="../lib/stempel-1.0.jar" />
3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:

(...)
  <!-- Polish -->
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="org.getopt.stempel.lucene.StempelFilter" />
      <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
    </analyzer>
  </fieldType>
(...)

4. The jar file is loaded, but I get an error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
(...)

5. A different class gave me this one:
SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.getopt.solr.analysis.StempelTokenFilterFactory'
      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
      at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
(...)

The question is: how do I make <fieldType /> and <filter /> work with Stempel? :)

Cheers,
Jakub Godawa.

2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
> Hi Jakub,
>
> I have ported the KStemmer for use in most recent Solr trunk version.
> My stemmer is located in the lib directory of Solr "solr/lib/KStemmer-2.00.jar"
> because it belongs to Solr.
>
> Write it as FilterFactory and use it as Filter like:
> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
>
> This is how my fieldType looks like:
>
>    <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
>        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" />
>        <filter class="solr.LowerCaseFilterFactory" />
>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1" />
>        <filter class="solr.LowerCaseFilterFactory" />
>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>      </analyzer>
>    </fieldType>
>
> Regards,
> Bernd
>
>
>
> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>> Hi!
> >> There is a Polish stemmer, Stempel (http://www.getopt.org/stempel/), and I have
> >> problems connecting it with Solr 1.4.1.
> >> Questions:
> >>
> >> 1. Where EXACTLY do I put the "stempel-1.0.jar" file?
> >> 2. How do I register the file, so I can build a fieldType like:
> >>
> >> <fieldType name="text_pl" class="solr.TextField">
> >>   <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
> >> </fieldType>
>>
>> 3. Is that the right approach to make it work?
>>
>> Thanks for verbose explanation,
>> Jakub.
>
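
A note on steps 3 to 5 above: Solr's <filter class="..."> expects a filter factory rather than the Lucene filter itself, so it is worth first confirming which classes the jar really contains. The sketch below is one way to do that with the JDK's own java.util.jar API. The class name JarClassCheck and its methods are made up for this example and are not part of Solr or Stempel.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {
    /** Converts a class name such as a.b.C into the jar entry name a/b/C.class. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar has an entry for the given fully qualified class name. */
    static boolean containsClass(File jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarClassCheck <jar> <class name>...");
            return;
        }
        File jar = new File(args[0]);
        for (int i = 1; i < args.length; i++) {
            String status = containsClass(jar, args[i]) ? "present" : "missing";
            System.out.println(args[i] + " -> " + status);
        }
    }
}
```

Running it against stempel-1.0.jar with org.getopt.stempel.lucene.StempelFilter and org.getopt.solr.analysis.StempelTokenFilterFactory as arguments would show whether the factory class from step 5 is in the jar at all. It only inspects the archive; it says nothing about which classloader Solr uses to load it.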

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
In the end I chose hunspell-solr to handle Polish. It
"understands" Polish and is much easier to install. But look out! It does
not work with the current nightly build; it works well with Solr 1.4.1!

It just works well, and hey! I got Ukrainian out of the box too. I am
thinking of replacing all required languages' SnowballPorterFilters with
*.aff and *.dic support.

Thanks for the help everyone!

On Wed, 2010-11-24 at 19:00 +0100, Jakub Godawa wrote:
> Yes, from the current nightly release setting up Stempel is quite easy.
> 
> All I did was:
> 
> svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr
> 
> cd lucene-solr/solr
> ant example
> 
> cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
> cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib
> 
> in solrconfig.xml
> 
> <lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
> <lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />
> 
> in schema.xml
> 
> <!-- Polish -->
> <fieldType name="text_pl" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" />
> 
>     <filter class="solr.StempelPolishStemFilterFactory"
> language="Polish" />
>   </analyzer>
> </fieldType>
> 
> The end.
> 
> Anyway. I don't know if that is Polish stemmer or bad configurated
> fieldType, but the results are just wrong.
> 
> example:
> 
> index for type "text_pl": bilety
> query for type "text_pl": bilet
>  
> Index Analyzer
> 
> org.apache.solr.analysis.StempelPolishStemFilterFactory
> {language=Polish, luceneMatchVersion=LUCENE_24}
> term position
> 1
> term text
> bilić
> term type
> word
> source start,end
> 0,6
> payload
> 
> Query Analyzer
> 
> org.apache.solr.analysis.StempelPolishStemFilterFactory
> {language=Polish, luceneMatchVersion=LUCENE_24}
> term position
> 1
> term text
> binąć
> term type
> word
> source start,end
> 0,5
> payload
> 
> 
> But I imagine the result as: bilet and bilet which are the base.
> 
> Any clues how to make it work like Polish? Maybe someone has good
> experience with hunspell-solr and Polish dictonaries?
> 
> Thanks for letting me know!
> 
> Cheers,
> Jakub Godawa.
> 
> 
> 
> 
> On Mon, 2010-11-15 at 08:35 -0500, Robert Muir wrote:
> > https://issues.apache.org/jira/browse/SOLR-2237
> > 
> > On Mon, Nov 15, 2010 at 5:04 AM, Jakub Godawa <ja...@gmail.com>
> > wrote:
> > > I tried to reach the authors twice, but with no luck. I've seen some
> > > posts where people finally were able to launch it (without much pain).
> > > I don't know. If any pro would be so nice as to try to run Stempel on
> > > his/her machine and paste me a verbose step-by-step solution, I
> > > would really appreciate it.
> > >
> > > Cheers,
> > > Jakub Godawa.
> > >
> > > 2010/11/13 Lance Norskog <go...@gmail.com>:
> > >> I don't know if the Stempel jar includes the Java source. At this point I
> > >> think you should ask the author of Stempel to make a Solr front-end for it.
> > >> It's very simple for him.
> > >>
> > >> Jakub Godawa wrote:
> > >>>
> > >>> Am I not doing it in the point no 4? I am compiling all the folder
> > >>> that was extracted before, but now with that new class file.
> > >>>
> > >>> 2010/11/12 Lance Norskog<go...@gmail.com>:
> > >>>
> > >>>>
> > >>>> I think you have to compile all of the stempel source including
> > your
> > >>>> filter factory into one jar at the same time. Everybody does
> > this; I
> > >>>> don't know how different Java versions make class file binaries.
> > >>>>
> > >>>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub
> > Godawa<ja...@gmail.com>
> > >>>>  wrote:
> > >>>>
> > >>>>>
> > >>>>> Hi! Sorry for such a break, but I was moving house... anyway:
> > >>>>>
> > >>>>> 1. I took the
> > >>>>>
> > ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
> > >>>>> file and modified it (named as StempelFilterFactory.java) in Vim
> > that
> > >>>>> way:
> > >>>>>
> > >>>>> package org.getopt.solr.analysis;
> > >>>>>
> > >>>>> import org.apache.lucene.analysis.TokenStream;
> > >>>>> import org.apache.lucene.analysis.standard.StandardFilter;
> > >>>>>
> > >>>>> public class StempelTokenFilterFactory extends
> > BaseTokenFilterFactory {
> > >>>>>  public StempelFilter create(TokenStream input) {
> > >>>>>    return new StempelFilter(input);
> > >>>>>  }
> > >>>>> }
> > >>>>>
> > >>>>> 2. Then I put the file to the extracted stempel-1.0.jar in
> > >>>>> ./org/getopt/solr/analysis/
> > >>>>> 3. Then I created a class from it: jar -cf
> > >>>>> StempelTokenFilterFactory.class StempelFilterFactory.java
> > >>>>> 4. Then I created new stempel-1.0.jar archive: jar -cf
> > stempel-1.0.jar
> > >>>>> -C ./stempel-1.0/ .
> > >>>>> 5. Then in schema.xml I've put:
> > >>>>>
> > >>>>>    <fieldType name="text_pl" class="solr.TextField">
> > >>>>>      <analyzer>
> > >>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> > >>>>>        <filter
> > >>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
> > >>>>>      </analyzer>
> > >>>>>    </fieldType>
> > >>>>>
> > >>>>> 6. I started the solr server and I recieved the following error:
> > >>>>>
> > >>>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
> > >>>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
> > >>>>> 1347093252 in class file
> > >>>>> org/getopt/solr/analysis/StempelTokenFilterFactory
> > >>>>>        at java.lang.ClassLoader.defineClass1(Native Method)
> > >>>>>        at
> > java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> > >>>>>        at
> > >>>>>
> > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > >>>>> ...
> > >>>>>
> > >>>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create
> > jars,
> > >>>>> I googled on that error but with no answer gave me idea what is
> > wrong
> > >>>>> in my .java file.
> > >>>>>
> > >>>>> Please help, as I believe I am close to the end of that subject.
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Jakub Godawa.
> > >>>>>
> > >>>>> 2010/11/3 Lance Norskog<go...@gmail.com>:
> > >>>>>
> > >>>>>>
> > >>>>>> Here's the problem: Solr is a little dumb about these Filter
> > classes,
> > >>>>>> and so you have to make a Factory object for the Stempel
> > Filter.
> > >>>>>>
> > >>>>>> There are a lot of other FilterFactory classes. You would have
> > to just
> > >>>>>> copy one and change the names to Stempel and it might actually
> > work.
> > >>>>>>
> > >>>>>> This will take some Solr programming- perhaps the author can
> > help you?
> > >>>>>>
> > >>>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub
> > Godawa<ja...@gmail.com>
> > >>>>>>  wrote:
> > >>>>>>
> > >>>>>>>
> > >>>>>>> Sorry, I am not Java programmer at all. I would appreciate
> > more
> > >>>>>>> verbose (or step by step) help.
> > >>>>>>>
> > >>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> So you call
> > org.getopt.solr.analysis.StempelTokenFilterFactory.
> > >>>>>>>> In this case I would assume a file
> > StempelTokenFilterFactory.class
> > >>>>>>>> in your directory org/getopt/solr/analysis/.
> > >>>>>>>>
> > >>>>>>>> And a class which extends the BaseTokenFilterFactory rigth?
> > >>>>>>>> ...
> > >>>>>>>> public class StempelTokenFilterFactory extends
> > BaseTokenFilterFactory
> > >>>>>>>> implements ResourceLoaderAware {
> > >>>>>>>> ...
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> This is what stempel-1.0.jar consist of after jar -xf:
> > >>>>>>>>>
> > >>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
> > >>>>>>>>> org/:
> > >>>>>>>>> egothor  getopt
> > >>>>>>>>>
> > >>>>>>>>> org/egothor:
> > >>>>>>>>> stemmer
> > >>>>>>>>>
> > >>>>>>>>> org/egothor/stemmer:
> > >>>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
> > >>>>>>>>> Optimizer2.class  Reduce.class        Row.class
> >  TestAll.class
> > >>>>>>>>> TestLoad.class  Trie$StrEnum.class
> > >>>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
> > >>>>>>>>> Optimizer.class   Reduce$Remap.class  Stock.class
> >  Test.class
> > >>>>>>>>> Trie.class
> > >>>>>>>>>
> > >>>>>>>>> org/getopt:
> > >>>>>>>>> stempel
> > >>>>>>>>>
> > >>>>>>>>> org/getopt/stempel:
> > >>>>>>>>> Benchmark.class  lucene  Stemmer.class
> > >>>>>>>>>
> > >>>>>>>>> org/getopt/stempel/lucene:
> > >>>>>>>>> StempelAnalyzer.class  StempelFilter.class
> > >>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
> > >>>>>>>>> META-INF/:
> > >>>>>>>>> MANIFEST.MF
> > >>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
> > >>>>>>>>> res:
> > >>>>>>>>> tables
> > >>>>>>>>>
> > >>>>>>>>> res/tables:
> > >>>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out
> >  stemmer_2000.out
> > >>>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
> > >>>>>>>>>
> > >>>>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Hi Jakub,
> > >>>>>>>>>>
> > >>>>>>>>>> if you unzip your stempel-1.0.jar do you have the
> > >>>>>>>>>> required directory structure and file in there?
> > >>>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
> > >>>>>>>>>>
> > >>>>>>>>>> Regards,
> > >>>>>>>>>> Bernd
> > >>>>>>>>>>
> > >>>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
> > >>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Erick I've put the jar files like that before. I also
> > added the
> > >>>>>>>>>>> directive and put the file in instanceDir/lib
> > >>>>>>>>>>>
> > >>>>>>>>>>> What is still a problem is that even the files are loaded:
> > >>>>>>>>>>> 2010-11-02 13:20:48
> > org.apache.solr.core.SolrResourceLoader
> > >>>>>>>>>>> replaceClassLoader
> > >>>>>>>>>>> INFO: Adding
> > >>>>>>>>>>>
> > 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
> > >>>>>>>>>>> to classloader
> > >>>>>>>>>>>
> > >>>>>>>>>>> I am not able to use the FilterFactory... maybe I am
> > attempting it
> > >>>>>>>>>>> in
> > >>>>>>>>>>> a wrong way?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Cheers,
> > >>>>>>>>>>> Jakub Godawa.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2010/11/2 Erick Erickson<er...@gmail.com>:
> > >>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The polish stemmer jar file needs to be findable by Solr,
> > if you
> > >>>>>>>>>>>> copy
> > >>>>>>>>>>>> it to<solr_home>/lib and restart solr you should be set.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Alternatively, you can add another<lib>  directive to the
> > >>>>>>>>>>>> solrconfig.xml
> > >>>>>>>>>>>> file
> > >>>>>>>>>>>> (there are several examples in that file already).
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'm a little confused about not being able to find
> > TokenFilter,
> > >>>>>>>>>>>> is that
> > >>>>>>>>>>>> still
> > >>>>>>>>>>>> a problem?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> HTH
> > >>>>>>>>>>>> Erick
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub
> > >>>>>>>>>>>> Godawa<ja...@gmail.com>  wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is
> > my
> > >>>>>>>>>>>>> problem:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1. There is a file
> > ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
> > >>>>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml
> > there is
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>> directive:<lib path="../lib/stempel-1.0.jar" />
> > >>>>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml
> > there is
> > >>>>>>>>>>>>> fieldType:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> (...)
> > >>>>>>>>>>>>>  <!-- Polish -->
> > >>>>>>>>>>>>>   <fieldType name="text_pl" class="solr.TextField">
> > >>>>>>>>>>>>>    <analyzer>
> > >>>>>>>>>>>>>       <tokenizer
> > class="solr.WhitespaceTokenizerFactory"/>
> > >>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
> > >>>>>>>>>>>>>      <filter
> > class="org.getopt.stempel.lucene.StempelFilter" />
> > >>>>>>>>>>>>>      <!--<filter
> > >>>>>>>>>>>>>
> > class="org.getopt.solr.analysis.StempelTokenFilterFactory"
> > >>>>>>>>>>>>> protected="protwords.txt" />  -->
> > >>>>>>>>>>>>>    </analyzer>
> > >>>>>>>>>>>>>  </fieldType>
> > >>>>>>>>>>>>> (...)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 4. jar file is loaded but I got an error:
> > >>>>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
> > >>>>>>>>>>>>> java.lang.NoClassDefFoundError:
> > >>>>>>>>>>>>> org/apache/lucene/analysis/TokenFilter
> > >>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass1(Native
> > Method)
> > >>>>>>>>>>>>>      at
> > java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> > >>>>>>>>>>>>>      at
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > >>>>>>>>>>>>> (...)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 5. Different class gave me that one:
> > >>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error
> > loading
> > >>>>>>>>>>>>> class
> > >>>>>>>>>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
> > >>>>>>>>>>>>>      at
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
> > >>>>>>>>>>>>>      at
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
> > >>>>>>>>>>>>> (...)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Question is: How to make<fieldType />  and<filter />
> >  work with
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>> Stempel? :)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>> Jakub Godawa.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 2010/10/29 Bernd
> > Fehling<be...@uni-bielefeld.de>:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Jakub,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I have ported the KStemmer for use in most recent Solr
> > trunk
> > >>>>>>>>>>>>>> version.
> > >>>>>>>>>>>>>> My stemmer is located in the lib directory of Solr
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> "solr/lib/KStemmer-2.00.jar"
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> because it belongs to Solr.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Write it as FilterFactory and use it as Filter like:
> > >>>>>>>>>>>>>> <filter
> > class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> protected="protwords.txt" />
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> This is how my fieldType looks like:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>    <fieldType name="text_kstem" class="solr.TextField"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> positionIncrementGap="100">
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>      <analyzer type="index">
> > >>>>>>>>>>>>>>        <tokenizer
> > class="solr.WhitespaceTokenizerFactory" />
> > >>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory"
> > ignoreCase="true"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> words="stopwords.txt"
> > enablePositionIncrements="false" />
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> generateWordParts="1" generateNumberParts="1"
> > catenateWords="1"
> > >>>>>>>>>>>>> catenateNumbers="1"
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
> > >>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
> > >>>>>>>>>>>>>>        <filter
> > >>>>>>>>>>>>>> class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> protected="protwords.txt" />
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>        <filter
> > class="solr.RemoveDuplicatesTokenFilterFactory"
> > >>>>>>>>>>>>>> />
> > >>>>>>>>>>>>>>      </analyzer>
> > >>>>>>>>>>>>>>      <analyzer type="query">
> > >>>>>>>>>>>>>>        <tokenizer
> > class="solr.WhitespaceTokenizerFactory" />
> > >>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory"
> > ignoreCase="true"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> words="stopwords.txt" />
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> generateWordParts="1" generateNumberParts="1"
> > catenateWords="0"
> > >>>>>>>>>>>>> catenateNumbers="0"
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
> > >>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
> > >>>>>>>>>>>>>>        <filter
> > >>>>>>>>>>>>>> class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> protected="protwords.txt" />
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>        <filter
> > class="solr.RemoveDuplicatesTokenFilterFactory"
> > >>>>>>>>>>>>>> />
> > >>>>>>>>>>>>>>      </analyzer>
> > >>>>>>>>>>>>>>    </fieldType>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>> Bernd
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi!
> > >>>>>>>>>>>>>>> There is a polish stemmer
> > http://www.getopt.org/stempel/ and I
> > >>>>>>>>>>>>>>> have
> > >>>>>>>>>>>>>>> problems connecting it with solr 1.4.1
> > >>>>>>>>>>>>>>> Questions:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
> > >>>>>>>>>>>>>>> 2. How do I register the file, so I can build a
> > fieldType
> > >>>>>>>>>>>>>>> like:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
> > >>>>>>>>>>>>>>>   <analyzer
> > >>>>>>>>>>>>>>>
> > class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
> > >>>>>>>>>>>>>>> </fieldType>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 3. Is that the right approach to make it work?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks for verbose explanation,
> > >>>>>>>>>>>>>>> Jakub.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Lance Norskog
> > >>>>>> goksron@gmail.com
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Lance Norskog
> > >>>> goksron@gmail.com
> > >>>>
> > >>>>
> > >>
> > >
> 
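
A note on the "Incompatible magic value 1347093252" error quoted above: every compiled Java class file starts with the four bytes 0xCAFEBABE, and the JVM reports whatever it found instead as a decimal number. Converted to hex, 1347093252 is 0x504B0304, the bytes "PK" followed by 3 and 4, which is the signature at the start of a ZIP (and therefore jar) archive. That is consistent with step 3 of the quoted message, where "jar -cf StempelTokenFilterFactory.class StempelFilterFactory.java" produces an archive named like a class file instead of bytecode compiled with javac. A small sketch of the decoding (the class MagicValue is made up for this example):

```java
class MagicValue {
    static final int CLASS_MAGIC = 0xCAFEBABE; // first four bytes of a compiled .class file
    static final int ZIP_MAGIC = 0x504B0304;   // "PK\003\004", start of a ZIP/jar archive

    /** Formats the magic value the way it would appear in a hex dump. */
    static String hex(int magic) {
        return String.format("0x%08X", magic);
    }

    /** Names the file type for the two signatures relevant here. */
    static String describe(int magic) {
        if (magic == CLASS_MAGIC) {
            return "Java class file";
        }
        if (magic == ZIP_MAGIC) {
            return "ZIP/JAR archive";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        int reported = 1347093252; // value from the ClassFormatError message
        System.out.println(hex(reported) + " -> " + describe(reported));
    }
}
```

The likely fix, assuming the Solr and Stempel jars are on the compile classpath, is to compile the .java file with javac and only then pack the resulting .class file into the jar.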


Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
On Wed, 2010-11-24 at 19:00 +0100, Jakub Godawa wrote:
> Yes, from the current nightly release setting up Stempel is quite easy.
Thanks to Robert Muir :)


Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
Yes, from the current nightly release setting up Stempel is quite easy.

All I did was:

svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr

cd lucene-solr/solr
ant example

cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib

in solrconfig.xml

<lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
<lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />

in schema.xml

<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" />

    <filter class="solr.StempelPolishStemFilterFactory" language="Polish" />
  </analyzer>
</fieldType>

The end.

Anyway, I don't know whether it is the Polish stemmer or a badly configured
fieldType, but the results are just wrong.

example:

index for type "text_pl": bilety
query for type "text_pl": bilet
 
Index Analyzer

org.apache.solr.analysis.StempelPolishStemFilterFactory
{language=Polish, luceneMatchVersion=LUCENE_24}

term position     | 1
term text         | bilić
term type         | word
source start,end  | 0,6
payload           |

Query Analyzer

org.apache.solr.analysis.StempelPolishStemFilterFactory
{language=Polish, luceneMatchVersion=LUCENE_24}

term position     | 1
term text         | binąć
term type         | word
source start,end  | 0,5
payload           |


But I would expect both results to be "bilet", which is the base form.

Any clues on how to make it work properly for Polish? Maybe someone has good
experience with hunspell-solr and Polish dictionaries?

Thanks for letting me know!

Cheers,
Jakub Godawa.
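
The analysis output above also suggests why the search finds nothing: a query matches only when the term produced at query time equals a term produced at index time, and here the two differ. A small sketch of that check, with the two observed outputs hard-coded. The class StemMatchCheck is made up for illustration, and its lookup table only stands in for the real stemmer.

```java
import java.util.HashMap;
import java.util.Map;

class StemMatchCheck {
    // Analyzer outputs copied from the analysis page above. This table is a
    // stand-in for the real stemmer, not an implementation of it.
    static final Map<String, String> OBSERVED = new HashMap<>();
    static {
        OBSERVED.put("bilety", "bili\u0107");     // index-side term from the output above
        OBSERVED.put("bilet", "bin\u0105\u0107"); // query-side term from the output above
    }

    /** Returns the observed stem, or the token itself if none was recorded. */
    static String analyze(String token) {
        return OBSERVED.getOrDefault(token, token);
    }

    /** A query hits a document only if both sides analyze to the same term. */
    static boolean matches(String indexedWord, String queryWord) {
        return analyze(indexedWord).equals(analyze(queryWord));
    }

    public static void main(String[] args) {
        System.out.println(matches("bilety", "bilet")); // false: the two stems differ
        System.out.println(matches("bilet", "bilet"));  // true: identical input, identical stem
    }
}
```

So the stems do not have to be real dictionary words; they only have to be the same for related forms, which is not the case for this pair.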




On Mon, 2010-11-15 at 08:35 -0500, Robert Muir wrote:
> https://issues.apache.org/jira/browse/SOLR-2237
> 
> On Mon, Nov 15, 2010 at 5:04 AM, Jakub Godawa <ja...@gmail.com>
> wrote:
> > I tried to reach the authors twice, but with no luck. I've seen some
> > posts where people finally were able to launch it (without much pain).
> > I don't know. If any pro would be so nice as to try to run Stempel on
> > his/her machine and paste me a verbose step-by-step solution, I
> > would really appreciate it.
> >
> > Cheers,
> > Jakub Godawa.
> >
> > 2010/11/13 Lance Norskog <go...@gmail.com>:
> >> I don't know if the Stempel jar includes the Java source. At this point I
> >> think you should ask the author of Stempel to make a Solr front-end for it.
> >> It's very simple for him.
> >>
> >> Jakub Godawa wrote:
> >>>
> >>> Am I not doing it in the point no 4? I am compiling all the folder
> >>> that was extracted before, but now with that new class file.
> >>>
> >>> 2010/11/12 Lance Norskog<go...@gmail.com>:
> >>>
> >>>>
> >>>> I think you have to compile all of the stempel source including
> your
> >>>> filter factory into one jar at the same time. Everybody does
> this; I
> >>>> don't know how different Java versions make class file binaries.
> >>>>
> >>>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub
> Godawa<ja...@gmail.com>
> >>>>  wrote:
> >>>>
> >>>>>
> >>>>> Hi! Sorry for such a break, but I was moving house... anyway:
> >>>>>
> >>>>> 1. I took the
> >>>>>
> ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
> >>>>> file and modified it (named as StempelFilterFactory.java) in Vim
> that
> >>>>> way:
> >>>>>
> >>>>> package org.getopt.solr.analysis;
> >>>>>
> >>>>> import org.apache.lucene.analysis.TokenStream;
> >>>>> import org.apache.lucene.analysis.standard.StandardFilter;
> >>>>>
> >>>>> public class StempelTokenFilterFactory extends
> BaseTokenFilterFactory {
> >>>>>  public StempelFilter create(TokenStream input) {
> >>>>>    return new StempelFilter(input);
> >>>>>  }
> >>>>> }
> >>>>>
> >>>>> 2. Then I put the file to the extracted stempel-1.0.jar in
> >>>>> ./org/getopt/solr/analysis/
> >>>>> 3. Then I created a class from it: jar -cf
> >>>>> StempelTokenFilterFactory.class StempelFilterFactory.java
> >>>>> 4. Then I created new stempel-1.0.jar archive: jar -cf
> stempel-1.0.jar
> >>>>> -C ./stempel-1.0/ .
> >>>>> 5. Then in schema.xml I've put:
> >>>>>
> >>>>>    <fieldType name="text_pl" class="solr.TextField">
> >>>>>      <analyzer>
> >>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>        <filter
> >>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
> >>>>>      </analyzer>
> >>>>>    </fieldType>
> >>>>>
> >>>>> 6. I started the solr server and I recieved the following error:
> >>>>>
> >>>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
> >>>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
> >>>>> 1347093252 in class file
> >>>>> org/getopt/solr/analysis/StempelTokenFilterFactory
> >>>>>        at java.lang.ClassLoader.defineClass1(Native Method)
> >>>>>        at
> java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> >>>>>        at
> >>>>>
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> >>>>> ...
> >>>>>
> >>>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create
> jars,
> >>>>> I googled on that error but with no answer gave me idea what is
> wrong
> >>>>> in my .java file.
> >>>>>
> >>>>> Please help, as I believe I am close to the end of that subject.
> >>>>>
> >>>>> Cheers,
> >>>>> Jakub Godawa.
> >>>>>
> >>>>> 2010/11/3 Lance Norskog<go...@gmail.com>:
> >>>>>
> >>>>>>
> >>>>>> Here's the problem: Solr is a little dumb about these Filter
> classes,
> >>>>>> and so you have to make a Factory object for the Stempel
> Filter.
> >>>>>>
> >>>>>> There are a lot of other FilterFactory classes. You would have
> to just
> >>>>>> copy one and change the names to Stempel and it might actually
> work.
> >>>>>>
> >>>>>> This will take some Solr programming- perhaps the author can
> help you?
> >>>>>>
> >>>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub
> Godawa<ja...@gmail.com>
> >>>>>>  wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> Sorry, I am not Java programmer at all. I would appreciate
> more
> >>>>>>> verbose (or step by step) help.
> >>>>>>>
> >>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> So you call
> org.getopt.solr.analysis.StempelTokenFilterFactory.
> >>>>>>>> In this case I would assume a file
> StempelTokenFilterFactory.class
> >>>>>>>> in your directory org/getopt/solr/analysis/.
> >>>>>>>>
> >>>>>>>> And a class which extends the BaseTokenFilterFactory rigth?
> >>>>>>>> ...
> >>>>>>>> public class StempelTokenFilterFactory extends
> BaseTokenFilterFactory
> >>>>>>>> implements ResourceLoaderAware {
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is what stempel-1.0.jar consist of after jar -xf:
> >>>>>>>>>
> >>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
> >>>>>>>>> org/:
> >>>>>>>>> egothor  getopt
> >>>>>>>>>
> >>>>>>>>> org/egothor:
> >>>>>>>>> stemmer
> >>>>>>>>>
> >>>>>>>>> org/egothor/stemmer:
> >>>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
> >>>>>>>>> Optimizer2.class  Reduce.class        Row.class
>  TestAll.class
> >>>>>>>>> TestLoad.class  Trie$StrEnum.class
> >>>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
> >>>>>>>>> Optimizer.class   Reduce$Remap.class  Stock.class
>  Test.class
> >>>>>>>>> Trie.class
> >>>>>>>>>
> >>>>>>>>> org/getopt:
> >>>>>>>>> stempel
> >>>>>>>>>
> >>>>>>>>> org/getopt/stempel:
> >>>>>>>>> Benchmark.class  lucene  Stemmer.class
> >>>>>>>>>
> >>>>>>>>> org/getopt/stempel/lucene:
> >>>>>>>>> StempelAnalyzer.class  StempelFilter.class
> >>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
> >>>>>>>>> META-INF/:
> >>>>>>>>> MANIFEST.MF
> >>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
> >>>>>>>>> res:
> >>>>>>>>> tables
> >>>>>>>>>
> >>>>>>>>> res/tables:
> >>>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out
>  stemmer_2000.out
> >>>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
> >>>>>>>>>
> >>>>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi Jakub,
> >>>>>>>>>>
> >>>>>>>>>> if you unzip your stempel-1.0.jar do you have the
> >>>>>>>>>> required directory structure and file in there?
> >>>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Bernd
> >>>>>>>>>>
> >>>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Erick I've put the jar files like that before. I also added the
> >>>>>>>>>>> directive and put the file in instanceDir/lib
> >>>>>>>>>>>
> >>>>>>>>>>> What is still a problem is that even the files are loaded:
> >>>>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader
> >>>>>>>>>>> replaceClassLoader
> >>>>>>>>>>> INFO: Adding
> >>>>>>>>>>> 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
> >>>>>>>>>>> to classloader
> >>>>>>>>>>>
> >>>>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it
> >>>>>>>>>>> in a wrong way?
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Jakub Godawa.
> >>>>>>>>>>>
> >>>>>>>>>>> 2010/11/2 Erick Erickson<er...@gmail.com>:
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> The polish stemmer jar file needs to be findable by Solr, if you
> >>>>>>>>>>>> copy it to <solr_home>/lib and restart solr you should be set.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Alternatively, you can add another <lib> directive to the
> >>>>>>>>>>>> solrconfig.xml file
> >>>>>>>>>>>> (there are several examples in that file already).
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm a little confused about not being able to find TokenFilter,
> >>>>>>>>>>>> is that still a problem?
> >>>>>>>>>>>>
> >>>>>>>>>>>> HTH
> >>>>>>>>>>>> Erick
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub
> >>>>>>>>>>>> Godawa<ja...@gmail.com>  wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is my
> >>>>>>>>>>>>> problem:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
> >>>>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is
> >>>>>>>>>>>>> a directive: <lib path="../lib/stempel-1.0.jar" />
> >>>>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is
> >>>>>>>>>>>>> fieldType:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>   <!-- Polish -->
> >>>>>>>>>>>>>   <fieldType name="text_pl" class="solr.TextField">
> >>>>>>>>>>>>>     <analyzer>
> >>>>>>>>>>>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>>>>>>>>       <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>       <filter class="org.getopt.stempel.lucene.StempelFilter" />
> >>>>>>>>>>>>>       <!-- <filter
> >>>>>>>>>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
> >>>>>>>>>>>>> protected="protwords.txt" /> -->
> >>>>>>>>>>>>>     </analyzer>
> >>>>>>>>>>>>>   </fieldType>
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 4. jar file is loaded but I got an error:
> >>>>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
> >>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
> >>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
> >>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> >>>>>>>>>>>>>      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 5. Different class gave me that one:
> >>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
> >>>>>>>>>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
> >>>>>>>>>>>>>      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
> >>>>>>>>>>>>>      at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Question is: How to make <fieldType /> and <filter /> work with that
> >>>>>>>>>>>>> Stempel? :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Jakub Godawa.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2010/10/29 Bernd Fehling<be...@uni-bielefeld.de>:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Jakub,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have ported the KStemmer for use in most recent Solr trunk
> >>>>>>>>>>>>>> version.
> >>>>>>>>>>>>>> My stemmer is located in the lib directory of Solr
> >>>>>>>>>>>>>> "solr/lib/KStemmer-2.00.jar"
> >>>>>>>>>>>>>> because it belongs to Solr.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Write it as FilterFactory and use it as Filter like:
> >>>>>>>>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>> protected="protwords.txt" />
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is how my fieldType looks like:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    <fieldType name="text_kstem" class="solr.TextField"
> >>>>>>>>>>>>>> positionIncrementGap="100">
> >>>>>>>>>>>>>>      <analyzer type="index">
> >>>>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>> words="stopwords.txt" enablePositionIncrements="false" />
> >>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>>>>>>>>>>>> catenateNumbers="1"
> >>>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
> >>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>>        <filter
> >>>>>>>>>>>>>> class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>> protected="protwords.txt" />
> >>>>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"
> >>>>>>>>>>>>>> />
> >>>>>>>>>>>>>>      </analyzer>
> >>>>>>>>>>>>>>      <analyzer type="query">
> >>>>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>> words="stopwords.txt" />
> >>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>>>>>>>>>>>>> catenateNumbers="0"
> >>>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
> >>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>>        <filter
> >>>>>>>>>>>>>> class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>> protected="protwords.txt" />
> >>>>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"
> >>>>>>>>>>>>>> />
> >>>>>>>>>>>>>>      </analyzer>
> >>>>>>>>>>>>>>    </fieldType>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Bernd
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I
> >>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>> problems connecting it with solr 1.4.1
> >>>>>>>>>>>>>>> Questions:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
> >>>>>>>>>>>>>>> 2. How do I register the file, so I can build a fieldType
> >>>>>>>>>>>>>>> like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
> >>>>>>>>>>>>>>>   <analyzer
> >>>>>>>>>>>>>>> class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
> >>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3. Is that the right approach to make it work?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for verbose explanation,
> >>>>>>>>>>>>>>> Jakub.
> >>>>>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Lance Norskog
> >>>>>> goksron@gmail.com
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Lance Norskog
> >>>> goksron@gmail.com
> >>>>
> >>>>
> >>
> >


Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Robert Muir <rc...@gmail.com>.
https://issues.apache.org/jira/browse/SOLR-2237
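
A note on the ClassFormatError quoted further down in this thread: the
"Incompatible magic value" 1347093252 is 0x504B0304, the signature that
starts a ZIP (and therefore jar) archive, whereas a compiled class file
starts with 0xCAFEBABE. That suggests the file named
StempelTokenFilterFactory.class was written by "jar -cf" rather than by a
compiler such as javac. The sketch below only decodes the number; the class
name MagicValueCheck and its describe helper are made up for illustration
and are not part of Solr or Stempel.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or Stempel: it decodes the
// "magic value" reported in the ClassFormatError.
class MagicValueCheck {
    // First four bytes of every compiled Java class file.
    static final int CLASS_MAGIC = 0xCAFEBABE;
    // First four bytes of a ZIP local file header ("PK\3\4"); jars are ZIPs.
    static final int ZIP_MAGIC = 0x504B0304;

    // Names the file type that a 4-byte big-endian magic value points to.
    static String describe(int magic) {
        if (magic == CLASS_MAGIC) return "java class file";
        if (magic == ZIP_MAGIC) return "zip/jar archive";
        return "unknown";
    }

    public static void main(String[] args) {
        int reported = 1347093252; // value taken from the error message
        byte[] bytes = ByteBuffer.allocate(4).putInt(reported).array();
        System.out.println(String.format("0x%08X", reported)); // 0x504B0304
        System.out.println(new String(bytes, 0, 2, StandardCharsets.US_ASCII)); // PK
        System.out.println(describe(reported)); // zip/jar archive
    }
}
```

If that reading is right, the .java file would need to be compiled with
javac (with the Solr, Lucene and Stempel jars on the classpath) before the
resulting .class file is packed into the jar.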

On Mon, Nov 15, 2010 at 5:04 AM, Jakub Godawa <ja...@gmail.com> wrote:
> I tried to reach the authors twice, but with no luck. I've seen some
> posts where people finally were able to launch it (without much pain).
> I don't know. If any pro would be so nice to try to run the stempel on
> his/her machine and paste me some verbose step by step solution I
> would really appreciate it.
>
> Cheers,
> Jakub Godawa.
>
> 2010/11/13 Lance Norskog <go...@gmail.com>:
>> I don't know if the Stempel jar includes the Java source. At this point I
>> think you should ask the author of Stempel to make a Solr front-end for it.
>> It's very simple for him.
>>
>> Jakub Godawa wrote:
>>>
>>> Am I not doing it in the point no 4? I am compiling all the folder
>>> that was extracted before, but now with that new class file.
>>>
>>> 2010/11/12 Lance Norskog<go...@gmail.com>:
>>>
>>>>
>>>> I think you have to compile all of the stempel source including your
>>>> filter factory into one jar at the same time. Everybody does this; I
>>>> don't know how different Java versions make class file binaries.
>>>>
>>>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa<ja...@gmail.com>
>>>>  wrote:
>>>>
>>>>>
>>>>> Hi! Sorry for such a break, but I was moving house... anyway:
>>>>>
>>>>> 1. I took the
>>>>> ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
>>>>> file and modified it (named as StempelFilterFactory.java) in Vim that
>>>>> way:
>>>>>
>>>>> package org.getopt.solr.analysis;
>>>>>
>>>>> import org.apache.lucene.analysis.TokenStream;
>>>>> import org.apache.lucene.analysis.standard.StandardFilter;
>>>>>
>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
>>>>>  public StempelFilter create(TokenStream input) {
>>>>>    return new StempelFilter(input);
>>>>>  }
>>>>> }
>>>>>
>>>>> 2. Then I put the file to the extracted stempel-1.0.jar in
>>>>> ./org/getopt/solr/analysis/
>>>>> 3. Then I created a class from it: jar -cf
>>>>> StempelTokenFilterFactory.class StempelFilterFactory.java
>>>>> 4. Then I created new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
>>>>> -C ./stempel-1.0/ .
>>>>> 5. Then in schema.xml I've put:
>>>>>
>>>>>    <fieldType name="text_pl" class="solr.TextField">
>>>>>      <analyzer>
>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>        <filter
>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
>>>>>      </analyzer>
>>>>>    </fieldType>
>>>>>
>>>>> 6. I started the solr server and I received the following error:
>>>>>
>>>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
>>>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
>>>>> 1347093252 in class file
>>>>> org/getopt/solr/analysis/StempelTokenFilterFactory
>>>>>        at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>>        at
>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>> ...
>>>>>
>>>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars,
>>>>> I googled that error, but no answer gave me an idea of what is wrong
>>>>> in my .java file.
>>>>>
>>>>> Please help, as I believe I am close to the end of that subject.
>>>>>
>>>>> Cheers,
>>>>> Jakub Godawa.
>>>>>
>>>>> 2010/11/3 Lance Norskog<go...@gmail.com>:
>>>>>
>>>>>>
>>>>>> Here's the problem: Solr is a little dumb about these Filter classes,
>>>>>> and so you have to make a Factory object for the Stempel Filter.
>>>>>>
>>>>>> There are a lot of other FilterFactory classes. You would have to just
>>>>>> copy one and change the names to Stempel and it might actually work.
>>>>>>
>>>>>> This will take some Solr programming- perhaps the author can help you?
>>>>>>
>>>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa<ja...@gmail.com>
>>>>>>  wrote:
>>>>>>
>>>>>>>
>>>>>>> Sorry, I am not Java programmer at all. I would appreciate more
>>>>>>> verbose (or step by step) help.
>>>>>>>
>>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>>
>>>>>>>>
>>>>>>>> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
>>>>>>>> In this case I would assume a file StempelTokenFilterFactory.class
>>>>>>>> in your directory org/getopt/solr/analysis/.
>>>>>>>>
>>>>>>>> And a class which extends the BaseTokenFilterFactory, right?
>>>>>>>> ...
>>>>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory
>>>>>>>> implements ResourceLoaderAware {
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is what stempel-1.0.jar consist of after jar -xf:
>>>>>>>>>
>>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
>>>>>>>>> org/:
>>>>>>>>> egothor  getopt
>>>>>>>>>
>>>>>>>>> org/egothor:
>>>>>>>>> stemmer
>>>>>>>>>
>>>>>>>>> org/egothor/stemmer:
>>>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
>>>>>>>>> Optimizer2.class  Reduce.class        Row.class    TestAll.class
>>>>>>>>> TestLoad.class  Trie$StrEnum.class
>>>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
>>>>>>>>> Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
>>>>>>>>> Trie.class
>>>>>>>>>
>>>>>>>>> org/getopt:
>>>>>>>>> stempel
>>>>>>>>>
>>>>>>>>> org/getopt/stempel:
>>>>>>>>> Benchmark.class  lucene  Stemmer.class
>>>>>>>>>
>>>>>>>>> org/getopt/stempel/lucene:
>>>>>>>>> StempelAnalyzer.class  StempelFilter.class
>>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
>>>>>>>>> META-INF/:
>>>>>>>>> MANIFEST.MF
>>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
>>>>>>>>> res:
>>>>>>>>> tables
>>>>>>>>>
>>>>>>>>> res/tables:
>>>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
>>>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
>>>>>>>>>
>>>>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Jakub,
>>>>>>>>>>
>>>>>>>>>> if you unzip your stempel-1.0.jar do you have the
>>>>>>>>>> required directory structure and file in there?
>>>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Bernd
>>>>>>>>>>
>>>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Erick I've put the jar files like that before. I also added the
>>>>>>>>>>> directive and put the file in instanceDir/lib
>>>>>>>>>>>
>>>>>>>>>>> What is still a problem is that even the files are loaded:
>>>>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader
>>>>>>>>>>> replaceClassLoader
>>>>>>>>>>> INFO: Adding
>>>>>>>>>>> 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>>>>>>>>>>> to classloader
>>>>>>>>>>>
>>>>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it
>>>>>>>>>>> in
>>>>>>>>>>> a wrong way?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Jakub Godawa.
>>>>>>>>>>>
>>>>>>>>>>> 2010/11/2 Erick Erickson<er...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The polish stemmer jar file needs to be findable by Solr, if you
>>>>>>>>>>>> copy
>>>>>>>>>>>> it to<solr_home>/lib and restart solr you should be set.
>>>>>>>>>>>>
>>>>>>>>>>>> Alternatively, you can add another<lib>  directive to the
>>>>>>>>>>>> solrconfig.xml
>>>>>>>>>>>> file
>>>>>>>>>>>> (there are several examples in that file already).
>>>>>>>>>>>>
>>>>>>>>>>>> I'm a little confused about not being able to find TokenFilter,
>>>>>>>>>>>> is that
>>>>>>>>>>>> still
>>>>>>>>>>>> a problem?
>>>>>>>>>>>>
>>>>>>>>>>>> HTH
>>>>>>>>>>>> Erick
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub
>>>>>>>>>>>> Godawa<ja...@gmail.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is my
>>>>>>>>>>>>> problem:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>>>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is
>>>>>>>>>>>>> a
>>>>>>>>>>>>> directive:<lib path="../lib/stempel-1.0.jar" />
>>>>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is
>>>>>>>>>>>>> fieldType:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>  <!-- Polish -->
>>>>>>>>>>>>>   <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>>>>>>    <analyzer>
>>>>>>>>>>>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>>>>>>>>>>>      <!--<filter
>>>>>>>>>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>>>>>>>>>>>> protected="protwords.txt" />  -->
>>>>>>>>>>>>>    </analyzer>
>>>>>>>>>>>>>  </fieldType>
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4. jar file is loaded but I got an error:
>>>>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>> org/apache/lucene/analysis/TokenFilter
>>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>>>>>>>>>>      at
>>>>>>>>>>>>>
>>>>>>>>>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 5. Different class gave me that one:
>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading
>>>>>>>>>>>>> class
>>>>>>>>>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>>>>>>>>>>>      at
>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>>>>>>>>>>>      at
>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>>>>>>>>>>>> (...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Question is: How to make<fieldType />  and<filter />  work with
>>>>>>>>>>>>> that
>>>>>>>>>>>>> Stempel? :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jakub Godawa.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/10/29 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jakub,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have ported the KStemmer for use in most recent Solr trunk
>>>>>>>>>>>>>> version.
>>>>>>>>>>>>>> My stemmer is located in the lib directory of Solr
>>>>>>>>>>>>>> "solr/lib/KStemmer-2.00.jar"
>>>>>>>>>>>>>> because it belongs to Solr.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Write it as FilterFactory and use it as Filter like:
>>>>>>>>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is how my fieldType looks like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    <fieldType name="text_kstem" class="solr.TextField"
>>>>>>>>>>>>>> positionIncrementGap="100">
>>>>>>>>>>>>>>      <analyzer type="index">
>>>>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>>>>>>>> catenateNumbers="1"
>>>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>>>>>>>>        <filter
>>>>>>>>>>>>>> class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"
>>>>>>>>>>>>>> />
>>>>>>>>>>>>>>      </analyzer>
>>>>>>>>>>>>>>      <analyzer type="query">
>>>>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>>> words="stopwords.txt" />
>>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>>>>>>>>>>> catenateNumbers="0"
>>>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>>>>>>>>        <filter
>>>>>>>>>>>>>> class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"
>>>>>>>>>>>>>> />
>>>>>>>>>>>>>>      </analyzer>
>>>>>>>>>>>>>>    </fieldType>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Bernd
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> problems connecting it with solr 1.4.1
>>>>>>>>>>>>>>> Questions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
>>>>>>>>>>>>>>> 2. How do I register the file, so I can build a fieldType
>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>>>>>>>>   <analyzer
>>>>>>>>>>>>>>> class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. Is that the right approach to make it work?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for verbose explanation,
>>>>>>>>>>>>>>> Jakub.
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Lance Norskog
>>>>>> goksron@gmail.com
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goksron@gmail.com
>>>>
>>>>
>>
>

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
I tried to reach the authors twice, but with no luck. I've seen some
posts where people finally were able to launch it (without much pain).
I don't know. If any pro would be so nice to try to run the stempel on
his/her machine and paste me some verbose step by step solution I
would really appreciate it.

Cheers,
Jakub Godawa.

2010/11/13 Lance Norskog <go...@gmail.com>:
> I don't know if the Stempel jar includes the Java source. At this point I
> think you should ask the author of Stempel to make a Solr front-end for it.
> It's very simple for him.
>
> Jakub Godawa wrote:
>>
>> Am I not doing it in the point no 4? I am compiling all the folder
>> that was extracted before, but now with that new class file.
>>
>> 2010/11/12 Lance Norskog<go...@gmail.com>:
>>
>>>
>>> I think you have to compile all of the stempel source including your
>>> filter factory into one jar at the same time. Everybody does this; I
>>> don't know how different Java versions make class file binaries.
>>>
>>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa<ja...@gmail.com>
>>>  wrote:
>>>
>>>>
>>>> Hi! Sorry for such a break, but I was moving house... anyway:
>>>>
>>>> 1. I took the
>>>> ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
>>>> file and modified it (named as StempelFilterFactory.java) in Vim that
>>>> way:
>>>>
>>>> package org.getopt.solr.analysis;
>>>>
>>>> import org.apache.lucene.analysis.TokenStream;
>>>> import org.apache.lucene.analysis.standard.StandardFilter;
>>>>
>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
>>>>  public StempelFilter create(TokenStream input) {
>>>>    return new StempelFilter(input);
>>>>  }
>>>> }
>>>>
>>>> 2. Then I put the file to the extracted stempel-1.0.jar in
>>>> ./org/getopt/solr/analysis/
>>>> 3. Then I created a class from it: jar -cf
>>>> StempelTokenFilterFactory.class StempelFilterFactory.java
>>>> 4. Then I created new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
>>>> -C ./stempel-1.0/ .
>>>> 5. Then in schema.xml I've put:
>>>>
>>>>    <fieldType name="text_pl" class="solr.TextField">
>>>>      <analyzer>
>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>        <filter
>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
>>>>      </analyzer>
>>>>    </fieldType>
>>>>
>>>> 6. I started the solr server and I received the following error:
>>>>
>>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
>>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
>>>> 1347093252 in class file
>>>> org/getopt/solr/analysis/StempelTokenFilterFactory
>>>>        at java.lang.ClassLoader.defineClass1(Native Method)
>>>>        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>        at
>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>> ...
>>>>
>>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars,
>>>> I googled that error, but no answer gave me an idea of what is wrong
>>>> in my .java file.
>>>>
>>>> Please help, as I believe I am close to the end of that subject.
>>>>
>>>> Cheers,
>>>> Jakub Godawa.
>>>>
>>>> 2010/11/3 Lance Norskog<go...@gmail.com>:
>>>>
>>>>>
>>>>> Here's the problem: Solr is a little dumb about these Filter classes,
>>>>> and so you have to make a Factory object for the Stempel Filter.
>>>>>
>>>>> There are a lot of other FilterFactory classes. You would have to just
>>>>> copy one and change the names to Stempel and it might actually work.
>>>>>
>>>>> This will take some Solr programming- perhaps the author can help you?
>>>>>
>>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa<ja...@gmail.com>
>>>>>  wrote:
>>>>>
>>>>>>
>>>>>> Sorry, I am not Java programmer at all. I would appreciate more
>>>>>> verbose (or step by step) help.
>>>>>>
>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>
>>>>>>>
>>>>>>> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
>>>>>>> In this case I would assume a file StempelTokenFilterFactory.class
>>>>>>> in your directory org/getopt/solr/analysis/.
>>>>>>>
>>>>>>> And a class which extends the BaseTokenFilterFactory, right?
>>>>>>> ...
>>>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory
>>>>>>> implements ResourceLoaderAware {
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
>>>>>>>
>>>>>>>>
>>>>>>>> This is what stempel-1.0.jar consist of after jar -xf:
>>>>>>>>
>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
>>>>>>>> org/:
>>>>>>>> egothor  getopt
>>>>>>>>
>>>>>>>> org/egothor:
>>>>>>>> stemmer
>>>>>>>>
>>>>>>>> org/egothor/stemmer:
>>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
>>>>>>>> Optimizer2.class  Reduce.class        Row.class    TestAll.class
>>>>>>>> TestLoad.class  Trie$StrEnum.class
>>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
>>>>>>>> Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
>>>>>>>> Trie.class
>>>>>>>>
>>>>>>>> org/getopt:
>>>>>>>> stempel
>>>>>>>>
>>>>>>>> org/getopt/stempel:
>>>>>>>> Benchmark.class  lucene  Stemmer.class
>>>>>>>>
>>>>>>>> org/getopt/stempel/lucene:
>>>>>>>> StempelAnalyzer.class  StempelFilter.class
>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
>>>>>>>> META-INF/:
>>>>>>>> MANIFEST.MF
>>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
>>>>>>>> res:
>>>>>>>> tables
>>>>>>>>
>>>>>>>> res/tables:
>>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
>>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
>>>>>>>>
>>>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Jakub,
>>>>>>>>>
>>>>>>>>> if you unzip your stempel-1.0.jar do you have the
>>>>>>>>> required directory structure and file in there?
>>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Bernd
>>>>>>>>>
>>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Erick I've put the jar files like that before. I also added the
>>>>>>>>>> directive and put the file in instanceDir/lib
>>>>>>>>>>
>>>>>>>>>> What is still a problem is that even the files are loaded:
>>>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader
>>>>>>>>>> replaceClassLoader
>>>>>>>>>> INFO: Adding
>>>>>>>>>> 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>>>>>>>>>> to classloader
>>>>>>>>>>
>>>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it
>>>>>>>>>> in
>>>>>>>>>> a wrong way?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jakub Godawa.
>>>>>>>>>>
>>>>>>>>>> 2010/11/2 Erick Erickson<er...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The polish stemmer jar file needs to be findable by Solr, if you
>>>>>>>>>>> copy
>>>>>>>>>>> it to<solr_home>/lib and restart solr you should be set.
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, you can add another<lib>  directive to the
>>>>>>>>>>> solrconfig.xml
>>>>>>>>>>> file
>>>>>>>>>>> (there are several examples in that file already).
>>>>>>>>>>>
>>>>>>>>>>> I'm a little confused about not being able to find TokenFilter,
>>>>>>>>>>> is that
>>>>>>>>>>> still
>>>>>>>>>>> a problem?
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>> Erick
>>>>>>>>>>>

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Lance Norskog <go...@gmail.com>.
I don't know if the Stempel jar includes the Java source. At this point
I think you should ask the author of Stempel to make a Solr front-end
for it. It should be very simple for him.
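
For readers wondering what such a front-end looks like: as far as I
understand it, the class named in a <filter> element in schema.xml has to
be a token filter factory that Solr can create from its name alone, whereas
a Lucene TokenFilter needs its input stream passed to the constructor. The
sketch below shows only that pattern in plain Java. Every type in it
(TokenStream, TokenFilterFactory, ToyStemFilter, ToyStemFilterFactory) is a
hypothetical stand-in defined in the example, not the Solr, Lucene, or
Stempel API, and the "stemming" rule is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class FilterFactorySketch {

    /** Stand-in for a token stream: returns the next token, or null at the end. */
    interface TokenStream {
        String next();
    }

    /** Stand-in for the factory contract: wrap an input stream in a filter. */
    interface TokenFilterFactory {
        TokenStream create(TokenStream input);
    }

    /** Toy filter. It cannot be constructed without an input stream. */
    static final class ToyStemFilter implements TokenStream {
        private final TokenStream input;

        ToyStemFilter(TokenStream input) {
            this.input = input;
        }

        @Override
        public String next() {
            String token = input.next();
            if (token == null) {
                return null;
            }
            // Placeholder rule, NOT Polish stemming: drop one trailing "s".
            return token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
        }
    }

    /** The factory has a no-argument constructor, so a class name is enough to build it. */
    static final class ToyStemFilterFactory implements TokenFilterFactory {
        ToyStemFilterFactory() {
        }

        @Override
        public TokenStream create(TokenStream input) {
            return new ToyStemFilter(input);
        }
    }

    /** Mimics a config-driven loader: class name in, factory out. */
    static TokenFilterFactory load(String className) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return (TokenFilterFactory) instance;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Not a usable filter factory: " + className, e);
        }
    }

    /** Builds a stream over fixed tokens, standing in for a tokenizer. */
    static TokenStream streamOf(String... tokens) {
        Iterator<String> it = Arrays.asList(tokens).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Collects every token from a stream into a list. */
    static List<String> drain(TokenStream stream) {
        List<String> tokens = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Loading the factory by name works.
        TokenFilterFactory factory = load(ToyStemFilterFactory.class.getName());
        System.out.println(drain(factory.create(streamOf("filters", "stem"))));

        // Loading the filter itself by name fails: it has no no-argument constructor.
        try {
            load(ToyStemFilter.class.getName());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A real factory would extend Solr's BaseTokenFilterFactory, as discussed
further down in this thread, and return a StempelFilter from create(). The
constructor arguments StempelFilter needs are not shown here, so check the
Stempel documentation for them.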

Jakub Godawa wrote:
> Am I not doing that in point no. 4? I am compiling the whole folder
> that was extracted before, but now with that new class file.
>
> 2010/11/12 Lance Norskog<go...@gmail.com>:
>    
>> I think you have to compile all of the stempel source including your
>> filter factory into one jar at the same time. Everybody does this; I
>> don't know how different Java versions make class file binaries.
>>
>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa<ja...@gmail.com>  wrote:
>>      
>>> Hi! Sorry for such a break, but I was moving house... anyway:
>>>
>>> 1. I took the ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
>>> file and modified it (named as StempelFilterFactory.java) in Vim that
>>> way:
>>>
>>> package org.getopt.solr.analysis;
>>>
>>> import org.apache.lucene.analysis.TokenStream;
>>> import org.apache.lucene.analysis.standard.StandardFilter;
>>>
>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
>>>   public StempelFilter create(TokenStream input) {
>>>     return new StempelFilter(input);
>>>   }
>>> }
>>>
>>> 2. Then I put the file to the extracted stempel-1.0.jar in
>>> ./org/getopt/solr/analysis/
>>> 3. Then I created a class from it: jar -cf
>>> StempelTokenFilterFactory.class StempelFilterFactory.java
>>> 4. Then I created new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
>>> -C ./stempel-1.0/ .
>>> 5. Then in schema.xml I've put:
>>>
>>>     <fieldType name="text_pl" class="solr.TextField">
>>>       <analyzer>
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>         <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
>>>       </analyzer>
>>>     </fieldType>
>>>
>>> 6. I started the Solr server and I received the following error:
>>>
>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
>>> 1347093252 in class file
>>> org/getopt/solr/analysis/StempelTokenFilterFactory
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>> ...
>>>
>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars.
>>> I googled that error, but none of the answers gave me an idea of what
>>> is wrong in my .java file.
>>>
>>> Please help, as I believe I am close to the end of that subject.
>>>
>>> Cheers,
>>> Jakub Godawa.
>>>
>>> 2010/11/3 Lance Norskog<go...@gmail.com>:
>>>        
>>>> Here's the problem: Solr is a little dumb about these Filter classes,
>>>> and so you have to make a Factory object for the Stempel Filter.
>>>>
>>>> There are a lot of other FilterFactory classes. You would have to just
>>>> copy one and change the names to Stempel and it might actually work.
>>>>
>>>> This will take some Solr programming- perhaps the author can help you?
>>>>
>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa<ja...@gmail.com>  wrote:
>>>>          
>>>>> Sorry, I am not a Java programmer at all. I would appreciate more
>>>>> verbose (or step by step) help.
>>>>>
>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>            
>>>>>> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
>>>>>> In this case I would assume a file StempelTokenFilterFactory.class
>>>>>> in your directory org/getopt/solr/analysis/.
>>>>>>
>>>>>> And a class which extends the BaseTokenFilterFactory, right?
>>>>>> ...
>>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
>>>>>> ...
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
>>>>>>              
>>>>>>> This is what stempel-1.0.jar consists of after jar -xf:
>>>>>>>
>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
>>>>>>> org/:
>>>>>>> egothor  getopt
>>>>>>>
>>>>>>> org/egothor:
>>>>>>> stemmer
>>>>>>>
>>>>>>> org/egothor/stemmer:
>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
>>>>>>> Optimizer2.class  Reduce.class        Row.class    TestAll.class
>>>>>>> TestLoad.class  Trie$StrEnum.class
>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
>>>>>>> Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
>>>>>>> Trie.class
>>>>>>>
>>>>>>> org/getopt:
>>>>>>> stempel
>>>>>>>
>>>>>>> org/getopt/stempel:
>>>>>>> Benchmark.class  lucene  Stemmer.class
>>>>>>>
>>>>>>> org/getopt/stempel/lucene:
>>>>>>> StempelAnalyzer.class  StempelFilter.class
>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
>>>>>>> META-INF/:
>>>>>>> MANIFEST.MF
>>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
>>>>>>> res:
>>>>>>> tables
>>>>>>>
>>>>>>> res/tables:
>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
>>>>>>>
>>>>>>> 2010/11/2 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>>                
>>>>>>>> Hi Jakub,
>>>>>>>>
>>>>>>>> if you unzip your stempel-1.0.jar do you have the
>>>>>>>> required directory structure and file in there?
>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Bernd
>>>>>>>>
>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>>>>>>>>                  
>>>>>>>>> Erick, I've put the jar files like that before. I also added the
>>>>>>>>> directive and put the file in instanceDir/lib.
>>>>>>>>>
>>>>>>>>> What is still a problem is that even though the files are loaded:
>>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>>>>>>>> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>>>>>>>>> to classloader
>>>>>>>>>
>>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it in
>>>>>>>>> a wrong way?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jakub Godawa.
>>>>>>>>>
>>>>>>>>> 2010/11/2 Erick Erickson<er...@gmail.com>:
>>>>>>>>>                    
>>>>>>>>>> The Polish stemmer jar file needs to be findable by Solr. If you copy
>>>>>>>>>> it to <solr_home>/lib and restart Solr, you should be set.
>>>>>>>>>>
>>>>>>>>>> Alternatively, you can add another <lib> directive to the solrconfig.xml
>>>>>>>>>> file (there are several examples in that file already).
>>>>>>>>>>
>>>>>>>>>> I'm a little confused about not being able to find TokenFilter. Is that
>>>>>>>>>> still a problem?
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>> Erick
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa<ja...@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>>>>>>>>>>
>>>>>>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>>>>>>>>>>> directive:<lib path="../lib/stempel-1.0.jar" />
>>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>>>>>>>>>>
>>>>>>>>>>> (...)
>>>>>>>>>>>   <!-- Polish -->
>>>>>>>>>>>    <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>>>>     <analyzer>
>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>       <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>>>>>>>>>       <!--<filter
>>>>>>>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>>>>>>>>>> protected="protwords.txt" />  -->
>>>>>>>>>>>     </analyzer>
>>>>>>>>>>>   </fieldType>
>>>>>>>>>>> (...)
>>>>>>>>>>>
>>>>>>>>>>> 4. jar file is loaded but I got an error:
>>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>>>>>>>>>>       at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>>>>>>>       at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>>>>>>>>       at
>>>>>>>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>>>>>>>> (...)
>>>>>>>>>>>
>>>>>>>>>>> 5. Different class gave me that one:
>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>>>>>>>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>>>>>>>>>       at
>>>>>>>>>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>>>>>>>>>       at
>>>>>>>>>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>>>>>>>>>> (...)
>>>>>>>>>>>
>>>>>>>>>>> Question is: How to make<fieldType />  and<filter />  work with that
>>>>>>>>>>> Stempel? :)
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Jakub Godawa.
>>>>>>>>>>>
>>>>>>>>>>> 2010/10/29 Bernd Fehling<be...@uni-bielefeld.de>:
>>>>>>>>>>>                        
>>>>>>>>>>>> Hi Jakub,
>>>>>>>>>>>>
>>>>>>>>>>>> I have ported the KStemmer for use in most recent Solr trunk version.
>>>>>>>>>>>> My stemmer is located in the lib directory of Solr
>>>>>>>>>>>>                          
>>>>>>>>>>> "solr/lib/KStemmer-2.00.jar"
>>>>>>>>>>>                        
>>>>>>>>>>>> because it belongs to Solr.
>>>>>>>>>>>>
>>>>>>>>>>>> Write it as FilterFactory and use it as Filter like:
>>>>>>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>>>>                          
>>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>                        
>>>>>>>>>>>> This is what my fieldType looks like:
>>>>>>>>>>>>
>>>>>>>>>>>>     <fieldType name="text_kstem" class="solr.TextField"
>>>>>>>>>>>>                          
>>>>>>>>>>> positionIncrementGap="100">
>>>>>>>>>>>                        
>>>>>>>>>>>>       <analyzer type="index">
>>>>>>>>>>>>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>>>>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>                          
>>>>>>>>>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>>>>>>>>                        
>>>>>>>>>>>>         <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>>>>                          
>>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>>>>> catenateNumbers="1"
>>>>>>>>>>>                        
>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>>>>>>         <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>>>>>>         <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>>>>                          
>>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>                        
>>>>>>>>>>>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>>>>>>>       </analyzer>
>>>>>>>>>>>>       <analyzer type="query">
>>>>>>>>>>>>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>>>>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>                          
>>>>>>>>>>> words="stopwords.txt" />
>>>>>>>>>>>                        
>>>>>>>>>>>>         <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>>>>                          
>>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>>>>>>>> catenateNumbers="0"
>>>>>>>>>>>                        
>>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>>>>>>         <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>>>>>>         <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>>>>                          
>>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>                        
>>>>>>>>>>>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>>>>>>>       </analyzer>
>>>>>>>>>>>>     </fieldType>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Bernd
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>>>>>>>>>                          
>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>> There is a Polish stemmer http://www.getopt.org/stempel/ and I have
>>>>>>>>>>>>> problems connecting it with Solr 1.4.1
>>>>>>>>>>>>> Questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Where EXACTLY do I put the "stempel-1.0.jar" file?
>>>>>>>>>>>>> 2. How do I register the file, so I can build a fieldType like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>>>>>>    <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. Is that the right approach to make it work?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for verbose explanation,
>>>>>>>>>>>>> Jakub.

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
Am I not doing that in point no. 4? I am compiling the whole folder
that was extracted before, but now with that new class file.

2010/11/12 Lance Norskog <go...@gmail.com>:
> I think you have to compile all of the stempel source including your
> filter factory into one jar at the same time. Everybody does this; I
> don't know how different Java versions make class file binaries.
>
> On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa <ja...@gmail.com> wrote:
>> Hi! Sorry for such a break, but I was moving house... anyway:
>>
>> 1. I took the ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
>> file and modified it (named as StempelFilterFactory.java) in Vim that
>> way:
>>
>> package org.getopt.solr.analysis;
>>
>> import org.apache.lucene.analysis.TokenStream;
>> import org.apache.lucene.analysis.standard.StandardFilter;
>>
>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
>>  public StempelFilter create(TokenStream input) {
>>    return new StempelFilter(input);
>>  }
>> }
>>
>> 2. Then I put the file to the extracted stempel-1.0.jar in
>> ./org/getopt/solr/analysis/
>> 3. Then I created a class from it: jar -cf
>> StempelTokenFilterFactory.class StempelFilterFactory.java
>> 4. Then I created new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
>> -C ./stempel-1.0/ .
>> 5. Then in schema.xml I've put:
>>
>>    <fieldType name="text_pl" class="solr.TextField">
>>      <analyzer>
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
>>      </analyzer>
>>    </fieldType>
>>
>> 6. I started the Solr server and I received the following error:
>>
>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
>> 1347093252 in class file
>> org/getopt/solr/analysis/StempelTokenFilterFactory
>>        at java.lang.ClassLoader.defineClass1(Native Method)
>>        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>> ...
>>
>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars.
>> I googled that error, but none of the answers gave me an idea of what
>> is wrong in my .java file.
>>
>> Please help, as I believe I am close to the end of that subject.
>>
>> Cheers,
>> Jakub Godawa.
>>
>> 2010/11/3 Lance Norskog <go...@gmail.com>:
>>> Here's the problem: Solr is a little dumb about these Filter classes,
>>> and so you have to make a Factory object for the Stempel Filter.
>>>
>>> There are a lot of other FilterFactory classes. You would have to just
>>> copy one and change the names to Stempel and it might actually work.
>>>
>>> This will take some Solr programming- perhaps the author can help you?
>>>
>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa <ja...@gmail.com> wrote:
>>>> Sorry, I am not a Java programmer at all. I would appreciate more
>>>> verbose (or step by step) help.
>>>>
>>>> 2010/11/2 Bernd Fehling <be...@uni-bielefeld.de>:
>>>>>
>>>>> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
>>>>> In this case I would assume a file StempelTokenFilterFactory.class
>>>>> in your directory org/getopt/solr/analysis/.
>>>>>
>>>>> And a class which extends the BaseTokenFilterFactory, right?
>>>>> ...
>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
>>>>> ...
>>>>>
>>>>>
>>>>>
>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
>>>>>> This is what stempel-1.0.jar consists of after jar -xf:
>>>>>>
>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
>>>>>> org/:
>>>>>> egothor  getopt
>>>>>>
>>>>>> org/egothor:
>>>>>> stemmer
>>>>>>
>>>>>> org/egothor/stemmer:
>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
>>>>>> Optimizer2.class  Reduce.class        Row.class    TestAll.class
>>>>>> TestLoad.class  Trie$StrEnum.class
>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
>>>>>> Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
>>>>>> Trie.class
>>>>>>
>>>>>> org/getopt:
>>>>>> stempel
>>>>>>
>>>>>> org/getopt/stempel:
>>>>>> Benchmark.class  lucene  Stemmer.class
>>>>>>
>>>>>> org/getopt/stempel/lucene:
>>>>>> StempelAnalyzer.class  StempelFilter.class
>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
>>>>>> META-INF/:
>>>>>> MANIFEST.MF
>>>>>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
>>>>>> res:
>>>>>> tables
>>>>>>
>>>>>> res/tables:
>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
>>>>>>
>>>>>> 2010/11/2 Bernd Fehling <be...@uni-bielefeld.de>:
>>>>>>> Hi Jakub,
>>>>>>>
>>>>>>> if you unzip your stempel-1.0.jar do you have the
>>>>>>> required directory structure and file in there?
>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
>>>>>>>
>>>>>>> Regards,
>>>>>>> Bernd
>>>>>>>
>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>>>>>>>> Erick, I've put the jar files like that before. I also added the
>>>>>>>> directive and put the file in instanceDir/lib.
>>>>>>>>
>>>>>>>> What is still a problem is that even though the files are loaded:
>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>>>>>>> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>>>>>>>> to classloader
>>>>>>>>
>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it in
>>>>>>>> a wrong way?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jakub Godawa.
>>>>>>>>
>>>>>>>> 2010/11/2 Erick Erickson <er...@gmail.com>:
>>>>>>>>> The Polish stemmer jar file needs to be findable by Solr. If you copy
>>>>>>>>> it to <solr_home>/lib and restart Solr, you should be set.
>>>>>>>>>
>>>>>>>>> Alternatively, you can add another <lib> directive to the solrconfig.xml
>>>>>>>>> file (there are several examples in that file already).
>>>>>>>>>
>>>>>>>>> I'm a little confused about not being able to find TokenFilter. Is that
>>>>>>>>> still a problem?
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>> Erick
>>>>>>>>>
>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>>>>>>>>>
>>>>>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>>>>>>>>>> directive: <lib path="../lib/stempel-1.0.jar" />
>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>>>>>>>>>
>>>>>>>>>> (...)
>>>>>>>>>>  <!-- Polish -->
>>>>>>>>>>   <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>>>    <analyzer>
>>>>>>>>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>>>>>>>>      <!--    <filter
>>>>>>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>>>>>>>>> protected="protwords.txt" /> -->
>>>>>>>>>>    </analyzer>
>>>>>>>>>>  </fieldType>
>>>>>>>>>> (...)
>>>>>>>>>>
>>>>>>>>>> 4. jar file is loaded but I got an error:
>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>>>>>>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>>>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>>>>>>>      at
>>>>>>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>>>>>>> (...)
>>>>>>>>>>
>>>>>>>>>> 5. Different class gave me that one:
>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>>>>>>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>>>>>>>>      at
>>>>>>>>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>>>>>>>>      at
>>>>>>>>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>>>>>>>>> (...)
>>>>>>>>>>
>>>>>>>>>> Question is: How to make <fieldType /> and <filter /> work with that
>>>>>>>>>> Stempel? :)
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jakub Godawa.
>>>>>>>>>>
>>>>>>>>>> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
>>>>>>>>>>> Hi Jakub,
>>>>>>>>>>>
>>>>>>>>>>> I have ported the KStemmer for use in most recent Solr trunk version.
>>>>>>>>>>> My stemmer is located in the lib directory of Solr
>>>>>>>>>> "solr/lib/KStemmer-2.00.jar"
>>>>>>>>>>> because it belongs to Solr.
>>>>>>>>>>>
>>>>>>>>>>> Write it as FilterFactory and use it as Filter like:
>>>>>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>
>>>>>>>>>>> This is what my fieldType looks like:
>>>>>>>>>>>
>>>>>>>>>>>    <fieldType name="text_kstem" class="solr.TextField"
>>>>>>>>>> positionIncrementGap="100">
>>>>>>>>>>>      <analyzer type="index">
>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>>>> catenateNumbers="1"
>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>>>>>>      </analyzer>
>>>>>>>>>>>      <analyzer type="query">
>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>> words="stopwords.txt" />
>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>>>>>>> catenateNumbers="0"
>>>>>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>>>>>> protected="protwords.txt" />
>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>>>>>>      </analyzer>
>>>>>>>>>>>    </fieldType>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Bernd
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>> There is a Polish stemmer http://www.getopt.org/stempel/ and I have
>>>>>>>>>>>> problems connecting it with Solr 1.4.1
>>>>>>>>>>>> Questions:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Where EXACTLY do I put the "stempel-1.0.jar" file?
>>>>>>>>>>>> 2. How do I register the file, so I can build a fieldType like:
>>>>>>>>>>>>
>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>>>>>   <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Is that the right approach to make it work?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for verbose explanation,
>>>>>>>>>>>> Jakub.

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Lance Norskog <go...@gmail.com>.
I think you have to compile all of the stempel source including your
filter factory into one jar at the same time. Everybody does this; I
don't know how different Java versions make class file binaries.
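
A side note on the error quoted below, offered as a sketch rather than a
definitive diagnosis: 1347093252 is 0x504B0304, the bytes "PK\3\4" that
start a zip or jar archive, while a compiled class file starts with
0xCAFEBABE. So the file named StempelTokenFilterFactory.class is most
likely an archive made by "jar -cf" in step 3, not compiler output. jar
only packages files; javac is the tool that turns a .java file into a
.class file. The small Java program below reproduces the number. The class
name MagicValueCheck and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class MagicValueCheck {

    /** First four bytes of every compiled Java class file. */
    static final int CLASS_MAGIC = 0xCAFEBABE;

    /** First four bytes of a zip or jar archive: 'P' 'K' 0x03 0x04. */
    static final int ZIP_MAGIC = 0x504B0304;

    /** Reads the first four bytes as a big-endian int, as the JVM does for the magic field. */
    static int magicOf(byte[] fileContents) {
        if (fileContents.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes");
        }
        return ByteBuffer.wrap(fileContents, 0, 4).getInt();
    }

    /** Names the file type that a magic value indicates. */
    static String describe(int magic) {
        if (magic == CLASS_MAGIC) {
            return "compiled class file";
        }
        if (magic == ZIP_MAGIC) {
            return "zip/jar archive";
        }
        return "unknown";
    }

    /** Builds a small archive in memory, roughly what "jar -cf X.class Y.java" writes. */
    static byte[] archiveContaining(String entryName, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // An archive that merely carries a .class name is still an archive.
        byte[] fake = archiveContaining("StempelFilterFactory.java", "// source text");
        int magic = magicOf(fake);
        System.out.println(magic + " -> " + describe(magic));
    }
}
```

Run as is, it should print "1347093252 -> zip/jar archive". The likely fix
is to compile the factory with javac, with the Solr and Lucene jars on the
classpath, and only then rebuild the jar. javac will also expect the file
to be named after its public class (StempelTokenFilterFactory.java) and
will need imports for BaseTokenFilterFactory and StempelFilter.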

On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa <ja...@gmail.com> wrote:
> Hi! Sorry for such a break, but I was moving house... anyway:
>
> 1. I took the ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
> file and modified it (named as StempelFilterFactory.java) in Vim that
> way:
>
> package org.getopt.solr.analysis;
>
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.standard.StandardFilter;
>
> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
>  public StempelFilter create(TokenStream input) {
>    return new StempelFilter(input);
>  }
> }
>
> 2. Then I put the file to the extracted stempel-1.0.jar in
> ./org/getopt/solr/analysis/
> 3. Then I created a class from it: jar -cf
> StempelTokenFilterFactory.class StempelFilterFactory.java
> 4. Then I created new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
> -C ./stempel-1.0/ .
> 5. Then in schema.xml I've put:
>
>    <fieldType name="text_pl" class="solr.TextField">
>      <analyzer>
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
>      </analyzer>
>    </fieldType>
>
> 6. I started the Solr server and I received the following error:
>
> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
> SEVERE: java.lang.ClassFormatError: Incompatible magic value
> 1347093252 in class file
> org/getopt/solr/analysis/StempelTokenFilterFactory
>        at java.lang.ClassLoader.defineClass1(Native Method)
>        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> ...
>
> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars.
> I googled that error, but none of the answers gave me an idea of what
> is wrong in my .java file.
>
> Please help, as I believe I am close to the end of that subject.
>
> Cheers,
> Jakub Godawa.
>



-- 
Lance Norskog
goksron@gmail.com

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
Hi! Sorry for such a break, but I was moving house... anyway:

1. I took the ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
file and modified it in Vim (saving it as StempelFilterFactory.java)
like this:

package org.getopt.solr.analysis;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;

public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
  public StempelFilter create(TokenStream input) {
    return new StempelFilter(input);
  }
}

2. Then I put the file into the extracted stempel-1.0.jar, under
./org/getopt/solr/analysis/
3. Then I created a class from it: jar -cf
StempelTokenFilterFactory.class StempelFilterFactory.java
4. Then I created a new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
-C ./stempel-1.0/ .
5. Then in schema.xml I put:

    <fieldType name="text_pl" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
      </analyzer>
    </fieldType>

6. I started the Solr server and received the following error:

2010-11-11 11:50:56 org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassFormatError: Incompatible magic value
1347093252 in class file
org/getopt/solr/analysis/StempelTokenFilterFactory
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
...

Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars.
I googled that error, but none of the answers gave me an idea of what
is wrong in my .java file.

Please help, as I believe I am close to the end of that subject.

Cheers,
Jakub Godawa.
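
A note on the error in step 6, offered as a sketch rather than a
confirmed diagnosis. The "magic value" 1347093252 is 0x504B0304 in
hexadecimal, i.e. the bytes "PK\003\004" that begin a ZIP or JAR
archive, whereas a compiled class file begins with 0xCAFEBABE. That is
consistent with step 3: jar -cf builds an archive, it does not compile
anything, so the file named StempelTokenFilterFactory.class is most
likely an archive wrapping the .java source (compiling is the job of
javac). The small program below decodes the number; the class name
MagicValue and its methods are illustrative only and are not part of
Solr or Stempel.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

/** Identifies a file by its first four bytes (its "magic value"). */
public class MagicValue {

    // Every compiled .class file starts with these four bytes.
    static final int CLASS_MAGIC = 0xCAFEBABE;
    // "PK\003\004": the local file header that starts a ZIP/JAR archive.
    static final int ZIP_MAGIC = 0x504B0304;

    /** Names the kind of file that starts with the given big-endian 32-bit value. */
    static String describe(int magic) {
        if (magic == CLASS_MAGIC) return "Java class file";
        if (magic == ZIP_MAGIC) return "ZIP/JAR archive";
        return "unknown (0x" + Integer.toHexString(magic).toUpperCase() + ")";
    }

    /** Reads the first four bytes of a file as a big-endian int. */
    static int readMagic(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // The decimal number reported by the ClassFormatError in the log.
        int fromLog = 1347093252;
        System.out.println(fromLog + " -> " + describe(fromLog));

        // Optionally inspect real files given on the command line.
        for (String path : args) {
            System.out.println(path + " -> " + describe(readMagic(path)));
        }
    }
}
```

Run without arguments, it shows how the number from the log decodes;
given file paths (for example the suspect .class file), it reports what
each file really contains.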

2010/11/3 Lance Norskog <go...@gmail.com>:
> Here's the problem: Solr is a little dumb about these Filter classes,
> and so you have to make a Factory object for the Stempel Filter.
>
> There are a lot of other FilterFactory classes. You would have to just
> copy one and change the names to Stempel and it might actually work.
>
> This will take some Solr programming; perhaps the author can help you?
>

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Lance Norskog <go...@gmail.com>.
Here's the problem: Solr is a little dumb about these Filter classes,
and so you have to make a Factory object for the Stempel Filter.

There are a lot of other FilterFactory classes. You would have to just
copy one and change the names to Stempel and it might actually work.

This will take some Solr programming; perhaps the author can help you?
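
To illustrate why a factory is needed at all, here is a self-contained
sketch, not Solr code. It assumes, as the SolrResourceLoader.newInstance
frame in the stack trace suggests, that the class named in
<filter class="..."/> is instantiated from its name alone and then asked
to wrap a token stream; a filter such as StempelFilter needs its input
stream at construction time, so it cannot be created that way directly.
All types below (Tokens, FilterFactory, LowerCaseFactory) are
hypothetical stand-ins invented for the example; the real counterparts
are Solr's BaseTokenFilterFactory and Lucene's TokenStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FactorySketch {

    /** Stand-in for a token stream: simply yields a list of tokens. */
    interface Tokens {
        List<String> all();
    }

    /** Stand-in for the factory contract: no-arg constructor plus create(). */
    interface FilterFactory {
        Tokens create(Tokens input);
    }

    /** A filter needs its input when constructed, so a class name alone is not enough. */
    static class LowerCaseTokens implements Tokens {
        private final Tokens input;

        LowerCaseTokens(Tokens input) {
            this.input = input;
        }

        public List<String> all() {
            List<String> out = new ArrayList<>();
            for (String token : input.all()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
            return out;
        }
    }

    /** The factory has a public no-arg constructor, so it can be built from its name. */
    public static class LowerCaseFactory implements FilterFactory {
        public LowerCaseFactory() {
        }

        public Tokens create(Tokens input) {
            return new LowerCaseTokens(input);
        }
    }

    /** Mimics a schema loader: instantiate the named factory, then wrap the stream. */
    static Tokens wrap(String factoryClassName, Tokens input) {
        try {
            FilterFactory factory = (FilterFactory) Class.forName(factoryClassName)
                    .getDeclaredConstructor().newInstance();
            return factory.create(input);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Error loading class '" + factoryClassName + "'", e);
        }
    }

    public static void main(String[] args) {
        Tokens source = () -> Arrays.asList("Domy", "KOTY");
        Tokens filtered = wrap(LowerCaseFactory.class.getName(), source);
        System.out.println(filtered.all());
    }
}
```

Passing the name of a class that is not a FilterFactory makes wrap fail,
which is the same kind of failure as naming the filter class itself in
schema.xml.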

On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa <ja...@gmail.com> wrote:
> Sorry, I am not a Java programmer at all. I would appreciate more
> verbose (or step by step) help.
>



-- 
Lance Norskog
goksron@gmail.com

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
Sorry, I am not a Java programmer at all. I would appreciate more
verbose (or step by step) help.

2010/11/2 Bernd Fehling <be...@uni-bielefeld.de>:
>
> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
> In this case I would assume a file StempelTokenFilterFactory.class
> in your directory org/getopt/solr/analysis/.
>
> And a class which extends the BaseTokenFilterFactory, right?
> ...
> public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
> ...
>
>
>
> Am 02.11.2010 14:20, schrieb Jakub Godawa:
>> This is what stempel-1.0.jar consists of after jar -xf:
>>
>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
>> org/:
>> egothor  getopt
>>
>> org/egothor:
>> stemmer
>>
>> org/egothor/stemmer:
>> Cell.class     Diff.class    Gener.class  MultiTrie2.class
>> Optimizer2.class  Reduce.class        Row.class    TestAll.class
>> TestLoad.class  Trie$StrEnum.class
>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
>> Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
>> Trie.class
>>
>> org/getopt:
>> stempel
>>
>> org/getopt/stempel:
>> Benchmark.class  lucene  Stemmer.class
>>
>> org/getopt/stempel/lucene:
>> StempelAnalyzer.class  StempelFilter.class
>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
>> META-INF/:
>> MANIFEST.MF
>> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
>> res:
>> tables
>>
>> res/tables:
>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
>> stemmer_200.out  stemmer_500.out  stemmer_700.out
>>
>> 2010/11/2 Bernd Fehling <be...@uni-bielefeld.de>:
>>> Hi Jakub,
>>>
>>> if you unzip your stempel-1.0.jar do you have the
>>> required directory structure and file in there?
>>> org/getopt/stempel/lucene/StempelFilter.class
>>>
>>> Regards,
>>> Bernd
>>>
>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>>>> Erick I've put the jar files like that before. I also added the
>>>> directive and put the file in instanceDir/lib
>>>>
>>>> The remaining problem is that, even though the files are loaded:
>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>>> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>>>> to classloader
>>>>
>>>> I am not able to use the FilterFactory... maybe I am attempting it in
>>>> a wrong way?
>>>>
>>>> Cheers,
>>>> Jakub Godawa.
>>>>
>>>> 2010/11/2 Erick Erickson <er...@gmail.com>:
>>>>> The polish stemmer jar file needs to be findable by Solr, if you copy
>>>>> it to <solr_home>/lib and restart solr you should be set.
>>>>>
>>>>> Alternatively, you can add another <lib> directive to the solrconfig.xml
>>>>> file
>>>>> (there are several examples in that file already).
>>>>>
>>>>> I'm a little confused about not being able to find TokenFilter, is that
>>>>> still
>>>>> a problem?
>>>>>
>>>>> HTH
>>>>> Erick
>>>>>
>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:
>>>>>
>>>>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>>>>>
>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>>>>>> directive: <lib path="../lib/stempel-1.0.jar" />
>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>>>>>
>>>>>> (...)
>>>>>>  <!-- Polish -->
>>>>>>   <fieldType name="text_pl" class="solr.TextField">
>>>>>>    <analyzer>
>>>>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>>>>      <!--    <filter
>>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>>>>> protected="protwords.txt" /> -->
>>>>>>    </analyzer>
>>>>>>  </fieldType>
>>>>>> (...)
>>>>>>
>>>>>> 4. jar file is loaded but I got an error:
>>>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>>>      at
>>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>>> (...)
>>>>>>
>>>>>> 5. Different class gave me that one:
>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>>>>      at
>>>>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>>>>      at
>>>>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>>>>> (...)
>>>>>>
>>>>>> Question is: How to make <fieldType /> and <filter /> work with that
>>>>>> Stempel? :)
>>>>>>
>>>>>> Cheers,
>>>>>> Jakub Godawa.
>>>>>>
>>>>>> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
>>>>>>> Hi Jakub,
>>>>>>>
>>>>>>> I have ported the KStemmer for use in most recent Solr trunk version.
>>>>>>> My stemmer is located in the lib directory of Solr
>>>>>> "solr/lib/KStemmer-2.00.jar"
>>>>>>> because it belongs to Solr.
>>>>>>>
>>>>>>> Write it as FilterFactory and use it as Filter like:
>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>> protected="protwords.txt" />
>>>>>>>
>>>>>>> This is how my fieldType looks like:
>>>>>>>
>>>>>>>    <fieldType name="text_kstem" class="solr.TextField"
>>>>>> positionIncrementGap="100">
>>>>>>>      <analyzer type="index">
>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>> catenateNumbers="1"
>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>> protected="protwords.txt" />
>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>>      </analyzer>
>>>>>>>      <analyzer type="query">
>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>> words="stopwords.txt" />
>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>>> catenateNumbers="0"
>>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>>> protected="protwords.txt" />
>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>>      </analyzer>
>>>>>>>    </fieldType>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Bernd
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>>>>> Hi!
>>>>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I have
>>>>>>>> problems connecting it with solr 1.4.1
>>>>>>>> Questions:
>>>>>>>>
>>>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
>>>>>>>> 2. How do I register the file, so I can build a fieldType like:
>>>>>>>>
>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>>>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>>>>> </fieldType>
>>>>>>>>
>>>>>>>> 3. Is that the right approach to make it work?
>>>>>>>>
>>>>>>>> Thanks for verbose explanation,
>>>>>>>> Jakub.
>

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
So you are calling org.getopt.solr.analysis.StempelTokenFilterFactory.
In that case I would expect a file StempelTokenFilterFactory.class
in your directory org/getopt/solr/analysis/.

And a class which extends BaseTokenFilterFactory, right?
...
public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
...



Am 02.11.2010 14:20, schrieb Jakub Godawa:
> This is what stempel-1.0.jar consist of after jar -xf:
> 
> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
> org/:
> egothor  getopt
> 
> org/egothor:
> stemmer
> 
> org/egothor/stemmer:
> Cell.class     Diff.class    Gener.class  MultiTrie2.class
> Optimizer2.class  Reduce.class        Row.class    TestAll.class
> TestLoad.class  Trie$StrEnum.class
> Compile.class  DiffIt.class  Lift.class   MultiTrie.class
> Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
> Trie.class
> 
> org/getopt:
> stempel
> 
> org/getopt/stempel:
> Benchmark.class  lucene  Stemmer.class
> 
> org/getopt/stempel/lucene:
> StempelAnalyzer.class  StempelFilter.class
> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
> META-INF/:
> MANIFEST.MF
> jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
> res:
> tables
> 
> res/tables:
> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
> stemmer_200.out  stemmer_500.out  stemmer_700.out
> 
> 2010/11/2 Bernd Fehling <be...@uni-bielefeld.de>:
>> Hi Jakub,
>>
>> if you unzip your stempel-1.0.jar do you have the
>> required directory structure and file in there?
>> org/getopt/stempel/lucene/StempelFilter.class
>>
>> Regards,
>> Bernd
>>
>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>>> Erick I've put the jar files like that before. I also added the
>>> directive and put the file in instanceDir/lib
>>>
>>> What is still a problem is that even the files are loaded:
>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
>>> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>>> to classloader
>>>
>>> I am not able to use the FilterFactory... maybe I am attempting it in
>>> a wrong way?
>>>
>>> Cheers,
>>> Jakub Godawa.
>>>
>>> 2010/11/2 Erick Erickson <er...@gmail.com>:
>>>> The polish stemmer jar file needs to be findable by Solr, if you copy
>>>> it to <solr_home>/lib and restart solr you should be set.
>>>>
>>>> Alternatively, you can add another <lib> directive to the solrconfig.xml
>>>> file
>>>> (there are several examples in that file already).
>>>>
>>>> I'm a little confused about not being able to find TokenFilter, is that
>>>> still
>>>> a problem?
>>>>
>>>> HTH
>>>> Erick
>>>>
>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:
>>>>
>>>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>>>>
>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>>>>> directive: <lib path="../lib/stempel-1.0.jar" />
>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>>>>
>>>>> (...)
>>>>>  <!-- Polish -->
>>>>>   <fieldType name="text_pl" class="solr.TextField">
>>>>>    <analyzer>
>>>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>>>      <!--    <filter
>>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>>>> protected="protwords.txt" /> -->
>>>>>    </analyzer>
>>>>>  </fieldType>
>>>>> (...)
>>>>>
>>>>> 4. jar file is loaded but I got an error:
>>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>>      at
>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>> (...)
>>>>>
>>>>> 5. Different class gave me that one:
>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>>>      at
>>>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>>>      at
>>>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>>>> (...)
>>>>>
>>>>> Question is: How to make <fieldType /> and <filter /> work with that
>>>>> Stempel? :)
>>>>>
>>>>> Cheers,
>>>>> Jakub Godawa.
>>>>>
>>>>> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
>>>>>> Hi Jakub,
>>>>>>
>>>>>> I have ported the KStemmer for use in most recent Solr trunk version.
>>>>>> My stemmer is located in the lib directory of Solr
>>>>> "solr/lib/KStemmer-2.00.jar"
>>>>>> because it belongs to Solr.
>>>>>>
>>>>>> Write it as FilterFactory and use it as Filter like:
>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>> protected="protwords.txt" />
>>>>>>
>>>>>> This is how my fieldType looks like:
>>>>>>
>>>>>>    <fieldType name="text_kstem" class="solr.TextField"
>>>>> positionIncrementGap="100">
>>>>>>      <analyzer type="index">
>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>> catenateNumbers="1"
>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>> protected="protwords.txt" />
>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>      </analyzer>
>>>>>>      <analyzer type="query">
>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>> words="stopwords.txt" />
>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>>> catenateNumbers="0"
>>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>>> protected="protwords.txt" />
>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>>      </analyzer>
>>>>>>    </fieldType>
>>>>>>
>>>>>> Regards,
>>>>>> Bernd
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>>>> Hi!
>>>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I have
>>>>>>> problems connecting it with solr 1.4.1
>>>>>>> Questions:
>>>>>>>
>>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
>>>>>>> 2. How do I register the file, so I can build a fieldType like:
>>>>>>>
>>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>>>> </fieldType>
>>>>>>>
>>>>>>> 3. Is that the right approach to make it work?
>>>>>>>
>>>>>>> Thanks for verbose explanation,
>>>>>>> Jakub.

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
This is what stempel-1.0.jar consists of after jar -xf:

jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
org/:
egothor  getopt

org/egothor:
stemmer

org/egothor/stemmer:
Cell.class     Diff.class    Gener.class  MultiTrie2.class
Optimizer2.class  Reduce.class        Row.class    TestAll.class
TestLoad.class  Trie$StrEnum.class
Compile.class  DiffIt.class  Lift.class   MultiTrie.class
Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
Trie.class

org/getopt:
stempel

org/getopt/stempel:
Benchmark.class  lucene  Stemmer.class

org/getopt/stempel/lucene:
StempelAnalyzer.class  StempelFilter.class
jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
META-INF/:
MANIFEST.MF
jgodawa@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
res:
tables

res/tables:
readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
stemmer_200.out  stemmer_500.out  stemmer_700.out
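
The listing above shows only StempelAnalyzer and StempelFilter under
org/getopt/stempel/lucene and no class whose name ends in "Factory". The
same check can be done without unpacking the jar. The following is a
minimal sketch using only the JDK; the class name JarClassLister and the
method classNames are made up for this illustration and are not part of
Stempel or Solr.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the classes packaged in a jar file.
class JarClassLister {

    /**
     * Returns the fully qualified names of all classes in the jar whose
     * name ends with the given suffix (use "" to list every class).
     */
    static List<String> classNames(String jarPath, String suffix) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String entry = entries.nextElement().getName();
                if (!entry.endsWith(".class")) {
                    continue; // skip directories, MANIFEST.MF, stemmer tables, ...
                }
                // org/getopt/stempel/Stemmer.class -> org.getopt.stempel.Stemmer
                String name = entry
                        .substring(0, entry.length() - ".class".length())
                        .replace('/', '.');
                if (name.endsWith(suffix)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarClassLister <jar> [suffix]");
            return;
        }
        String suffix = args.length > 1 ? args[1] : "";
        for (String name : classNames(args[0], suffix)) {
            System.out.println(name);
        }
    }
}
```

Running it as "java JarClassLister stempel-1.0.jar Factory" prints one line
per matching class; no output means the jar ships no class with that suffix.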

2010/11/2 Bernd Fehling <be...@uni-bielefeld.de>:
> Hi Jakub,
>
> if you unzip your stempel-1.0.jar do you have the
> required directory structure and file in there?
> org/getopt/stempel/lucene/StempelFilter.class
>
> Regards,
> Bernd
>
> Am 02.11.2010 13:54, schrieb Jakub Godawa:
>> Erick I've put the jar files like that before. I also added the
>> directive and put the file in instanceDir/lib
>>
>> What is still a problem is that even the files are loaded:
>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
>> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
>> to classloader
>>
>> I am not able to use the FilterFactory... maybe I am attempting it in
>> a wrong way?
>>
>> Cheers,
>> Jakub Godawa.
>>
>> 2010/11/2 Erick Erickson <er...@gmail.com>:
>>> The polish stemmer jar file needs to be findable by Solr, if you copy
>>> it to <solr_home>/lib and restart solr you should be set.
>>>
>>> Alternatively, you can add another <lib> directive to the solrconfig.xml
>>> file
>>> (there are several examples in that file already).
>>>
>>> I'm a little confused about not being able to find TokenFilter, is that
>>> still
>>> a problem?
>>>
>>> HTH
>>> Erick
>>>
>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:
>>>
>>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>>>
>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>>>> directive: <lib path="../lib/stempel-1.0.jar" />
>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>>>
>>>> (...)
>>>>  <!-- Polish -->
>>>>   <fieldType name="text_pl" class="solr.TextField">
>>>>    <analyzer>
>>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>>      <!--    <filter
>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>>> protected="protwords.txt" /> -->
>>>>    </analyzer>
>>>>  </fieldType>
>>>> (...)
>>>>
>>>> 4. jar file is loaded but I got an error:
>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>>      at
>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>> (...)
>>>>
>>>> 5. Different class gave me that one:
>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>>      at
>>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>>      at
>>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>>> (...)
>>>>
>>>> Question is: How to make <fieldType /> and <filter /> work with that
>>>> Stempel? :)
>>>>
>>>> Cheers,
>>>> Jakub Godawa.
>>>>
>>>> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
>>>>> Hi Jakub,
>>>>>
>>>>> I have ported the KStemmer for use in most recent Solr trunk version.
>>>>> My stemmer is located in the lib directory of Solr
>>>> "solr/lib/KStemmer-2.00.jar"
>>>>> because it belongs to Solr.
>>>>>
>>>>> Write it as FilterFactory and use it as Filter like:
>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>> protected="protwords.txt" />
>>>>>
>>>>> This is how my fieldType looks like:
>>>>>
>>>>>    <fieldType name="text_kstem" class="solr.TextField"
>>>> positionIncrementGap="100">
>>>>>      <analyzer type="index">
>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>> catenateNumbers="1"
>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>> protected="protwords.txt" />
>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>      </analyzer>
>>>>>      <analyzer type="query">
>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt" />
>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>>> catenateNumbers="0"
>>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>>> protected="protwords.txt" />
>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>>      </analyzer>
>>>>>    </fieldType>
>>>>>
>>>>> Regards,
>>>>> Bernd
>>>>>
>>>>>
>>>>>
>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>>> Hi!
>>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I have
>>>>>> problems connecting it with solr 1.4.1
>>>>>> Questions:
>>>>>>
>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
>>>>>> 2. How do I register the file, so I can build a fieldType like:
>>>>>>
>>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>>> </fieldType>
>>>>>>
>>>>>> 3. Is that the right approach to make it work?
>>>>>>
>>>>>> Thanks for verbose explanation,
>>>>>> Jakub.
>>>>>
>>>>
>>>
>
> --
> *************************************************************
> Bernd Fehling                Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)                        Universitätsstr. 25
> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
> bernd.fehling@uni-bielefeld.de                33615 Bielefeld
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
>

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
Hi Jakub,

If you unzip your stempel-1.0.jar, do you have the
required directory structure and file in there?
org/getopt/stempel/lucene/StempelFilter.class
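
The question above can also be answered without unzipping anything. This
is a minimal sketch that relies only on java.util.jar from the JDK; the
names JarEntryCheck and hasClass are invented for this example.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: tests whether a jar contains a given class file.
class JarEntryCheck {

    /** Returns true if the jar at jarPath contains the .class file for className. */
    static boolean hasClass(String jarPath, String className) throws IOException {
        // A class a.b.C is stored in the jar as the entry a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarEntryCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        boolean found = hasClass(args[0], args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

For example: java JarEntryCheck stempel-1.0.jar org.getopt.stempel.lucene.StempelFilter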

Regards,
Bernd

Am 02.11.2010 13:54, schrieb Jakub Godawa:
> Erick I've put the jar files like that before. I also added the
> directive and put the file in instanceDir/lib
> 
> What is still a problem is that even the files are loaded:
> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
> to classloader
> 
> I am not able to use the FilterFactory... maybe I am attempting it in
> a wrong way?
> 
> Cheers,
> Jakub Godawa.
> 
> 2010/11/2 Erick Erickson <er...@gmail.com>:
>> The polish stemmer jar file needs to be findable by Solr, if you copy
>> it to <solr_home>/lib and restart solr you should be set.
>>
>> Alternatively, you can add another <lib> directive to the solrconfig.xml
>> file
>> (there are several examples in that file already).
>>
>> I'm a little confused about not being able to find TokenFilter, is that
>> still
>> a problem?
>>
>> HTH
>> Erick
>>
>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:
>>
>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>>
>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>>> directive: <lib path="../lib/stempel-1.0.jar" />
>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>>
>>> (...)
>>>  <!-- Polish -->
>>>   <fieldType name="text_pl" class="solr.TextField">
>>>    <analyzer>
>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>>      <!--    <filter
>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>>> protected="protwords.txt" /> -->
>>>    </analyzer>
>>>  </fieldType>
>>> (...)
>>>
>>> 4. jar file is loaded but I got an error:
>>> SEVERE: Could not start SOLR. Check solr/home property
>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>>      at
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>> (...)
>>>
>>> 5. Different class gave me that one:
>>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>>      at
>>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>>      at
>>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>>> (...)
>>>
>>> Question is: How to make <fieldType /> and <filter /> work with that
>>> Stempel? :)
>>>
>>> Cheers,
>>> Jakub Godawa.
>>>
>>> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
>>>> Hi Jakub,
>>>>
>>>> I have ported the KStemmer for use in most recent Solr trunk version.
>>>> My stemmer is located in the lib directory of Solr
>>> "solr/lib/KStemmer-2.00.jar"
>>>> because it belongs to Solr.
>>>>
>>>> Write it as FilterFactory and use it as Filter like:
>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>> protected="protwords.txt" />
>>>>
>>>> This is how my fieldType looks like:
>>>>
>>>>    <fieldType name="text_kstem" class="solr.TextField"
>>> positionIncrementGap="100">
>>>>      <analyzer type="index">
>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>> words="stopwords.txt" enablePositionIncrements="false" />
>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>> catenateNumbers="1"
>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>> protected="protwords.txt" />
>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>      </analyzer>
>>>>      <analyzer type="query">
>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>> words="stopwords.txt" />
>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>> catenateNumbers="0"
>>>> catenateAll="0" splitOnCaseChange="1" />
>>>>        <filter class="solr.LowerCaseFilterFactory" />
>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>>> protected="protwords.txt" />
>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>>>>      </analyzer>
>>>>    </fieldType>
>>>>
>>>> Regards,
>>>> Bernd
>>>>
>>>>
>>>>
>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
>>>>> Hi!
>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I have
>>>>> problems connecting it with solr 1.4.1
>>>>> Questions:
>>>>>
>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
>>>>> 2. How do I register the file, so I can build a fieldType like:
>>>>>
>>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
>>>>> </fieldType>
>>>>>
>>>>> 3. Is that the right approach to make it work?
>>>>>
>>>>> Thanks for verbose explanation,
>>>>> Jakub.
>>>>
>>>
>>

-- 
*************************************************************
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
bernd.fehling@uni-bielefeld.de                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Jakub Godawa <ja...@gmail.com>.
Erick, I had already placed the jar file like that. I also added the
directive and put the file in instanceDir/lib.

The problem remains even though the file is loaded:
2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
to classloader

I am not able to use the FilterFactory... maybe I am going about it
the wrong way?

Cheers,
Jakub Godawa.

2010/11/2 Erick Erickson <er...@gmail.com>:
> The polish stemmer jar file needs to be findable by Solr, if you copy
> it to <solr_home>/lib and restart solr you should be set.
>
> Alternatively, you can add another <lib> directive to the solrconfig.xml
> file
> (there are several examples in that file already).
>
> I'm a little confused about not being able to find TokenFilter, is that
> still
> a problem?
>
> HTH
> Erick
>
> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:
>
>> Thank you Bernd! I couldn't make it run though. Here is my problem:
>>
>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
>> directive: <lib path="../lib/stempel-1.0.jar" />
>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>>
>> (...)
>>  <!-- Polish -->
>>   <fieldType name="text_pl" class="solr.TextField">
>>    <analyzer>
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>      <filter class="solr.LowerCaseFilterFactory"/>
>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>>      <!--    <filter
>> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
>> protected="protwords.txt" /> -->
>>    </analyzer>
>>  </fieldType>
>> (...)
>>
>> 4. jar file is loaded but I got an error:
>> SEVERE: Could not start SOLR. Check solr/home property
>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>>      at java.lang.ClassLoader.defineClass1(Native Method)
>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>>      at
>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>> (...)
>>
>> 5. Different class gave me that one:
>> SEVERE: org.apache.solr.common.SolrException: Error loading class
>> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>>      at
>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>>      at
>> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
>> (...)
>>
>> Question is: How to make <fieldType /> and <filter /> work with that
>> Stempel? :)
>>
>> Cheers,
>> Jakub Godawa.
>>
>> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
>> > Hi Jakub,
>> >
>> > I have ported the KStemmer for use in most recent Solr trunk version.
>> > My stemmer is located in the lib directory of Solr
>> "solr/lib/KStemmer-2.00.jar"
>> > because it belongs to Solr.
>> >
>> > Write it as FilterFactory and use it as Filter like:
>> > <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>> protected="protwords.txt" />
>> >
>> > This is how my fieldType looks like:
>> >
>> >    <fieldType name="text_kstem" class="solr.TextField"
>> positionIncrementGap="100">
>> >      <analyzer type="index">
>> >        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" enablePositionIncrements="false" />
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1"
>> > catenateAll="0" splitOnCaseChange="1" />
>> >        <filter class="solr.LowerCaseFilterFactory" />
>> >        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>> protected="protwords.txt" />
>> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>> >      </analyzer>
>> >      <analyzer type="query">
>> >        <tokenizer class="solr.WhitespaceTokenizerFactory" />
>> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" />
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0"
>> > catenateAll="0" splitOnCaseChange="1" />
>> >        <filter class="solr.LowerCaseFilterFactory" />
>> >        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
>> protected="protwords.txt" />
>> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>> >      </analyzer>
>> >    </fieldType>
>> >
>> > Regards,
>> > Bernd
>> >
>> >
>> >
>> > Am 28.10.2010 14:56, schrieb Jakub Godawa:
>> >> Hi!
>> >> There is a polish stemmer http://www.getopt.org/stempel/ and I have
>> >> problems connecting it with solr 1.4.1
>> >> Questions:
>> >>
>> >> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
>> >> 2. How do I register the file, so I can build a fieldType like:
>> >>
>> >> <fieldType name="text_pl" class="solr.TextField">
>> >>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
>> >> </fieldType>
>> >>
>> >> 3. Is that the right approach to make it work?
>> >>
>> >> Thanks for verbose explanation,
>> >> Jakub.
>> >
>>
>

Re: How to use polish stemmer - Stempel - in schema.xml?

Posted by Erick Erickson <er...@gmail.com>.
The Polish stemmer jar file needs to be findable by Solr. If you copy
it to <solr_home>/lib and restart Solr, you should be set.

Alternatively, you can add another <lib> directive to the solrconfig.xml
file (there are several examples in that file already).

I'm a little confused about not being able to find TokenFilter. Is that
still a problem?
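
A NoClassDefFoundError for org/apache/lucene/analysis/TokenFilter generally
means that the class loader defining the stemmer class could not see the
Lucene class it depends on. One way to narrow this down is to ask a given
class loader directly whether it can resolve a class name. The sketch below
uses only the JDK; ClassVisibility and isVisible are names invented for
this illustration, and which loader to test inside a running Solr is left
as an assumption.

```java
// Hypothetical helper: checks whether a class loader can resolve a class name.
class ClassVisibility {

    /** Returns true if the given loader can locate and load className. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: load the class but do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. a missing superclass
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassVisibility.class.getClassLoader();
        for (String name : args) {
            System.out.println(name + " visible: " + isVisible(name, loader));
        }
    }
}
```

Run with the same classpath as the failing setup, for example:
java -cp <your classpath> ClassVisibility org.apache.lucene.analysis.TokenFilter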

HTH
Erick

On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa <ja...@gmail.com> wrote:

> Thank you Bernd! I couldn't make it run though. Here is my problem:
>
> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
> directive: <lib path="../lib/stempel-1.0.jar" />
> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:
>
> (...)
>  <!-- Polish -->
>   <fieldType name="text_pl" class="solr.TextField">
>    <analyzer>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.LowerCaseFilterFactory"/>
>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
>      <!--    <filter
> class="org.getopt.solr.analysis.StempelTokenFilterFactory"
> protected="protwords.txt" /> -->
>    </analyzer>
>  </fieldType>
> (...)
>
> 4. jar file is loaded but I got an error:
> SEVERE: Could not start SOLR. Check solr/home property
> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
>      at java.lang.ClassLoader.defineClass1(Native Method)
>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
>      at
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> (...)
>
> 5. Different class gave me that one:
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'org.getopt.solr.analysis.StempelTokenFilterFactory'
>      at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>      at
> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
> (...)
>
> Question is: How to make <fieldType /> and <filter /> work with that
> Stempel? :)
>
> Cheers,
> Jakub Godawa.
>
> 2010/10/29 Bernd Fehling <be...@uni-bielefeld.de>:
> > Hi Jakub,
> >
> > I have ported the KStemmer for use in most recent Solr trunk version.
> > My stemmer is located in the lib directory of Solr
> "solr/lib/KStemmer-2.00.jar"
> > because it belongs to Solr.
> >
> > Write it as FilterFactory and use it as Filter like:
> > <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> protected="protwords.txt" />
> >
> > This is how my fieldType looks like:
> >
> >    <fieldType name="text_kstem" class="solr.TextField"
> positionIncrementGap="100">
> >      <analyzer type="index">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="false" />
> >        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="1" />
> >        <filter class="solr.LowerCaseFilterFactory" />
> >        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> protected="protwords.txt" />
> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >      </analyzer>
> >      <analyzer type="query">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> >        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="1" />
> >        <filter class="solr.LowerCaseFilterFactory" />
> >        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> protected="protwords.txt" />
> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >      </analyzer>
> >    </fieldType>
> >
> > Regards,
> > Bernd
> >
> >
> >
> > Am 28.10.2010 14:56, schrieb Jakub Godawa:
> >> Hi!
> >> There is a polish stemmer http://www.getopt.org/stempel/ and I have
> >> problems connecting it with solr 1.4.1
> >> Questions:
> >>
> >> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
> >> 2. How do I register the file, so I can build a fieldType like:
> >>
> >> <fieldType name="text_pl" class="solr.TextField">
> >>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
> >> </fieldType>
> >>
> >> 3. Is that the right approach to make it work?
> >>
> >> Thanks for verbose explanation,
> >> Jakub.
> >
>