Posted to dev@mahout.apache.org by Celio Nogueira de Faria Jr <ce...@gmail.com> on 2012/01/11 21:02:09 UTC

seq2sparse with

Hi all,

I am trying to run the "mahout seq2sparse" command with the
org.apache.lucene.analysis.br.BrazilianAnalyzer class (Mahout 0.5 on
Ubuntu 10.04, 32-bit). When I use another analyzer (such as Whitespace),
all goes well. I have also verified that lucene-analyzers-3.1.0.jar is on
the classpath.

The full command line and the error stack are shown below.

Any clues?

TIA,

Celio.

./mahout seq2sparse -i ./sequenced/ -o ./vectors-bigram -ow -a
org.apache.lucene.analysis.br.BrazilianAnalyzer -chunk 200 -wt tfidf -s 5
-md 3 -x 90 -ng 2 -ml 50 -seq -n 2

Exception in thread "main" java.lang.InstantiationException: org.apache.lucene.analysis.br.BrazilianAnalyzer
    at java.lang.Class.newInstance0(Class.java:340)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:198)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)

Re: seq2sparse with

Posted by Lance Norskog <go...@gmail.com>.
This error happens because Mahout instantiates the analyzer reflectively
with Class.newInstance() (see the top of the stack trace), which requires
a zero-argument constructor. BrazilianAnalyzer does not have one:
http://code.google.com/p/lucenelearning/source/browse/trunk/lucene-3.0.0/contrib/analyzers/common/src/java/org/apache/lucene/analysis/br/BrazilianAnalyzer.java?spec=svn2&r=2
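The failure mode can be reproduced without Lucene or Mahout. The sketch below is only an illustration: VersionedAnalyzer is a made-up stand-in for an analyzer whose sole constructor takes an argument, and tryNewInstance is a hypothetical helper that makes the same Class.newInstance() call that appears in the stack trace.

```java
class InstantiationDemo {
    // Hypothetical stand-in for an analyzer with no zero-argument constructor.
    // It is not part of Lucene or Mahout.
    static class VersionedAnalyzer {
        private final String version;

        VersionedAnalyzer(String version) {
            this.version = version;
        }
    }

    // Returns "ok" if the class could be instantiated reflectively,
    // otherwise the simple name of the exception that was thrown.
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    static String tryNewInstance(Class<?> cls) {
        try {
            cls.newInstance();
            return "ok";
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // No zero-argument constructor: reflection cannot build the object.
        System.out.println(tryNewInstance(VersionedAnalyzer.class));
        // StringBuilder has a public zero-argument constructor.
        System.out.println(tryNewInstance(StringBuilder.class));
    }
}
```

The first call reports InstantiationException and the second reports ok, which matches the behaviour seen with BrazilianAnalyzer versus an analyzer that can be built without arguments.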

The easiest way to solve this is to make a subclass of
BrazilianAnalyzer with a zero-argument constructor that calls the
parent with the Lucene version:

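// MyBrazilianAnalyzer is a hypothetical name; this compiles only if BrazilianAnalyzer is not final.
public class MyBrazilianAnalyzer extends BrazilianAnalyzer {
  public MyBrazilianAnalyzer() {
    super(Version.LUCENE_31);
  }
}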

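I think this is the right approach. You may have to look at other Lucene
sources to find the right combination of constructor arguments. Then pass
the subclass's fully qualified name to -a instead of BrazilianAnalyzer.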
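If BrazilianAnalyzer turns out to be declared final in your Lucene version (worth checking in its source), a subclass will not compile. Another option is a wrapper with a zero-argument constructor that delegates to an instance built with the version argument. The sketch below shows only the pattern: SimpleAnalyzer, VersionedAnalyzer, NoArgAnalyzer and load are hypothetical stand-ins, not Lucene or Mahout classes. A real wrapper would extend Lucene's Analyzer and forward its methods.

```java
class WrapperDemo {
    // Hypothetical, simplified stand-in for an analyzer base type.
    interface SimpleAnalyzer {
        String[] tokenize(String text);
    }

    // Stand-in for a final analyzer that only offers a one-argument constructor.
    static final class VersionedAnalyzer implements SimpleAnalyzer {
        private final String version;

        VersionedAnalyzer(String version) {
            this.version = version;
        }

        @Override
        public String[] tokenize(String text) {
            return text.trim().split("\\s+");
        }
    }

    // Wrapper with a public zero-argument constructor; all work is delegated.
    public static final class NoArgAnalyzer implements SimpleAnalyzer {
        private final SimpleAnalyzer delegate = new VersionedAnalyzer("3.1");

        public NoArgAnalyzer() {
        }

        @Override
        public String[] tokenize(String text) {
            return delegate.tokenize(text);
        }
    }

    // Loads an analyzer by class name through its zero-argument constructor,
    // similar in spirit to what a command-line tool does with a class name.
    static SimpleAnalyzer load(String className) throws ReflectiveOperationException {
        return (SimpleAnalyzer) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        SimpleAnalyzer analyzer = load(NoArgAnalyzer.class.getName());
        System.out.println(String.join("|", analyzer.tokenize("bom dia mundo")));
    }
}
```

With the wrapper on the classpath, its fully qualified name would be the value given to -a in place of the original analyzer class.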


-- 
Lance Norskog
goksron@gmail.com