Posted to user@mahout.apache.org by Yang <te...@gmail.com> on 2014/01/10 00:48:43 UTC

ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

I am trying to run the LDA function (now called CVB), following the steps
listed in many online sources. The final step, after getting the LDA result,
is to run vectordump to show the result in human-readable form, but
vectordump gave me the following exception:

I have also included the first few bytes of my CVB output file at the end;
at least it does not appear to be empty.

Thanks!
yang

sh-3.2$ bin/mahout vectordump -i MAHOUT/cvb/part-m-00000 --dictionary sparse/dictionary.file-0 --dictionaryType sequencefile --vectorSize 10 -o cvbout
Running on hadoop, using /apache/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/yyang15/mahout/mahout-distribution-0.8/mahout-examples-0.8-job.jar
14/01/08 16:37:03 INFO common.AbstractJob: Command line arguments:
{--dictionary=[sparse/dictionary.file-0], --dictionaryType=[sequencefile],
--endPhase=[2147483647], --input=[MAHOUT/cvb/part-m-00000],
--output=[cvbout], --startPhase=[0], --tempDir=[temp], --vectorSize=[10]}
14/01/08 16:37:04 INFO vectors.VectorDumper: Sort? false
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at org.apache.mahout.utils.vectors.VectorHelper$2.apply(VectorHelper.java:132)
        at org.apache.mahout.utils.vectors.VectorHelper$2.apply(VectorHelper.java:129)
        at com.google.common.collect.Iterators$8.next(Iterators.java:812)
        at java.util.AbstractCollection.toArray(AbstractCollection.java:124)
        at java.util.ArrayList.<init>(ArrayList.java:131)
        at com.google.common.collect.Lists.newArrayList(Lists.java:119)
        at org.apache.mahout.utils.vectors.VectorHelper.toWeightedTerms(VectorHelper.java:128)
        at org.apache.mahout.utils.vectors.VectorHelper.vectorToJson(VectorHelper.java:147)
        at org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:240)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:260)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
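The message `ArrayIndexOutOfBoundsException: 0` says that index 0 was outside the array, and that can only happen when the array has length 0. Here is a minimal sketch of how such a failure arises. It assumes that `VectorHelper.toWeightedTerms` maps each vector index to a term through a plain `String[]` dictionary; the trace suggests this, but the thread does not confirm it. `WeightedTermsSketch` and its method are hypothetical and are not Mahout's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Mahout's VectorHelper source: it assumes the dump
// step resolves each term index with a plain array access into a String[] dictionary.
public class WeightedTermsSketch {

    // Turns (termIndex, weight) pairs into "term:weight" strings.
    static List<String> toWeightedTerms(List<Map.Entry<Integer, Double>> entries,
                                        String[] dictionary) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : entries) {
            // Unchecked lookup: any index >= dictionary.length throws
            // ArrayIndexOutOfBoundsException; index 0 fails only if the array is empty.
            out.add(dictionary[e.getKey()] + ":" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Double>> topic = List.of(Map.entry(0, 0.12), Map.entry(2, 0.05));

        // A dictionary that covers every index works.
        System.out.println(toWeightedTerms(topic, new String[] {"apple", "banana", "cherry"}));

        // An empty dictionary fails on the very first index, 0.
        try {
            toWeightedTerms(topic, new String[0]);
        } catch (ArrayIndexOutOfBoundsException ex) {
            System.out.println("caught: " + ex);
        }
    }
}
```

If that assumption holds, a dictionary that loaded as empty would fail on the first index it looked up, regardless of what the topic vectors contain.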




[First bytes of the CVB output file: a Hadoop SequenceFile header ("SEQ" followed by version byte ^F) naming key class org.apache.hadoop.io.IntWritable and value class org.apache.mahout.math.VectorWritable, followed by binary vector data that cannot be shown as text.]
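These bytes are the start of a Hadoop SequenceFile: the magic `SEQ`, a version byte (^F, i.e. 6), and then the key and value class names. Below is a minimal sketch that reads those first fields. It assumes each class name is stored as a one-byte VInt length followed by UTF-8 bytes, which holds for names shorter than 128 bytes. `SeqHeaderSketch` is hypothetical and ignores everything after the class names. For real files, Hadoop's `SequenceFile.Reader` is the tool to use:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch of reading the first fields of a Hadoop SequenceFile header:
// "SEQ", one version byte, then the key and value class names, each stored as a
// VInt length followed by UTF-8 bytes. Assumption: lengths are below 128, so each
// VInt is a single byte. Real code should use Hadoop's SequenceFile.Reader.
public class SeqHeaderSketch {

    static final class Header {
        final int version;
        final String keyClass;
        final String valueClass;

        Header(int version, String keyClass, String valueClass) {
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }

        @Override
        public String toString() {
            return "version=" + version + " key=" + keyClass + " value=" + valueClass;
        }
    }

    static Header parse(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (!"SEQ".equals(new String(magic, StandardCharsets.US_ASCII))) {
            throw new IOException("not a SequenceFile (bad magic)");
        }
        int version = in.readUnsignedByte();
        String keyClass = readShortText(in);
        String valueClass = readShortText(in);
        return new Header(version, keyClass, valueClass);
    }

    // Reads a string whose length fits in a single non-negative VInt byte.
    static String readShortText(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("length byte outside the range this sketch handles");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Writes the same layout (short names only), used here just to build sample input.
    static byte[] sampleHeader(int version, String keyClass, String valueClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(version);
        for (String s : new String[] {keyClass, valueClass}) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader(6, "org.apache.hadoop.io.IntWritable",
                "org.apache.mahout.math.VectorWritable");
        System.out.println(parse(header));
    }
}
```

The header only shows that the file is a non-empty SequenceFile of `IntWritable` keys and `VectorWritable` values. It says nothing about whether the vector indices fit the dictionary.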

Fwd: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

Posted by Yang <te...@gmail.com>.
Yes.

-sh-3.2$ bin/mahout seqdumper -i MAHOUT/sparse/dictionary.file-0 |grep zero
14/01/14 11:27:52 INFO common.AbstractJob: Command line arguments:
{--endPhase=[2147483647], --input=[MAHOUT/sparse/dictionary.file-0],
--startPhase=[0], --tempDir=[temp]}
Key: itemcountzero: Value: 5399
Key: zero: Value: 7704
14/01/14 11:27:53 INFO driver.MahoutDriver: Program took 978 ms (Minutes: 0.0163)


If the presence of a specific key causes this issue, isn't that a bug?
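Whatever the trigger turns out to be, an unchecked array lookup turns a dictionary/vector mismatch into a bare `ArrayIndexOutOfBoundsException`. Below is a hypothetical defensive variant, not Mahout code, that reports uncovered indices instead of throwing:

```java
// Hypothetical defensive lookup, not part of Mahout: instead of letting an
// unchecked array access throw, it reports indices the dictionary does not cover.
public class SafeTermLookup {

    static String termFor(String[] dictionary, int index) {
        if (dictionary == null || index < 0 || index >= dictionary.length) {
            return "<unknown term #" + index + ">";
        }
        return dictionary[index];
    }

    public static void main(String[] args) {
        String[] dict = {"containing", "craft"};
        System.out.println(termFor(dict, 1));          // a covered index
        System.out.println(termFor(new String[0], 0)); // empty dictionary, no exception
    }
}
```

Failing fast with a message that names the index and the dictionary size would be an equally reasonable design. Either way, the error points at the mismatch rather than at an array.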

thanks
Yang


On Mon, Jan 13, 2014 at 6:02 PM, Suneel Marthi <su...@yahoo.com> wrote:

> Does the dictionary have a Key 'zero'?

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

Posted by Suneel Marthi <su...@yahoo.com>.
Does the dictionary have a key 'zero'?
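Piping `seqdumper` output through `grep zero` matches substrings, so it also returns keys such as `itemcountzero`. Below is a minimal sketch of an exact-key check over lines in the `Key: ... Value: ...` shape that `seqdumper` prints in this thread. `ExactKeyCheck` is hypothetical, and it assumes keys do not themselves contain `: Value: `:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: looks up an exact key in lines shaped like the seqdumper
// output quoted in this thread ("Key: zero: Value: 7704").
public class ExactKeyCheck {

    static Optional<Integer> valueForKey(List<String> lines, String key) {
        String prefix = "Key: " + key + ": Value: ";
        for (String line : lines) {
            // startsWith on the full prefix, so "itemcountzero" does not match "zero".
            if (line.startsWith(prefix)) {
                return Optional.of(Integer.parseInt(line.substring(prefix.length()).trim()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Key: itemcountzero: Value: 5399",
                "Key: zero: Value: 7704");
        System.out.println(valueForKey(lines, "zero"));
        System.out.println(valueForKey(lines, "count"));
    }
}
```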





On Monday, January 13, 2014 7:37 PM, Yang <te...@gmail.com> wrote:
> The dictionary seems OK; at least it is not empty.

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

Posted by Yang <te...@gmail.com>.
Suneel:

Thanks for the reply (sorry, Gmail somehow archived it, so it didn't show
up in my inbox).

The dictionary seems OK; at least it is not empty.

-sh-3.2$ ls -l  sparse/
total 464
drwxr-xr-x 2 yyang15 gid-yyang15  32768 Jan  8 15:17 df-count
-rw-r--r-- 1 yyang15 gid-yyang15 203369 Jan  8 15:17 dictionary.file-0
-rw-r--r-- 1 yyang15 gid-yyang15 186893 Jan  8 15:17 frequency.file-0
drwxr-xr-x 2 yyang15 gid-yyang15   4096 Jan  8 15:17 tf-vectors
drwxr-xr-x 2 yyang15 gid-yyang15   4096 Jan  8 15:17 tokenized-documents
drwxr-xr-x 2 yyang15 gid-yyang15  32768 Jan  8 15:18 wordcount
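The commands in this thread refer to the dictionary with different relative paths. `seqdumper` read `MAHOUT/sparse/dictionary.file-0`, `vectordump` was given `--dictionary sparse/dictionary.file-0` (while its `-i` input was under `MAHOUT/`), and `ls -l sparse/` lists a local directory. The thread does not establish whether these are the same file. A relative path is resolved against a working directory; on HDFS this is typically the user's home directory under `/user/`. Resolved against the same working directory, the two prefixes therefore name different files. Below is a minimal sketch using `java.nio` paths, with a hypothetical working directory:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: the working directory below is a hypothetical stand-in; the real one
// depends on which filesystem (local or HDFS) each command resolved paths against.
public class PathResolutionSketch {

    static Path resolve(String workingDir, String relative) {
        return Paths.get(workingDir).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        String workingDir = "/user/yyang15"; // hypothetical
        Path dumped = resolve(workingDir, "MAHOUT/sparse/dictionary.file-0"); // seqdumper -i
        Path passed = resolve(workingDir, "sparse/dictionary.file-0");        // vectordump --dictionary
        System.out.println(dumped);
        System.out.println(passed);
        System.out.println("same path: " + dumped.equals(passed));
    }
}
```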


-sh-3.2$ bin/mahout seqdumper -i MAHOUT/sparse/dictionary.file-0

Key: containing: Value: 9229
Key: craft: Value: 9230
Key: e33494add68d3d0138c45300f0aa361a: Value: 9231
Key: elizabeth: Value: 9232
Key: extra: Value: 9233
Key: joe: Value: 9234
Key: juice: Value: 9235
Key: mario's: Value: 9236
Key: musical: Value: 9237
Key: nicest: Value: 9238
Key: petit_ermitage.html: Value: 9239
Key: rebeccabarker: Value: 9240
Key: spa's: Value: 9241
Key: steam: Value: 9242
Key: stylesheet: Value: 9243
Key: tim46679: Value: 9244
Key: topnav.search_where: Value: 9245
Key: www.expedia.com: Value: 9246
Key: xv: Value: 9247
Count: 9248
14/01/13 17:35:39 INFO driver.MahoutDriver: Program took 54565 ms (Minutes: 0.9094166666666667)
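The dump reports `Count: 9248`, and the largest index shown is 9247. That is consistent with indices 0 through 9247, but only the last entries are listed, so it is not a full check. Below is a minimal sketch of such a check: it inverts a term-to-index dictionary into the index-to-term array that an array-based lookup would need, rejecting indices that are out of range or duplicated. `DictionaryCheck` is hypothetical, not Mahout code:

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical check, not Mahout code: inverts a term -> index dictionary into an
// index -> term array and rejects inputs an array lookup could not handle safely.
public class DictionaryCheck {

    static String[] invert(Map<String, Integer> termToIndex) {
        String[] terms = new String[termToIndex.size()];
        for (Map.Entry<String, Integer> e : termToIndex.entrySet()) {
            int idx = e.getValue();
            // With n distinct indices all inside 0..n-1, every slot is filled,
            // so any gap shows up here as an index >= n.
            if (idx < 0 || idx >= terms.length) {
                throw new IllegalArgumentException(
                        "index " + idx + " outside 0.." + (terms.length - 1) + " for term " + e.getKey());
            }
            if (terms[idx] != null) {
                throw new IllegalArgumentException("duplicate index " + idx);
            }
            terms[idx] = e.getKey();
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] terms = invert(Map.of("containing", 0, "craft", 1, "xv", 2));
        System.out.println(terms.length + " " + Arrays.toString(terms));
        // An empty dictionary inverts to a zero-length array; any array lookup into it,
        // even index 0, would throw ArrayIndexOutOfBoundsException.
        System.out.println(invert(Map.of()).length);
    }
}
```

Applied to the full dictionary, a result of length 9248 would match the reported `Count`. An empty map yields a zero-length array, which is exactly the case in which an unchecked lookup of index 0 throws.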



On Thu, Jan 9, 2014 at 4:12 PM, Suneel Marthi <su...@yahoo.com> wrote:

> The issue seems to be with your dictionary. What is the length of the dictionary?

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

Posted by Suneel Marthi <su...@yahoo.com>.
The issue seems to be with your dictionary. What is the length of the dictionary?





On Thursday, January 9, 2014 6:49 PM, Yang <te...@gmail.com> wrote:
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0