Posted to user@mahout.apache.org by Alok Tanna <ta...@gmail.com> on 2016/02/04 04:33:00 UTC

Mahout error : seq2sparse

Mahout in local mode

I am able to successfully run the command below on a smaller data set, but
when I run it on a large data set I get the error below. It looks like I
need to increase the size of some parameter, but I am not sure which one.
It is failing with java.io.EOFException while creating the dictionary-0
file.

Please find the attached file for more details.

command: mahout seq2sparse -i /home/ubuntu/AT/AT-Seq/ -o
/home/ubuntu/AT/AT-vectors/ -lnorm -nv -wt tfidf

Main error:


16/02/03 23:02:06 INFO mapred.LocalJobRunner: reduce > reduce
16/02/03 23:02:17 INFO mapred.LocalJobRunner: reduce > reduce
16/02/03 23:02:18 WARN mapred.LocalJobRunner: job_local1308764206_0003
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at
org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
        at
org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
        at org.apache.hadoop.io.Text.readFields(Text.java:263)
        at
org.apache.mahout.common.StringTuple.readFields(StringTuple.java:142)
        at
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at
org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
        at
org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
16/02/03 23:02:18 INFO mapred.JobClient: Job complete:
job_local1308764206_0003
16/02/03 23:02:18 INFO mapred.JobClient: Counters: 20
16/02/03 23:02:18 INFO mapred.JobClient:   File Output Format Counters
16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Written=14923244
16/02/03 23:02:18 INFO mapred.JobClient:   FileSystemCounters
16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_READ=1412144036729
16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=323876626568
16/02/03 23:02:18 INFO mapred.JobClient:   File Input Format Counters
16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Read=11885543289
16/02/03 23:02:18 INFO mapred.JobClient:   Map-Reduce Framework
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input groups=223
16/02/03 23:02:18 INFO mapred.JobClient:     Map output materialized
bytes=2214020551
16/02/03 23:02:18 INFO mapred.JobClient:     Combine output records=0
16/02/03 23:02:18 INFO mapred.JobClient:     Map input records=223
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce shuffle bytes=0
16/02/03 23:02:18 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=0
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce output records=222
16/02/03 23:02:18 INFO mapred.JobClient:     Spilled Records=638
16/02/03 23:02:18 INFO mapred.JobClient:     Map output bytes=2214019100
16/02/03 23:02:18 INFO mapred.JobClient:     CPU time spent (ms)=0
16/02/03 23:02:18 INFO mapred.JobClient:     Total committed heap usage
(bytes)=735978192896
16/02/03 23:02:18 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=0
16/02/03 23:02:18 INFO mapred.JobClient:     Combine input records=0
16/02/03 23:02:18 INFO mapred.JobClient:     Map output records=223
16/02/03 23:02:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=9100
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input records=222
Exception in thread "main" java.lang.IllegalStateException: Job failed!
        at
org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
        at
org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
        at
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:274)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
.
.



-- 
Thanks & Regards,

Alok R. Tanna

RE: Mahout error : seq2sparse

Posted by Andrew Palumbo <ap...@outlook.com>.
Thank you for reporting this. The "-el" option was removed from 'mahout trainnb' in v0.10.0; label extraction is now the default behavior.

That piece of documentation needs to be updated.
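
For reference, with -el gone the tutorial's command would now read as below.
This is just the quoted example from that page minus the flag (same
${WORK_DIR} paths), not re-tested here:

 $ mahout trainnb
        -i ${WORK_DIR}/20news-train-vectors
        -o ${WORK_DIR}/model
        -li ${WORK_DIR}/labelindex
        -ow
        -c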

Andy



Re: Mahout error : seq2sparse

Posted by Andrew Musselman <an...@gmail.com>.
Great to hear!  If you're up for it you could sign up and file a bug at
https://issues.apache.org/jira/browse/MAHOUT so we can track that.

Thanks!


Re: Mahout error : seq2sparse

Posted by Alok Tanna <ta...@gmail.com>.
Thank you so much Andrew, it did work with the latest version in local mode.

I found one thing: with the new version, in the twenty-newsgroups
classification example (
https://mahout.apache.org/users/classification/twenty-newsgroups.html), this
command from step 6, "Train the classifier":

 $ mahout trainnb
        -i ${WORK_DIR}/20news-train-vectors
        -el
        -o ${WORK_DIR}/model
        -li ${WORK_DIR}/labelindex
        -ow
        -c

won't work with the -el parameter. Once I removed it, it worked fine. Not
sure why?

With earlier versions it worked with the -el parameter.

Thanks,
Alok Tanna




-- 
Thanks & Regards,

Alok R. Tanna

Re: Mahout error : seq2sparse

Posted by Alok Tanna <ta...@gmail.com>.
Will try to update it tonight to the latest version and then give it a
try.

Thanks,
Alok Tanna



-- 
Thanks & Regards,

Alok R. Tanna

Re: Mahout error : seq2sparse

Posted by Andrew Musselman <an...@gmail.com>.
Would recommend updating to the latest version if you can; you're probably
working with two-releases-old code.


Re: Mahout error : seq2sparse

Posted by Alok Tanna <ta...@gmail.com>.
Thank you Andrew. I was able to remove the empty lines with your help and
rerun the process, but I am still getting the same error.

When I just run Mahout it shows me this:
jar /mahout-examples-1.0-SNAPSHOT-job.jar!

I think the only option I have now is to set up the cluster and run it on
that.

Thanks,
Alok Tanna

Re: Mahout error : seq2sparse

Posted by Andrew Musselman <an...@gmail.com>.
For the Mahout version you could run `mahout` and look for lines that
include the version-jar name, such as:  "MAHOUT-JOB:
/usr/lib/mahout/mahout-examples-0.11.1-job.jar"

We don't have a -version flag that I can see but I just opened
https://issues.apache.org/jira/browse/MAHOUT-1798 which you're free to take
a stab at.
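
In the meantime, a one-liner to surface just that line (if the wrapper
script prints it to stderr rather than stdout on your build, add 2>&1
before the pipe):

$ mahout | grep "MAHOUT-JOB"
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.11.1-job.jar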


Re: Mahout error : seq2sparse

Posted by Andrew Musselman <an...@gmail.com>.
$ for i in `ls input-directory`; do sed -i '/^$/d' input-directory/$i; done
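
A variant of the same cleanup that also drops whitespace-only lines and is
safer with odd filenames (input-directory is the same placeholder as above;
a sketch, so test on a copy first):

$ find input-directory -type f -exec sed -i '/^[[:space:]]*$/d' {} +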


Re: Mahout error : seq2sparse

Posted by Alok Tanna <ta...@gmail.com>.
This command works, thank you. Yes, I am seeing a lot of empty lines in my
input files; any magic command to remove those lines would save a lot of
time. I will rerun this once I have removed the empty lines.

It would be great if I can get this working in local mode, or else I will
have to spend a few days getting it working on a Hadoop/Spark cluster.

Thanks,
Alok Tanna



-- 
Thanks & Regards,

Alok R. Tanna

Re: Mahout error : seq2sparse

Posted by Andrew Musselman <an...@gmail.com>.
Ah; looks like that config can be set in Hadoop's core-site.xml but if
you're running Mahout in local mode that shouldn't help.
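
One thing that might still be worth a try in local mode: the stack trace
shows the driver going through Hadoop's ToolRunner, which parses generic -D
options before the tool's own arguments and applies them to the
LocalJobRunner as well, so a sketch like the following could raise the sort
buffer (untested here):

$ mahout seq2sparse -Dio.sort.mb=512 -i /home/ubuntu/AT/AT-Seq/ -o
/home/ubuntu/AT/AT-vectors/ -lnorm -nv -wt tfidf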

Can you try this with local mode off, in other words on a running
Hadoop/Spark cluster?

Looking for empty lines could be run via a command like `grep -r "^$"
input-file-directory`; blank lines will show up before your next prompt if
so.
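
To list just the files that contain blank lines, rather than the matching
lines themselves, grep's -l flag should do it:

$ grep -rl "^$" input-file-directory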


Re: Mahout error : seq2sparse

Posted by Alok Tanna <ta...@gmail.com>.
Thank you Andrew for the quick response. I have around 300 input files, so
it would take a while for me to go through each file. I will try to look
into that, but I had successfully generated the sequence file using mahout
seqdirectory for the same dataset. How can I find which Mahout release I am
on? Also, can you let me know how I can increase io.sort.mb = 100 when I
have Mahout running in local mode?

In the earlier attached file you can see it says: 16/02/03 22:59:04 INFO
mapred.MapTask: Record too large for in-memory buffer: 99614722 bytes

How can I increase the in-memory buffer for Mahout in local mode?

I hope this has nothing to do with this error.

Thanks,
Alok Tanna




-- 
Thanks & Regards,

Alok R. Tanna


Re: Mahout error : seq2sparse

Posted by Andrew Musselman <an...@gmail.com>.
Is it possible you have empty lines or extra whitespace at the end of, or
in the middle of, any of your input files?  I don't know for sure, but
that's where I'd start looking.
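
For a quick check, a rough (untested) sketch like the one below could walk
the input directory and flag empty or whitespace-only documents.  It
assumes the sequence files hold Text keys and values, the way seqdirectory
writes them, and the SeqFileChecker name is just made up for this example.
A truncated file should fail with the same EOFException your job reports:

import java.io.EOFException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper, not part of Mahout: scans seq2sparse input for
// empty documents and truncated sequence files.
public class SeqFileChecker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // args[0] is the input directory, e.g. /home/ubuntu/AT/AT-Seq/
    for (FileStatus status : fs.listStatus(new Path(args[0]))) {
      if (status.isDir()) {
        continue;
      }
      Path path = status.getPath();
      Text key = new Text();
      Text value = new Text();
      long records = 0;
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
      try {
        while (reader.next(key, value)) {
          records++;
          // Flag documents that are empty or all whitespace.
          if (value.toString().trim().isEmpty()) {
            System.out.println(path + ": record " + records + " (key "
                + key + ") is empty or whitespace-only");
          }
        }
      } catch (EOFException e) {
        // A truncated or corrupt file dies here, like the reduce task did.
        System.out.println(path + ": unexpected EOF after " + records
            + " records");
      } finally {
        reader.close();
      }
    }
  }
}

Compile it against the same Hadoop jars Mahout uses and point it at the -i
directory; if one file reports an early EOF, regenerating that file would
be the first thing to try.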

Are you on the most recent release?

On Wed, Feb 3, 2016 at 7:33 PM, Alok Tanna <ta...@gmail.com> wrote:

> [...]