Posted to dev@mahout.apache.org by Tharindu Rusira <th...@gmail.com> on 2014/01/31 10:58:24 UTC

Error while running ./cluster-reuters.sh with option "lda clustering"

Hi all,
I'm running the Mahout examples from the latest Mahout 0.9 release candidate
and got the error below while running ./cluster-reuters.sh with option 3 (LDA
clustering). Judging by the error log, this does not seem to be a Mahout
issue; rather, Hadoop (1.2.1) fails to write to
/tmp/mahout-work-tkumara/reuters-lda. This is strange, however, because
/tmp/mahout-work-tkumara/ does not contain a reuters-lda directory, yet the
exception complains that the directory already exists.

14/01/31 15:20:39 ERROR security.UserGroupInformation: PriviledgedActionException as:tkumara cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/mahout-work-tkumara/reuters-lda already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/mahout-work-tkumara/reuters-lda already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:394)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.writeTopicModel(CVB0Driver.java:441)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:336)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:198)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.main(CVB0Driver.java:534)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

I also checked the relevant section of ./cluster-reuters.sh but could not
find anything wrong there.

elif [ "x$clustertype" == "xlda" ]; then
  $MAHOUT seq2sparse \
    -i ${WORK_DIR}/reuters-out-seqdir/ \
    -o ${WORK_DIR}/reuters-out-seqdir-sparse-lda -ow --maxDFPercent 85 --namedVector \
  && \
  $MAHOUT rowid \
    -i ${WORK_DIR}/reuters-out-seqdir-sparse-lda/tfidf-vectors \
    -o ${WORK_DIR}/reuters-out-matrix \
  && \
  rm -rf ${WORK_DIR}/reuters-lda ${WORK_DIR}/reuters-lda-topics ${WORK_DIR}/reuters-lda-model \
  && \
  $MAHOUT cvb \
    -i ${WORK_DIR}/reuters-out-matrix/matrix \
    -o ${WORK_DIR}/reuters-lda -k 20 -ow -x 20 \
    -dict ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
    -dt ${WORK_DIR}/reuters-lda-topics \
    -mt ${WORK_DIR}/reuters-lda-model \
  && \
  $MAHOUT vectordump \
    -i ${WORK_DIR}/reuters-lda-topics/part-m-00000 \
    -o ${WORK_DIR}/reuters-lda/vectordump \
    -vs 10 -p true \
    -d ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
    -dt sequencefile -sort ${WORK_DIR}/reuters-lda-topics/part-m-00000 \
    && \
  cat ${WORK_DIR}/reuters-lda/vectordump
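
One thing that may be worth checking here, although it is not confirmed anywhere in this thread: the rm -rf step in the excerpt above only touches the local filesystem, whereas a job submitted to a Hadoop cluster resolves its output path against Hadoop's default filesystem (normally HDFS). A directory can therefore be absent locally and still exist in HDFS. The sketch below uses a hypothetical helper, where_exists, to report which of the two holds a given path; it assumes the hadoop command is on the PATH.

```shell
# Hypothetical helper: report where a directory exists.
# Prints "local", "hdfs", "local+hdfs" or "none".
where_exists() {
  path="$1"
  found=""
  # Plain shell test: looks only at the local filesystem.
  if [ -d "$path" ]; then
    found="local"
  fi
  # `hadoop fs -test -d` exits 0 when the directory exists in the
  # filesystem Hadoop is configured to use (HDFS on a cluster).
  if hadoop fs -test -d "$path" 2>/dev/null; then
    found="${found:+$found+}hdfs"
  fi
  echo "${found:-none}"
}

# Example, using the path from the error message:
#   where_exists /tmp/mahout-work-tkumara/reuters-lda
```

If this prints "hdfs" for the path named in the exception, a leftover directory in HDFS would explain the message. On a standalone (local-mode) Hadoop setup both checks look at the same disk, so the distinction does not apply there.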

So what could be the reason for this exception?
Thanks,
-- 
M.P. Tharindu Rusira Kumara

Department of Computer Science and Engineering,
University of Moratuwa,
Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com

Re: Error while running ./cluster-reuters.sh with option "lda clustering"

Posted by Tharindu Rusira <th...@gmail.com>.

I managed to overcome the issue with the well-known Hadoop trick of
formatting the namenode and restarting Hadoop. I still have no clue what
went wrong the first time, but the problem was evidently on the Hadoop side.

$HADOOP_HOME/bin/stop-all.sh
$HADOOP_HOME/bin/hadoop namenode -format
$HADOOP_HOME/bin/start-all.sh
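
Note that formatting the namenode erases the HDFS metadata, so everything stored in HDFS is lost. If the cause was a stale output directory in HDFS (an assumption; the thread does not establish this), a narrower cleanup might have been enough. The sketch below uses a hypothetical helper, clean_lda_output, that deletes only the three LDA output directories named in cluster-reuters.sh. It relies on hadoop fs -test and hadoop fs -rmr as provided by Hadoop 1.x; newer releases use hadoop fs -rm -r instead.

```shell
# Hypothetical helper: remove only the LDA output directories from the
# Hadoop filesystem, leaving everything else in HDFS untouched.
clean_lda_output() {
  work_dir="$1"
  for name in reuters-lda reuters-lda-topics reuters-lda-model; do
    target="$work_dir/$name"
    # Delete the path only if it actually exists in HDFS.
    if hadoop fs -test -e "$target" 2>/dev/null; then
      hadoop fs -rmr "$target"
    fi
  done
}

# Example, using the work directory from the error message:
#   clean_lda_output /tmp/mahout-work-tkumara
```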

Regards,


-- 
M.P. Tharindu Rusira Kumara

Department of Computer Science and Engineering,
University of Moratuwa,
Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com