Posted to dev@mahout.apache.org by Drew Farris <dr...@apache.org> on 2012/06/09 14:27:56 UTC

cluster-reuters.sh clusterdump arguments

Hi All,

In kicking the tires of the 0.7 release, I've discovered that the
arguments for clusterdump in examples/bin/cluster-reuters.sh aren't
quite right.

When running what's checked in, I get:

12/06/09 08:10:47 ERROR common.AbstractJob: Unexpected -s while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]

The current dump commands look like:

  $MAHOUT clusterdump \
    -s ${WORK_DIR}/reuters-kmeans/clusters-*-final \
    -d ${WORK_DIR}/reuters-out-seqdir-sparse-kmeans/dictionary.file-0 \
    -dt sequencefile -b 100 -n 20 --evaluate \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    --pointsDir ${WORK_DIR}/reuters-kmeans/clusteredPoints

I think they should be:

  $MAHOUT clusterdump \
    -i ${WORK_DIR}/reuters-kmeans/clusters-*-final \
    -o ${WORK_DIR}/reuters-kmeans/clusters-dump -of TEXT \
    -d ${WORK_DIR}/reuters-out-seqdir-sparse-kmeans/dictionary.file-0 \
    -dt sequencefile -b 100 -n 20 --evaluate \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    --pointsDir ${WORK_DIR}/reuters-kmeans/clusteredPoints

Anyone opposed to getting this fix in for 0.7?

Drew

Re: cluster-reuters.sh clusterdump arguments

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
+1. The -s option got changed to -i some time back, and it looks like
some of the $MAHOUT clusterdump invocations didn't get updated. I agree
it needs fixing.
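
One way to check for other leftover invocations is sketched below. This is only a rough helper, not something that ships with Mahout: the function name find_stale_clusterdump_flags is made up for illustration, and it assumes each clusterdump command is written as backslash-continued lines, as in cluster-reuters.sh. It prints every line of a clusterdump command that still passes -s.

```shell
# Hypothetical helper (not part of Mahout): report lines inside a
# "clusterdump" command that still pass the old -s option.
# Assumes multi-line commands are joined with trailing backslashes.
find_stale_clusterdump_flags() {
  awk '
    # A line mentioning clusterdump starts a command we want to inspect.
    /clusterdump/ { in_cmd = 1 }
    # While inside that command, flag any standalone -s token.
    in_cmd && /(^|[ \t])-s([ \t]|$)/ { print FILENAME ":" FNR ": " $0 }
    # A line without a trailing backslash ends the command.
    in_cmd && !/\\[ \t]*$/ { in_cmd = 0 }
  ' "$1"
}
```

Running it as find_stale_clusterdump_flags examples/bin/cluster-reuters.sh should list each offending line with its line number; no output would suggest nothing is left to fix in that file.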
