Posted to user@mahout.apache.org by venkata ramana <ve...@gmail.com> on 2014/07/01 07:36:05 UTC

Re: Clusterdump in mahout

Hi Suneel,

I checked examples/bin/cluster-reuters.sh in Mahout, but I did not
understand the following statement:

$MAHOUT seqdirectory -i ${WORK_DIR}/reuters-out \
  -o ${WORK_DIR}/reuters-out-seqdir -c UTF-8 -chunk 5

As I understand it, the input to seqdirectory should be my own input. In
cluster-reuters.sh, however, the input is first copied into Hadoop:

 $HADOOP dfs -put ${WORK_DIR}/reuters-out ${WORK_DIR}/reuters-out

Can you please let me know how to use my own input instead? Please correct
me if I am wrong.
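
To make the question concrete, here is a rough sketch of what I assume the
two steps would look like with my own data. The MY_INPUT path, the my-input
and my-input-seqdir names, and the dry-run wrapper are placeholders of mine
and are not taken from cluster-reuters.sh; only the seqdirectory options and
the dfs -put call are copied from the script.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical placeholders.
MY_INPUT=/path/to/my-text-files   # local directory holding my plain-text documents
WORK_DIR=/tmp/mahout-work         # working directory, as in cluster-reuters.sh
HADOOP=hadoop
MAHOUT=mahout
DRY_RUN=1                         # 1 = only print the commands, 0 = execute them

# Print the command when DRY_RUN is 1, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_steps() {
  # Copy my own documents into HDFS in place of reuters-out.
  run "$HADOOP" dfs -put "$MY_INPUT" "$WORK_DIR/my-input"
  # Convert them to SequenceFiles with the same options as the Reuters example.
  run "$MAHOUT" seqdirectory -i "$WORK_DIR/my-input" \
    -o "$WORK_DIR/my-input-seqdir" -c UTF-8 -chunk 5
}

build_steps
```

With DRY_RUN left at 1 this only prints the two commands, so the paths can be
checked before anything is copied into HDFS.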

Thanks,
Venkat




On Fri, Jun 27, 2014 at 2:48 PM, venkata ramana <venkat.ecosystems@gmail.com> wrote:

> Hi,
>
> I have not used reuters-21578 for my k-means run.
>
> These are the steps I followed.
>
> I first prepared the sequence directory and then the seq2sparse directory.
>
> ./mahout kmeans -Dmapred.map.java.child.opts=-Xmx1g \
>   -i /urlcat-data/56-categories/vector-dir/tfidf-vectors/ \
>   -c /urlcat-data/56-categories/cluster-centroids \
>   -o /urlcat-data/56-categories/kmeans-cluster-output \
>   -ow -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow \
>   -cd 1 -k 49 --clustering -cl
>
>
> mahout clusterdump -i /opt/49-classification/cluster-centroids \
>   -o /opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt \
>   -p /opt/49-classification/kmeans-cluster-output/clusteredPoints/ \
>   -d /root/Desktop/final_feature_dictionaries.txt -dt text -e;
>
> I have checked examples/bin/cluster-reuters.sh and downloaded
> reuters-21578.
>
> Can you please let me know what I should do now?
>
> Thanks,
> Venkat
>
>
> On Thu, Jun 26, 2014 at 6:46 PM, Suneel Marthi <sm...@apache.org> wrote:
>
>> No, a dictionary is not a file that maps 'crisp keywords' to clusters. A
>> dictionary is a mapping of keywords to unique integer IDs.
>>
>> Again, it would be easier to help if you could outline the steps you
>> took to generate the clusters. It seems you might have missed
>> something; at the very least, look at the k-means example in
>> examples/bin/cluster-reuters.sh for the correct sequence of steps.
>>
>>
>> On Thu, Jun 26, 2014 at 5:07 AM, venkata ramana <venkat.ecosystems@gmail.com> wrote:
>>
>> > As per my understanding, the dictionary file contains crisp keywords
>> > that are related to a cluster. Please let me know if I am wrong.
>> >
>> > Thanks,
>> > Venkat
>> >
>> >
>> > On Thu, Jun 26, 2014 at 1:27 PM, Suneel Marthi <sm...@apache.org> wrote:
>> >
>> > > It's clear from the stack trace that you have a String as a key where
>> > > an integer was expected.
>> > > How did you go about building your clusters from the original input?
>> > >
>> > >
>> > > On Thu, Jun 26, 2014 at 3:28 AM, venkata ramana <venkat.ecosystems@gmail.com> wrote:
>> > >
>> > > > Hi Mahout,
>> > > >
>> > > > I am trying to analyze my k-means clusters. I used the following
>> > > > command.
>> > > >
>> > > > mahout clusterdump -i /opt/49-classification/cluster-centroids \
>> > > >   -o /opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt \
>> > > >   -p /opt/49-classification/kmeans-cluster-output/clusteredPoints/ \
>> > > >   -d /root/Desktop/final_feature_dictionaries.txt -dt text -e;
>> > > >
>> > > > I got the following error.
>> > > >
>> > > > hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
>> > > > SLF4J: Class path contains multiple SLF4J bindings.
>> > > > SLF4J: Found binding in [jar:file:/opt/Gouri_Sankar/mahout-distribution-0.8/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > SLF4J: Found binding in [jar:file:/opt/Gouri_Sankar/mahout-distribution-0.8/lib/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> > > > SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
>> > > > Jun 26, 2014 12:43:40 PM org.slf4j.impl.JCLLoggerAdapter info
>> > > > INFO: Command line arguments:
>> > > > {--dictionary=[/root/Desktop/final_feature_dictionaries.txt],
>> > > > --dictionaryType=[text],
>> > > > --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
>> > > > --endPhase=[2147483647], --evaluate=null,
>> > > > --input=[/opt/49-classification/cluster-centroids],
>> > > > --output=[/opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt],
>> > > > --outputFormat=[TEXT],
>> > > > --pointsDir=[/opt/49-classification/kmeans-cluster-output/clusteredPoints/],
>> > > > --startPhase=[0], --tempDir=[temp]}
>> > > > Exception in thread "main" java.lang.NumberFormatException: For input string: "aajproperty.com"
>> > > >     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> > > >     at java.lang.Integer.parseInt(Integer.java:492)
>> > > >     at java.lang.Integer.parseInt(Integer.java:527)
>> > > >     at org.apache.mahout.utils.vectors.VectorHelper.loadTermDictionary(VectorHelper.java:218)
>> > > >
>> > > >
>> > > > I have not used any numbers in my dictionary file. Could you please
>> > > > help me with this?
>> > > >
>> > > > Thanks,
>> > > > Venkat
>> > > >
>> > >
>> >
>>
>
>