Posted to dev@mahout.apache.org by "Videnova, Svetlana" <sv...@logica.com> on 2012/06/20 09:36:43 UTC

several info

How do you want to combine Mahout and Solr? => that was my question.

I was using Mahout 0.6, but since yesterday Mahout 0.7.

So I was trying to run the following (just to test and make sure that everything works properly):



###############################################################################################################

:/usr/local/mahout-distribution-0.7/examples/bin$ ./build-cluster-syntheticcontrol.sh

Please call cluster-syntheticcontrol.sh directly next time.  This file is going away.

Please select a number to choose the corresponding clustering algorithm

1. canopy clustering

2. kmeans clustering

3. fuzzykmeans clustering

4. dirichlet clustering

5. meanshift clustering

Enter your choice : 1

ok. You chose 1 and we'll use canopy Clustering

creating work directory at /tmp/mahout-work-hduser

Downloading Synthetic control data

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0
curl: (7) couldn't connect to host

Checking the health of DFS...

Warning: $HADOOP_HOME is deprecated.



Found 4 items

drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:05 /user/hduser/gutenberg

drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:07 /user/hduser/gutenberg-output

drwxr-xr-x   - hduser supergroup          0 2012-06-18 15:35 /user/hduser/output

drwxr-xr-x   - hduser supergroup          0 2012-06-19 14:24 /user/hduser/testdata

DFS is healthy...

Uploading Synthetic control data to HDFS

Warning: $HADOOP_HOME is deprecated.



Deleted hdfs://localhost:54310/user/hduser/testdata

Warning: $HADOOP_HOME is deprecated.



Warning: $HADOOP_HOME is deprecated.



put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.

Successfully Uploaded Synthetic control data to HDFS

Warning: $HADOOP_HOME is deprecated.



Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=

MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar

Warning: $HADOOP_HOME is deprecated.



12/06/20 08:20:24 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only

12/06/20 08:20:24 INFO canopy.Job: Running with default arguments

12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output

12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0

12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030

12/06/20 08:20:29 INFO mapred.JobClient:  map 0% reduce 0%

12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030

12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4

12/06/20 08:20:52 INFO mapred.JobClient:   Job Counters

12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10970

12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 55.0

12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0

12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031

12/06/20 08:20:54 INFO mapred.JobClient:  map 0% reduce 0%

12/06/20 08:21:17 INFO mapred.JobClient:  map 0% reduce 100%

12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031

12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19

12/06/20 08:21:22 INFO mapred.JobClient:   Job Counters

12/06/20 08:21:22 INFO mapred.JobClient:     Launched reduce tasks=1

12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9351

12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=7740

12/06/20 08:21:22 INFO mapred.JobClient:   File Output Format Counters

12/06/20 08:21:22 INFO mapred.JobClient:     Bytes Written=106

12/06/20 08:21:22 INFO mapred.JobClient:   FileSystemCounters

12/06/20 08:21:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22545

12/06/20 08:21:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=106

12/06/20 08:21:22 INFO mapred.JobClient:   Map-Reduce Framework

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input groups=0

12/06/20 08:21:22 INFO mapred.JobClient:     Combine output records=0

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce shuffle bytes=0

12/06/20 08:21:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=40652800

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce output records=0

12/06/20 08:21:22 INFO mapred.JobClient:     Spilled Records=0

12/06/20 08:21:22 INFO mapred.JobClient:     CPU time spent (ms)=420

12/06/20 08:21:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=16252928

12/06/20 08:21:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=383250432

12/06/20 08:21:22 INFO mapred.JobClient:     Combine input records=0

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input records=0

12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0

12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032

12/06/20 08:21:24 INFO mapred.JobClient:  map 0% reduce 0%

12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032

12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4

12/06/20 08:21:43 INFO mapred.JobClient:   Job Counters

12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9347

12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters

12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 1.3067666666666666)

###############################################################################################
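Note that the run above did not actually succeed: curl could not reach the download host ("curl: (7)"), so the later "put" had no file to upload, every job processed 0 input paths, and the dumper finished with "Wrote 0 clusters". A sketch of a manual workaround, assuming the dataset URL the 0.7 example script uses (verify against your copy of the script) and a placeholder proxy host:

```shell
# Manual recovery sketch; DATA_URL is an assumption taken from the 0.7
# example script, and proxyhost:3128 is a placeholder to adjust or drop.
DATA_URL="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"
WORK_DIR="/tmp/mahout-work-${USER:-hduser}"
mkdir -p "$WORK_DIR"
# Fetch the data through your proxy, since the direct download failed with
# "curl: (7) couldn't connect to host":
#   curl -x http://proxyhost:3128 -o "$WORK_DIR/synthetic_control.data" "$DATA_URL"
# Then upload it to the HDFS path the job reads from, and rerun the script:
#   hadoop fs -put "$WORK_DIR/synthetic_control.data" testdata/
echo "work dir prepared: $WORK_DIR"
```

Once the file is really in `testdata/`, the canopy job should report non-zero input paths instead of "Total input paths to process : 0".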


How do you want to combine Mahout and Solr? Also, Solr is a web

service and can receive and supply data in several different formats.



On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:

> Regarding the errors,

> which version of Mahout are you using?

> There was some problem in cluster-reuters.sh (build-reuters.sh calls cluster-reuters.sh) which has been fixed in the last release, 0.7.

> ________________________________________

> From: Svet [svetlana.videnova <at> logica.com]

> Sent: Tuesday, June 19, 2012 2:51 PM

> To: user <at> mahout.apache.org

> Subject: several info

>

> Hi all,

>

>

> First of all I would like to thank Praveenesh Kumar for helping me with Hadoop
> and Mahout!!!

>

> Nevertheless I have several questions about Mahout.

>

> 1) I need Mahout working with Solr. Can somebody point me to a good tutorial on
> getting them to work together?

>

> 2) What exactly are the possible input and output formats of Mahout (especially
> when Mahout works with Solr; I know that the output of Solr is XML)?

>

> 3) Which of those algorithms use Hadoop? Please complete the list if I
> forgot some.

>          -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation

>

>

>

>

> 4) Moreover, I was trying to run "./build-reuters.sh" and choosing kmeans
> clustering (but it's the same error with fuzzykmeans).
> Can somebody help me with this error? (but look at 8) ! )

> ###########################

> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:33:52 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:601)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

>

> ###########################
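For what it's worth, "No clusters found. Check your -c path." means the initial-clusters directory passed via -c was empty, which follows from the earlier download failure. A hedged sketch of invoking kmeans by hand; the paths are examples taken from the log above, and the flags match Mahout 0.6/0.7 as far as I know, so check "mahout kmeans --help" against your install:

```shell
# Hypothetical manual kmeans run over the Reuters TF-IDF vectors.
WORK="/tmp/mahout-work-hduser"
# mahout kmeans \
#   -i "$WORK/reuters-out-seqdir-sparse-kmeans/tfidf-vectors" \
#   -c "$WORK/reuters-kmeans-clusters" \
#   -o "$WORK/reuters-kmeans" \
#   -k 20 -x 10 -cl \
#   -dm org.apache.mahout.common.distance.CosineDistanceMeasure
# With -k given, kmeans replaces the contents of -c with k randomly sampled
# seed vectors, so the "No clusters found" check only fails if -i is empty too.
echo "$WORK"
```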

>

>

> 5) There is also a problem with "./build-reuters", but with lda (but look at 8) ! )

> ############################

> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalArgumentException
>        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>        at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>        at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>        at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>        at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:40:02 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>        at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:601)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

> ############################

>

>

> 6) But I started "./build-reuters" with dirichlet clustering and it wrote
> 20 clusters without problems (but look at 8) ! )
> The result is:

> ############################

> ...

> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}

> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}

>        Top Terms:

> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}

>        Top Terms:

> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}

>        Top Terms:

> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}

>        Top Terms:

> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}

>        Top Terms:

> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}

>        Top Terms:

> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}

>        Top Terms:

> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}

>        Top Terms:

> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}

>        Top Terms:

> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}

>        Top Terms:

> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}

>        Top Terms:

> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}

>        Top Terms:

> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}

>        Top Terms:

> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}

>        Top Terms:

> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}

>        Top Terms:

> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}

>        Top Terms:

> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}

>        Top Terms:

> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}

>        Top Terms:

> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}

>        Top Terms:

> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}

>        Top Terms:

> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters

> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)

> ############################

>

>

> 7) And finally: "./build-reuters" with minhash clustering.
> It works fine!

>

>

> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/

>

> ...

>

>

>

> Thanks everybody

> Regards

>



--

Lance Norskog

goksron <at> gmail.com





Think green - keep it on the screen.

This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.


RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Hi,

I have a database (which is evolving all the time). After Solr indexing, this database is an XML file thanks to the Solr output.
Then I have to give this XML file to Mahout so that Mahout can classify and cluster this information. Then I have to parse the Mahout output again in order to display the database information I need on my screen.


Regards

-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: Sunday, 24 June 2012 02:03
To: dev@mahout.apache.org
Subject: Re: several info

Please describe what you would like to do. What would you like to learn from your data? We cannot recommend techniques until we know this.

On Fri, Jun 22, 2012 at 5:54 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> Hi,
> Sorry, I didn't find how the source code at this link https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
> I'm OK with writing some code, that's no problem, but where and for what purpose? I mean, for the moment I don't even know what Hadoop/Mahout/Solr need to work together, and moreover what I have to add to the already existing files in order to add my own database.
>
> Thanks
>
> Regards
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Friday, 22 June 2012 13:41
> To: dev@mahout.apache.org
> Subject: Re: several info
>
>
> On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
>
>> Hi Grant,
>>
>> Thank you for your fast answer.
>> My question was: where can I get the output file from Mahout, how can I vectorize the Solr index files, and where do I put them in Mahout?
>
> I think that link I provided shows how to get data out of Solr and into Mahout.  You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields).  To get things back into Solr, you'll have to write some code to do that.
>
> -Grant
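For the "back into Solr" step, a hedged sketch of what that code boils down to (URL, core, and field names are assumptions; in Solr 3.x you have to re-post the whole document, since atomic field updates only arrived in 4.0):

```shell
# Hypothetical: attach a Mahout cluster id to a document and re-index it.
# The "cluster_id" field is an assumption and must exist in your schema.
DOC='<add><doc>
  <field name="id">doc1</field>
  <field name="text">original document body</field>
  <field name="cluster_id">7</field>
</doc></add>'
# curl "http://localhost:8983/solr/update?commit=true" \
#      -H "Content-Type: text/xml" --data-binary "$DOC"
echo "$DOC" | grep -c cluster_id
```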
>
>
>



--
Lance Norskog
goksron@gmail.com




RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Apparently I had a proxy problem.
Now I run ./example/build-cluster-syntheticcontrol.sh
And after all the info and numbers I got this output =>
12/06/26 08:29:17 INFO clustering.ClusterDumper: Wrote 12 clusters
12/06/26 08:29:17 INFO driver.MahoutDriver: Program took 451592 ms (Minutes: 7.526533333333333)
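To see what those 12 clusters actually contain, clusterdump can be pointed at the final output (paths follow the HDFS listing below; note the flag for the input directory changed between 0.6, where it was -s/--seqFileDir, and 0.7, so verify with "mahout clusterdump --help"):

```shell
# Hypothetical inspection of the synthetic-control output written above.
DUMP="$HOME/clusters-dump.txt"
# mahout clusterdump \
#   -s output/clusters-10-final \
#   -p output/clusteredPoints \
#   -o "$DUMP"
echo "$DUMP"
```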

Hadoop works properly.


My Hadoop version is:
#####################
/usr/local/hadoop$ ls
bin          hadoop-ant-1.0.3.jar          ivy          README.txt
build.xml    hadoop-client-1.0.3.jar       ivy.xml      sbin
c++          hadoop-core-1.0.3.jar         lib          share
CHANGES.txt  hadoop-examples-1.0.3.jar     libexec      src
conf         hadoop-minicluster-1.0.3.jar  LICENSE.txt  webapps
contrib      hadoop-test-1.0.3.jar         logs
docs         hadoop-tools-1.0.3.jar        NOTICE.txt



/usr/local/hadoop$ hadoop -version
Warning: $HADOOP_HOME is deprecated.

java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) Client VM (build 20.7-b02, mixed mode)
#####################





#######################################
/usr/local/hadoop$ hadoop fs -lsr
Warning: $HADOOP_HOME is deprecated.

drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:05 /user/hduser/gutenberg
-rw-r--r--   1 hduser supergroup     674566 2012-06-18 14:05 /user/hduser/gutenberg/pg20417.txt
-rw-r--r--   1 hduser supergroup    1573150 2012-06-18 14:05 /user/hduser/gutenberg/pg4300.txt
-rw-r--r--   1 hduser supergroup    1423801 2012-06-18 14:05 /user/hduser/gutenberg/pg5000.txt
drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:07 /user/hduser/gutenberg-output
-rw-r--r--   1 hduser supergroup          0 2012-06-18 14:07 /user/hduser/gutenberg-output/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs/history
-rw-r--r--   1 hduser supergroup      19419 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs/history/job_201206181326_0001_1340021186120_hduser_word+count
-rw-r--r--   1 hduser supergroup      20388 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs/history/job_201206181326_0001_conf.xml
-rw-r--r--   1 hduser supergroup     880838 2012-06-18 14:06 /user/hduser/gutenberg-output/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:28 /user/hduser/output/_policy
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:29 /user/hduser/output/clusteredPoints
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:29 /user/hduser/output/clusteredPoints/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs/history
-rw-r--r--   1 hduser supergroup       9145 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs/history/job_201206260820_0012_1340692129661_hduser_Cluster+Classification+Driver+running+over+input%3A+
-rw-r--r--   1 hduser supergroup      20557 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs/history/job_201206260820_0012_conf.xml
-rw-r--r--   1 hduser supergroup     340900 2012-06-26 08:29 /user/hduser/output/clusteredPoints/part-m-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-0
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:22 /user/hduser/output/clusters-0/_policy
-rw-r--r--   1 hduser supergroup       1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00000
-rw-r--r--   1 hduser supergroup       1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00001
-rw-r--r--   1 hduser supergroup       1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00002
-rw-r--r--   1 hduser supergroup       1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00003
-rw-r--r--   1 hduser supergroup       1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00004
-rw-r--r--   1 hduser supergroup       1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00005
-rw-r--r--   1 hduser supergroup       7331 2012-06-26 08:22 /user/hduser/output/clusters-0/part-randomSeed
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-1
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-1/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs/history
-rw-r--r--   1 hduser supergroup      13708 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs/history/job_201206260820_0002_1340691736541_hduser_Cluster+Iterator+running+iteration+1+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs/history/job_201206260820_0002_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:22 /user/hduser/output/clusters-1/_policy
-rw-r--r--   1 hduser supergroup      11809 2012-06-26 08:22 /user/hduser/output/clusters-1/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusters-10-final
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs/history
-rw-r--r--   1 hduser supergroup      13723 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs/history/job_201206260820_0011_1340692090891_hduser_Cluster+Iterator+running+iteration+10+over+priorPa
-rw-r--r--   1 hduser supergroup      20874 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs/history/job_201206260820_0011_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:28 /user/hduser/output/clusters-10-final/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:23 /user/hduser/output/clusters-2
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:23 /user/hduser/output/clusters-2/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs/history
-rw-r--r--   1 hduser supergroup      13708 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs/history/job_201206260820_0003_1340691778216_hduser_Cluster+Iterator+running+iteration+2+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs/history/job_201206260820_0003_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:23 /user/hduser/output/clusters-2/_policy
-rw-r--r--   1 hduser supergroup      12909 2012-06-26 08:23 /user/hduser/output/clusters-2/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-3
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-3/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs/history
-rw-r--r--   1 hduser supergroup      13722 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs/history/job_201206260820_0004_1340691817118_hduser_Cluster+Iterator+running+iteration+3+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs/history/job_201206260820_0004_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:24 /user/hduser/output/clusters-3/_policy
-rw-r--r--   1 hduser supergroup      13449 2012-06-26 08:24 /user/hduser/output/clusters-3/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-4
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-4/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs/history
-rw-r--r--   1 hduser supergroup      13722 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs/history/job_201206260820_0005_1340691855706_hduser_Cluster+Iterator+running+iteration+4+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs/history/job_201206260820_0005_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:24 /user/hduser/output/clusters-4/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:24 /user/hduser/output/clusters-4/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:25 /user/hduser/output/clusters-5
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:25 /user/hduser/output/clusters-5/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs/history
-rw-r--r--   1 hduser supergroup      13706 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs/history/job_201206260820_0006_1340691895472_hduser_Cluster+Iterator+running+iteration+5+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs/history/job_201206260820_0006_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:25 /user/hduser/output/clusters-5/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:25 /user/hduser/output/clusters-5/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-6
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-6/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs/history
-rw-r--r--   1 hduser supergroup      13722 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs/history/job_201206260820_0007_1340691934345_hduser_Cluster+Iterator+running+iteration+6+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs/history/job_201206260820_0007_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:26 /user/hduser/output/clusters-6/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:26 /user/hduser/output/clusters-6/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-7
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-7/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs/history
-rw-r--r--   1 hduser supergroup      13722 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs/history/job_201206260820_0008_1340691973801_hduser_Cluster+Iterator+running+iteration+7+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs/history/job_201206260820_0008_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:26 /user/hduser/output/clusters-7/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:26 /user/hduser/output/clusters-7/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:27 /user/hduser/output/clusters-8
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:27 /user/hduser/output/clusters-8/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs/history
-rw-r--r--   1 hduser supergroup      13722 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs/history/job_201206260820_0009_1340692013041_hduser_Cluster+Iterator+running+iteration+8+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs/history/job_201206260820_0009_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:27 /user/hduser/output/clusters-8/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:27 /user/hduser/output/clusters-8/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusters-9
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:28 /user/hduser/output/clusters-9/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs/history
-rw-r--r--   1 hduser supergroup      13724 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs/history/job_201206260820_0010_1340692051563_hduser_Cluster+Iterator+running+iteration+9+over+priorPat
-rw-r--r--   1 hduser supergroup      20872 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs/history/job_201206260820_0010_conf.xml
-rw-r--r--   1 hduser supergroup        194 2012-06-26 08:28 /user/hduser/output/clusters-9/_policy
-rw-r--r--   1 hduser supergroup      13989 2012-06-26 08:27 /user/hduser/output/clusters-9/part-r-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/data
-rw-r--r--   1 hduser supergroup          0 2012-06-26 08:22 /user/hduser/output/data/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:21 /user/hduser/output/data/_logs
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:21 /user/hduser/output/data/_logs/history
-rw-r--r--   1 hduser supergroup       9125 2012-06-26 08:21 /user/hduser/output/data/_logs/history/job_201206260820_0001_1340691708676_hduser_Input+Driver+running+over+input%3A+testdata
-rw-r--r--   1 hduser supergroup      20267 2012-06-26 08:21 /user/hduser/output/data/_logs/history/job_201206260820_0001_conf.xml
-rw-r--r--   1 hduser supergroup     335470 2012-06-26 08:22 /user/hduser/output/data/part-m-00000
drwxr-xr-x   - hduser supergroup          0 2012-06-26 08:21 /user/hduser/testdata
-rw-r--r--   1 hduser supergroup     288374 2012-06-26 08:21 /user/hduser/testdata/synthetic_control.data
##################################



-----Original Message-----
From: shaposhnik@gmail.com [mailto:shaposhnik@gmail.com] On behalf of Roman Shaposhnik
Sent: Monday, June 25, 2012 18:00
To: dev@mahout.apache.org
Subject: Re: several info

On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana <sv...@logica.com> wrote:
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0
>
>
> curl: (7) couldn't connect to host

This is suspect. You sure your host has the type of network connectivity that allows it to connect to the outside world?

Also, what version of Hadoop are you using and how it was installed?

Finally, can you make sure that the basic stuff like:
   hadoop fs -lsr .

works?

Thanks,
Roman.


Think green - keep it on the screen.

This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.



RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Please can you help me with this error?

############################################
hduser:/usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh 
Please call cluster-reuters.sh directly next time.  This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. dirichlet clustering
4. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-hduser
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0  72556      0  0:01:52  0:01:52 --:--:--  192k
Extracting...
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

12/06/26 08:53:50 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in /tmp/mahout-work-hduser/reuters-out-tmp
12/06/26 08:53:56 INFO driver.MahoutDriver: Program took 5613 ms (Minutes: 0.09355)
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	... 1 more
Warning: $HADOOP_HOME is deprecated.

rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
Warning: $HADOOP_HOME is deprecated.

put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.

###############################################################



-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Tuesday, June 26, 2012 09:46
To: dev@mahout.apache.org
Subject: Re: several info

That is just a message from Hadoop, which you can ignore.

On Tue, Jun 26, 2012 at 8:43 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> Warning: $HADOOP_HOME is deprecated : is this caused because I set HADOOP_HOME=/usr/local/hadoop?




Re: several info

Posted by Sean Owen <sr...@gmail.com>.
That is just a message from Hadoop, which you can ignore.

On Tue, Jun 26, 2012 at 8:43 AM, Videnova, Svetlana
<sv...@logica.com> wrote:
> Warning: $HADOOP_HOME is deprecated : is this caused because I set HADOOP_HOME=/usr/local/hadoop?

RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Warning: $HADOOP_HOME is deprecated : is this caused because I set HADOOP_HOME=/usr/local/hadoop?

-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: Tuesday, June 26, 2012 05:16
To: dev@mahout.apache.org
Subject: Re: several info

After you get your network connection problems sorted, it will be easier if you remove your HADOOP environment variables. Mahout includes its own Hadoop. Mahout will run in local pseudo-distributed mode if you do not have HADOOP_* environment variables set.

On Mon, Jun 25, 2012 at 9:00 AM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana 
> <sv...@logica.com> wrote:
>>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>                                 Dload  Upload   Total   Spent    Left  Speed
>>  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0
>>
>>
>> curl: (7) couldn't connect to host
>
> This is suspect. You sure your host has the type of network 
> connectivity that allows it to connect to the outside world?
>
> Also, what version of Hadoop are you using and how it was installed?
>
> Finally, can you make sure that the basic stuff like:
>   hadoop fs -lsr .
>
> works?
>
> Thanks,
> Roman.



--
Lance Norskog
goksron@gmail.com




Re: several info

Posted by Lance Norskog <go...@gmail.com>.
After you get your network connection problems sorted, it will be
easier if you remove your HADOOP environment variables. Mahout
includes its own Hadoop. Mahout will run in local pseudo-distributed
mode if you do not have HADOOP_* environment variables set.
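[Editor's note] A minimal sketch of the above, assuming a Bourne-style shell; MAHOUT_LOCAL is the variable that the example scripts' own log lines in this thread refer to:

```shell
# Clear the Hadoop variables so Mahout falls back to its bundled
# Hadoop and the example scripts run locally (per the advice above).
unset HADOOP_HOME HADOOP_CONF_DIR
export MAHOUT_LOCAL=true

# Sanity check: both Hadoop variables should now be empty.
[ -z "$HADOOP_HOME" ] && [ -z "$HADOOP_CONF_DIR" ] && echo "Hadoop env cleared"
```

With these unset, the scripts print "MAHOUT_LOCAL is set, running locally" instead of adding HADOOP_CONF_DIR to the classpath.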

On Mon, Jun 25, 2012 at 9:00 AM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana
> <sv...@logica.com> wrote:
>>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>                                 Dload  Upload   Total   Spent    Left  Speed
>>  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0
>>
>>
>> curl: (7) couldn't connect to host
>
> This is suspect. You sure your host has the type of network connectivity that
> allows it to connect to the outside world?
>
> Also, what version of Hadoop are you using and how it was installed?
>
> Finally, can you make sure that the basic stuff like:
>   hadoop fs -lsr .
>
> works?
>
> Thanks,
> Roman.



-- 
Lance Norskog
goksron@gmail.com

Re: several info

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana
<sv...@logica.com> wrote:
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0
>
>
> curl: (7) couldn't connect to host

This is suspect. You sure your host has the type of network connectivity that
allows it to connect to the outside world?

Also, what version of Hadoop are you using and how it was installed?

Finally, can you make sure that the basic stuff like:
   hadoop fs -lsr .

works?

Thanks,
Roman.

RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Also, I tried to run the example, but:


/usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh 
Please call cluster-reuters.sh directly next time.  This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. dirichlet clustering
4. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-hduser
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0curl: (7) couldn't connect to host
Extracting...
tar (child): /tmp/mahout-work-hduser/reuters21578.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

12/06/25 15:54:53 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in /tmp/mahout-work-hduser/reuters-out-tmp
No .sgm files in /tmp/mahout-work-hduser/reuters-sgm
12/06/25 15:54:53 INFO driver.MahoutDriver: Program took 3 ms (Minutes: 6.666666666666667E-5)
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	... 1 more
Warning: $HADOOP_HOME is deprecated.

12/06/25 15:55:01 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 0 time(s).
12/06/25 15:55:02 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 1 time(s).
12/06/25 15:55:03 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 2 time(s).
12/06/25 15:55:04 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 3 time(s).
12/06/25 15:55:05 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 4 time(s).
12/06/25 15:55:06 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 5 time(s).
12/06/25 15:55:07 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 6 time(s).
12/06/25 15:55:08 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 7 time(s).
12/06/25 15:55:09 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 8 time(s).
12/06/25 15:55:10 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/10.84.30.51:54310 failed on connection exception: java.net.ConnectException: Connection refused
Warning: $HADOOP_HOME is deprecated.

12/06/25 15:55:12 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 0 time(s).
12/06/25 15:55:13 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 1 time(s).
12/06/25 15:55:14 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 2 time(s).
12/06/25 15:55:15 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 3 time(s).
12/06/25 15:55:16 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 4 time(s).
12/06/25 15:55:17 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 5 time(s).
12/06/25 15:55:18 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 6 time(s).
12/06/25 15:55:19 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 7 time(s).
12/06/25 15:55:20 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 8 time(s).
12/06/25 15:55:21 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/10.84.30.51:54310 failed on connection exception: java.net.ConnectException: Connection refused



-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Monday, June 25, 2012 16:33
To: dev@mahout.apache.org
Subject: Re: several info

Either you have a typo, or you are not looking at the right setting.
Is your system out of RAM with no swap or something?

On Mon, Jun 25, 2012 at 3:28 PM, Videnova, Svetlana <sv...@logica.com> wrote:
> I have 4GB RAM. I set 2GB...




RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Where should I be looking?
I don't think I have any problems with my system...

-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Monday, June 25, 2012 16:33
To: dev@mahout.apache.org
Subject: Re: several info

Either you have a typo, or you are not looking at the right setting.
Is your system out of RAM with no swap or something?

On Mon, Jun 25, 2012 at 3:28 PM, Videnova, Svetlana <sv...@logica.com> wrote:
> I have 4GB RAM. I set 2GB...




Re: several info

Posted by Sean Owen <sr...@gmail.com>.
Either you have a typo, or you are not looking at the right setting.
Is your system out of RAM with no swap or something?

On Mon, Jun 25, 2012 at 3:28 PM, Videnova, Svetlana
<sv...@logica.com> wrote:
> I have 4GB RAM. I set 2GB...

RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
I have 4GB RAM. I set 2GB...

-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Monday, June 25, 2012 16:23
To: dev@mahout.apache.org
Subject: Re: several info

This isn't specific to Mahout:

Error occurred during initialization of VM
Could not reserve enough space for object heap

This means that you set a heap size that is too big for the machine.
For example, maybe you requested a 4GB heap on a 32-bit machine.

On Mon, Jun 25, 2012 at 2:41 PM, Videnova, Svetlana <sv...@logica.com> wrote:
> Please can somebody help me with this error?
>
>
> I'm using Mahout 0.7
>
>
> /usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh 
> Please call cluster-reuters.sh directly next time.  This file is going away.
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. fuzzykmeans clustering
> 3. dirichlet clustering
> 4. minhash clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> creating work directory at /tmp/mahout-work-hduser
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
>
> MAHOUT_LOCAL is set, running locally
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> Warning: $HADOOP_HOME is deprecated.
>
> rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
> Warning: $HADOOP_HOME is deprecated.
>
> put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.
>




Re: several info

Posted by Sean Owen <sr...@gmail.com>.
This isn't specific to Mahout:

Error occurred during initialization of VM
Could not reserve enough space for object heap

This means that you set a heap size that is too big for the machine.
For example, maybe you requested a 4GB heap on a 32-bit machine.
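[Editor's note] As an illustration of lowering the requested heap, Mahout's bin/mahout launcher reads MAHOUT_HEAPSIZE (in MB); the 1024 value below is only an example and must fit in the machine's available RAM:

```shell
# Request a 1GB JVM heap from Mahout's bin/mahout launcher instead
# of the default (value in MB; pick one that fits the machine).
export MAHOUT_HEAPSIZE=1024
echo "requested heap: ${MAHOUT_HEAPSIZE} MB"
```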

On Mon, Jun 25, 2012 at 2:41 PM, Videnova, Svetlana
<sv...@logica.com> wrote:
> Please can somebody help me with this error?
>
>
> I'm using Mahout 0.7
>
>
> /usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
> Please call cluster-reuters.sh directly next time.  This file is going away.
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. fuzzykmeans clustering
> 3. dirichlet clustering
> 4. minhash clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> creating work directory at /tmp/mahout-work-hduser
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
>
> MAHOUT_LOCAL is set, running locally
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> Warning: $HADOOP_HOME is deprecated.
>
> rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
> Warning: $HADOOP_HOME is deprecated.
>
> put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.
>

RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Please can somebody help me with this error?


I'm using Mahout 0.7.


/usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
Please call cluster-reuters.sh directly next time.  This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. dirichlet clustering
4. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-hduser
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

MAHOUT_LOCAL is set, running locally
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Warning: $HADOOP_HOME is deprecated.

rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
Warning: $HADOOP_HOME is deprecated.

put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.


-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: Sunday, June 24, 2012 02:03
To: dev@mahout.apache.org
Subject: Re: several info

Please describe what you would like to do. What would you like to learn from your data? We cannot recommend techniques until we know this.

On Fri, Jun 22, 2012 at 5:54 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> Hi,
> Sorry, I didn't see how the source code at https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
> I'm fine with writing some code, that's no problem, but where and for what purpose? For the moment I don't even know what Hadoop/Mahout/Solr need to work together, nor what I have to add to the existing files in order to plug in my own database.
>
> Thanks
>
> Regards
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Friday, June 22, 2012 13:41
> To: dev@mahout.apache.org
> Subject: Re: several info
>
>
> On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
>
>> Hi Grant,
>>
>> Thank you for your fast answer.
>> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
>
> I think that link I provided shows how to get data out of Solr and into Mahout.  You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields).  To get things back into Solr, you'll have to write some code to do that.
>
> -Grant
>
>
>



--
Lance Norskog
goksron@gmail.com




Re: several info

Posted by Lance Norskog <go...@gmail.com>.
Please describe what you would like to do. What would you like to
learn from your data? We cannot recommend techniques until we know
this.

On Fri, Jun 22, 2012 at 5:54 AM, Videnova, Svetlana
<sv...@logica.com> wrote:
> Hi,
> Sorry, I didn't see how the source code at https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
> I'm fine with writing some code, that's no problem, but where and for what purpose? For the moment I don't even know what Hadoop/Mahout/Solr need to work together, nor what I have to add to the existing files in order to plug in my own database.
>
> Thanks
>
> Regards
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Friday, June 22, 2012 13:41
> To: dev@mahout.apache.org
> Subject: Re: several info
>
>
> On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
>
>> Hi Grant,
>>
>> Thank you for your fast answer.
>> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
>
> I think that link I provided shows how to get data out of Solr and into Mahout.  You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields).  To get things back into Solr, you'll have to write some code to do that.
>
> -Grant
>
>
>



-- 
Lance Norskog
goksron@gmail.com

RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Hi,
Sorry, I didn't see how the source code at https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
I'm fine with writing some code, that's no problem, but where and for what purpose? For the moment I don't even know what Hadoop/Mahout/Solr need to work together, nor what I have to add to the existing files in order to plug in my own database.

Thanks

Regards


-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Friday, June 22, 2012 13:41
To: dev@mahout.apache.org
Subject: Re: several info


On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:

> Hi Grant,
> 
> Thank you for your fast answer.
> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?

I think that link I provided shows how to get data out of Solr and into Mahout.  You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields).  To get things back into Solr, you'll have to write some code to do that.

-Grant




Re: several info

Posted by Grant Ingersoll <gs...@apache.org>.
On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:

> Hi Grant,
> 
> Thank you for your fast answer.
> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?

I think that link I provided shows how to get data out of Solr and into Mahout.  You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields).  To get things back into Solr, you'll have to write some code to do that.

-Grant
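[Editor's note] The Solr-to-Mahout step Grant describes can be sketched with Mahout's lucene.vector utility; the index path, field names, and output paths below are hypothetical placeholders, so the command is only echoed rather than executed:

```shell
# Hypothetical extraction of term vectors from a Solr/Lucene index
# into Mahout's vector format (echoed, not run, because the paths
# are placeholders).
CMD="mahout lucene.vector --dir /path/to/solr/data/index --field text --idField id --dictOut /tmp/solr-dict.txt --output /tmp/solr-vectors"
echo "$CMD"
```

The index must have term vectors enabled on the extracted field for this to work.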

RE: several info

Posted by "Videnova, Svetlana" <sv...@logica.com>.
Hi Grant,

Thank you for your fast answer.
My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
I'll try to find some info there.

I'm sorry for the confusion; I'll post my next questions on user@mahout.apache.org.


Regards 

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Thursday, June 21, 2012 21:14
To: dev@mahout.apache.org
Subject: Re: several info

Hi Svetlana,

I'm not sure I understand what question you are asking.  Perhaps if you can back up and tell us the problem you are trying to solve we can point you in the right direction.  Mahout is a library of tools and can integrate with Solr in a variety of ways, almost none of which are out of the box at the moment.

It's a little dated, but perhaps this helps: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/  (someday I will finish II and III of that series)

There are also various other sources on the web and I've given some talks on it in the past as well as put up some code at https://github.com/gsingers/ApacheCon2010 (which is also outdated)

Finally, this type of question is best asked on user@mahout.apache.org, just for future reference.

-Grant

On Jun 20, 2012, at 3:36 AM, Videnova, Svetlana wrote:

> How do you want to combine Mahout and Solr? => that was my question
> 
> I was using Mahout 0.6, but since yesterday Mahout 0.7.
> 
> So I was trying to run the following (just to test and make sure that everything works properly):
> 
> 
> 
> ###############################################################################################################
> 
> :/usr/local/mahout-distribution-0.7/examples/bin$ ./build-cluster-syntheticcontrol.sh
> 
> Please call cluster-syntheticcontrol.sh directly next time.  This file is going away.
> 
> Please select a number to choose the corresponding clustering algorithm
> 
> 1. canopy clustering
> 
> 2. kmeans clustering
> 
> 3. fuzzykmeans clustering
> 
> 4. dirichlet clustering
> 
> 5. meanshift clustering
> 
> Enter your choice : 1
> 
> ok. You chose 1 and we'll use canopy Clustering
> 
> creating work directory at /tmp/mahout-work-hduser
> 
> Downloading Synthetic control data
> 
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> 
>                                 Dload  Upload   Total   Spent    Left  Speed
> 
>  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0curl: (7) couldn't connect to host
> 
> Checking the health of DFS...
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Found 4 items
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:05 /user/hduser/gutenberg
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:07 /user/hduser/gutenberg-output
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-18 15:35 /user/hduser/output
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-19 14:24 /user/hduser/testdata
> 
> DFS is healthy...
> 
> Uploading Synthetic control data to HDFS
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Deleted hdfs://localhost:54310/user/hduser/testdata
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.
> 
> Successfully Uploaded Synthetic control data to HDFS
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> 
> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> 12/06/20 08:20:24 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only
> 
> 12/06/20 08:20:24 INFO canopy.Job: Running with default arguments
> 
> 12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output
> 
> 12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 
> 12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0
> 
> 12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030
> 
> 12/06/20 08:20:29 INFO mapred.JobClient:  map 0% reduce 0%
> 
> 12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030
> 
> 12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:   Job Counters
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10970
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 
> 12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 55.0
> 
> 12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 
> 12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0
> 
> 12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031
> 
> 12/06/20 08:20:54 INFO mapred.JobClient:  map 0% reduce 0%
> 
> 12/06/20 08:21:17 INFO mapred.JobClient:  map 0% reduce 100%
> 
> 12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031
> 
> 12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   Job Counters
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Launched reduce tasks=1
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9351
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=7740
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   File Output Format Counters
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Bytes Written=106
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   FileSystemCounters
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22545
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=106
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   Map-Reduce Framework
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input groups=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Combine output records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=40652800
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce output records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Spilled Records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     CPU time spent (ms)=420
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=16252928
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=383250432
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Combine input records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input records=0
> 
> 12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 
> 12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0
> 
> 12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032
> 
> 12/06/20 08:21:24 INFO mapred.JobClient:  map 0% reduce 0%
> 
> 12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032
> 
> 12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:   Job Counters
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9347
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 
> 12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters
> 
> 12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 1.3067666666666666)
> 
> ###############################################################################################
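A note on the log above: the curl "(7) couldn't connect to host" failure means synthetic_control.data was never downloaded, so the later `put` had nothing to upload and every subsequent job saw "Total input paths to process : 0" and wrote 0 clusters. A manual workaround might look like the following sketch; the UCI URL is the one the script tries to fetch, so verify it inside cluster-syntheticcontrol.sh first:

```shell
# Download the dataset by hand, then re-upload it to HDFS.
mkdir -p /tmp/mahout-work-hduser
curl -o /tmp/mahout-work-hduser/synthetic_control.data \
  http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
hadoop fs -rmr testdata   # remove the old, empty upload
hadoop fs -put /tmp/mahout-work-hduser/synthetic_control.data testdata
```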
> 
> How do you want to combine Mahout and Solr? Also, Solr is a web service and can receive and supply data in several different formats.
> 
> 
> 
> On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:
> 
>> Regarding the errors,
> 
>> which version of Mahout are you using?
> 
>> There was some problem in cluster-reuters.sh (build-reuters.sh calls cluster-reuters.sh) which has been fixed in the last release, 0.7.
> 
>> ________________________________________
> 
>> From: Svet [svetlana.videnova <at> logica.com]
> 
>> Sent: Tuesday, June 19, 2012 2:51 PM
> 
>> To: user <at> mahout.apache.org
> 
>> Subject: several info
> 
>> 
> 
>> Hi all,
> 
>> 
> 
>> 
> 
>> First of all I would like to thank Praveenesh Kumar for helping me with Hadoop and Mahout!!!
> 
>> 
> 
>> Nevertheless I have several questions about Mahout.
> 
>> 
> 
>> 1) I need Mahout working with Solr. Can somebody give me a good tutorial on getting them started together?
> 
>> 
> 
>> 2) What exactly are the possible input and output formats of Mahout (especially when Mahout works with Solr; I know that Solr's output is XML)?
> 
>> 
> 
>> 3) Which of those algorithms use Hadoop? And please complete the list if I forgot some.
> 
>>         -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
> 
>> 
> 
>> 
> 
>> 
> 
>> 
> 
>> 4) Moreover I was trying to run "./build-reuters.sh" and choosing kmeans clustering (but it's the same error with fuzzykmeans).
>> Can somebody help me with this error? (but look at 8) ! )
> 
>> ###########################
> 
>> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>       at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> 12/06/19 13:33:52 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
>> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
>> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
>>       at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
>>       at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
>>       at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>>       at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>>       at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>       at java.lang.reflect.Method.invoke(Method.java:601)
>>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> 
>> 
> 
>> ###########################
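On this one: "No clusters found. Check your -c path." means the k-means driver found no initial clusters in the -c directory. Passing -k makes the driver seed that path with k random clusters itself. A hedged sketch — the input/output paths are placeholders modeled on the log above, so check `bin/mahout kmeans --help` for the exact flags in your release:

```shell
bin/mahout kmeans \
  -i /tmp/mahout-work-hduser/reuters-out-seqdir-sparse-kmeans/tfidf-vectors \
  -c /tmp/mahout-work-hduser/reuters-kmeans-clusters \
  -o /tmp/mahout-work-hduser/reuters-kmeans \
  -k 20 -x 10 -ow -cl \
  -dm org.apache.mahout.common.distance.CosineDistanceMeasure
```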
> 
>> 
> 
>> 
> 
>> 5) There is also a problem with "./build-reuters" with lda (but look at 8) ! )
> 
>> ############################
> 
>> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
>> java.lang.IllegalArgumentException
>>       at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>>       at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>>       at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>>       at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>>       at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> 12/06/19 13:40:02 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
>> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
>> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>>       at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>>       at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>>       at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>       at java.lang.reflect.Method.invoke(Method.java:601)
>>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> 
>> ############################
> 
>> 
> 
>> 
> 
>> 6) But I started "./build-reuters" with dirichlet clustering and it wrote 20 clusters without problems (but look at 8) ! )
>> The result is:
> 
>> ############################
> 
>> ...
> 
>> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> 
>> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> 
>> MAHOUT_LOCAL is set, running locally
> 
>> SLF4J: Class path contains multiple SLF4J bindings.
> 
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 
>> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
> 
>> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
> 
>>       Top Terms:
> 
>> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
> 
>> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
> 
>> ############################
> 
>> 
> 
>> 
> 
>> 7) And at the end: "./build-reuters" with minhash clustering. It works well!
> 
>> 
> 
>> 
> 
>> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
> 
>> 
> 
>> ...
> 
>> 
> 
>> 
> 
>> 
> 
>> Thanks everybody
> 
>> Regards
> 
>> 
> 
> 
> 
> --
> 
> Lance Norskog
> 
> goksron <at> gmail.com
> 
> 
> 
> 
> 
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com







Re: several info

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Svetlana,

I'm not sure I understand what question you are asking.  Perhaps if you can back up and tell us the problem you are trying to solve we can point you in the right direction.  Mahout is a library of tools and can integrate with Solr in a variety of ways, almost none of which are out of the box at the moment.

It's a little dated, but perhaps this helps: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/  (someday I will finish II and III of that series)

There are also various other sources on the web and I've given some talks on it in the past as well as put up some code at https://github.com/gsingers/ApacheCon2010 (which is also outdated)

Finally, this type of question is best asked on user@mahout.apache.org, just for future reference.

-Grant

On Jun 20, 2012, at 3:36 AM, Videnova, Svetlana wrote:

> How do you want to combine Mahout and Solr? => that's was my question
> 
> I was using mahout0.6 but from yesterday Mahout0.7.
> 
> So I was trying to run (just for test and making sure that everything works properly)
> 
> 
> 
> ###############################################################################################################
> 
> :/usr/local/mahout-distribution-0.7/examples/bin$ ./build-cluster-syntheticcontrol.sh
> 
> Please call cluster-syntheticcontrol.sh directly next time.  This file is going away.
> 
> Please select a number to choose the corresponding clustering algorithm
> 
> 1. canopy clustering
> 
> 2. kmeans clustering
> 
> 3. fuzzykmeans clustering
> 
> 4. dirichlet clustering
> 
> 5. meanshift clustering
> 
> Enter your choice : 1
> 
> ok. You chose 1 and we'll use canopy Clustering
> 
> creating work directory at /tmp/mahout-work-hduser
> 
> Downloading Synthetic control data
> 
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> 
>                                 Dload  Upload   Total   Spent    Left  Speed
> 
>  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0curl: (7) couldn't connect to host
> 
> Checking the health of DFS...
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Found 4 items
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:05 /user/hduser/gutenberg
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:07 /user/hduser/gutenberg-output
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-18 15:35 /user/hduser/output
> 
> drwxr-xr-x   - hduser supergroup          0 2012-06-19 14:24 /user/hduser/testdata
> 
> DFS is healthy...
> 
> Uploading Synthetic control data to HDFS
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Deleted hdfs://localhost:54310/user/hduser/testdata
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.
> 
> Successfully Uploaded Synthetic control data to HDFS
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> 
> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
> 
> Warning: $HADOOP_HOME is deprecated.
> 
> 
> 
> 12/06/20 08:20:24 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only
> 
> 12/06/20 08:20:24 INFO canopy.Job: Running with default arguments
> 
> 12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output
> 
> 12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 
> 12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0
> 
> 12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030
> 
> 12/06/20 08:20:29 INFO mapred.JobClient:  map 0% reduce 0%
> 
> 12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030
> 
> 12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:   Job Counters
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10970
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 
> 12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 55.0
> 
> 12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 
> 12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0
> 
> 12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031
> 
> 12/06/20 08:20:54 INFO mapred.JobClient:  map 0% reduce 0%
> 
> 12/06/20 08:21:17 INFO mapred.JobClient:  map 0% reduce 100%
> 
> 12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031
> 
> 12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   Job Counters
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Launched reduce tasks=1
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9351
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=7740
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   File Output Format Counters
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Bytes Written=106
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   FileSystemCounters
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22545
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=106
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:   Map-Reduce Framework
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input groups=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Combine output records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=40652800
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce output records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Spilled Records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     CPU time spent (ms)=420
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=16252928
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=383250432
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Combine input records=0
> 
> 12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input records=0
> 
> 12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 
> 12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0
> 
> 12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032
> 
> 12/06/20 08:21:24 INFO mapred.JobClient:  map 0% reduce 0%
> 
> 12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032
> 
> 12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:   Job Counters
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9347
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 
> 12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters
> 
> 12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 1.3067666666666666)
> 
> ###############################################################################################
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> How do you want to combine Mahout and Solr? Also, Solr is a web
> 
> service and can receive and supply data in several different formats.
> 
> 
> 
> On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:
> 
>> Regarding the errors,
> 
>> which version of Mahout are you using?
> 
>> There was some problem in cluster-reuters.sh ( build-reuters.sh calls cluster-reuters.sh ) which has
> 
> been fixed in the last release 0.7.
> 
>> ________________________________________
> 
>> From: Svet [svetlana.videnova <at> logica.com]
> 
>> Sent: Tuesday, June 19, 2012 2:51 PM
> 
>> To: user <at> mahout.apache.org
> 
>> Subject: several info
> 
>> 
> 
>> Hi all,
> 
>> 
> 
>> 
> 
>> First of all i would like to thanks Praveenesh Kumar for helping me with hadoop
> 
>> and mahout!!!
> 
>> 
> 
>> Nevertheless i have several questions about Mahout.
> 
>> 
> 
>> 1) I need Mahout working with SOLR. Can somebody give me a great tutorial to
> 
>> make them starting together?
> 
>> 
> 
>> 2)What exactly the possibilities of input and output files of Mahout (especially
> 
>> when Mahout works with SOLR, i know that output file of SOLR is XML)?
> 
>> 
> 
>> 3)Which of thoses algorythms are using Hadoop? And please complete the list if i
> 
>> forgot some.
> 
>>         -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
> 
>> 
> 
>> 
> 
>> 
> 
>> 
> 
>> 4)Moreover i was trying to run "./build-reuters.sh" and choosing kmeans
> 
>> clustering (but its the same error with fuzzykmeans)
> 
>> Can somebody help me with this error? (but look at 8) ! )
> 
>> ###########################
> 
>> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
> 
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
> 
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 
>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> 
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> 
>>       at
> 
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 
>> 12/06/19 13:33:52 INFO mapred.JobClient:  map 0% reduce 0%
> 
>> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
> 
>> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
> 
>> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration
> 
>> failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-
> 
>> randomSeed
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:
> 
>> 371)
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.ja
> 
>> va:316)
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java
> 
>> :239)
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
> 
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 
>>       at
> 
>> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
> 
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 
>>       at
> 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 
>>       at
> 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
> 
>> a:43)
> 
>>       at java.lang.reflect.Method.invoke(Method.java:601)
> 
>>       at
> 
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.jav
> 
>> a:68)
> 
>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> 
>>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> 
>> 
> 
>> ###########################
> 
>>
>> 5) There is also a problem with "./build-reuters" using lda (but see 8 below):
>> ############################
>> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
>> java.lang.IllegalArgumentException
>>       at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>>       at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>>       at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>>       at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>>       at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> 12/06/19 13:40:02 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
>> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
>> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>>       at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>>       at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>>       at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>       at java.lang.reflect.Method.invoke(Method.java:601)
>>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>> ############################
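[Editor's note, not part of the original mail:] the bare java.lang.IllegalArgumentException in the trace above comes from Guava's Preconditions.checkArgument, called from LDADriver.createState. When checkArgument is invoked with only a boolean, the exception it throws carries no detail message, which is why the log gives no hint about which argument was invalid. A standalone sketch of that failure mode (the checkArgument helper below merely mimics Guava's single-argument overload, and numTopics is a hypothetical input, not Mahout's actual check):

```java
// Minimal illustration of a message-less precondition failure, as seen
// in the LDADriver stack trace. Standalone sketch, not Mahout code.
public class PreconditionDemo {

    // Mimics Guava's Preconditions.checkArgument(boolean):
    // on failure it throws with no detail message at all.
    static void checkArgument(boolean expression) {
        if (!expression) {
            throw new IllegalArgumentException(); // no message attached
        }
    }

    public static void main(String[] args) {
        int numTopics = 0; // hypothetical invalid input
        try {
            checkArgument(numTopics > 0);
        } catch (IllegalArgumentException e) {
            // getMessage() is null, so a log shows only the class name
            System.out.println("message = " + e.getMessage()); // prints "message = null"
        }
    }
}
```

This is why, when diagnosing such a trace, the line number inside createState (LDADriver.java:124 here) is the only clue to which precondition actually failed.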
> 
>>
>> 6) But I ran "./build-reuters" with dirichlet clustering and it wrote 20 clusters without problems (but see 8 below).
>> The result is:
>> ############################
>> ...
>> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> 
>> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
>> MAHOUT_LOCAL is set, running locally
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
>> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
>>       Top Terms:
>> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
>>       Top Terms:
>> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
>> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
>> ############################
> 
>>
>> 7) And finally: "./build-reuters" with minhash clustering works fine!
>>
>> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
>>
>> ...
>>
>> Thanks everybody
>> Regards
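[Editor's note, not part of the original mail:] the success marker mentioned in 8) can be listed per run with a short script. This is a sketch under two assumptions: the default /tmp/mahout-work-hduser work directory created by the example scripts, and Hadoop's standard `_SUCCESS` marker file name written into each successful job's output directory.

```shell
#!/bin/sh
# List which Mahout example output directories contain a Hadoop
# _SUCCESS marker. Assumes the default work directory used by the
# example scripts; pass another directory as the first argument.
WORK_DIR="${1:-/tmp/mahout-work-hduser}"

for d in "$WORK_DIR"/*/; do
  [ -d "$d" ] || continue            # skip if the glob matched nothing
  if [ -e "${d}_SUCCESS" ]; then
    echo "SUCCESS: $d"
  else
    echo "missing: $d"
  fi
done
```

Runs whose directory shows "missing" correspond to the failed iterations above; ones with "SUCCESS" completed at least one MapReduce job cleanly.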
> 
>
> --
> Lance Norskog
> goksron <at> gmail.com
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com