Posted to dev@mahout.apache.org by "Videnova, Svetlana" <sv...@logica.com> on 2012/06/20 09:36:43 UTC
several info
How do you want to combine Mahout and Solr? => that was my question.
I was using Mahout 0.6, but since yesterday Mahout 0.7.
So I tried to run (just to test and make sure that everything works properly):
###############################################################################################################
:/usr/local/mahout-distribution-0.7/examples/bin$ ./build-cluster-syntheticcontrol.sh
Please call cluster-syntheticcontrol.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. canopy clustering
2. kmeans clustering
3. fuzzykmeans clustering
4. dirichlet clustering
5. meanshift clustering
Enter your choice : 1
ok. You chose 1 and we'll use canopy Clustering
creating work directory at /tmp/mahout-work-hduser
Downloading Synthetic control data
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     0
curl: (7) couldn't connect to host
Checking the health of DFS...
Warning: $HADOOP_HOME is deprecated.
Found 4 items
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:05 /user/hduser/gutenberg
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:07 /user/hduser/gutenberg-output
drwxr-xr-x - hduser supergroup 0 2012-06-18 15:35 /user/hduser/output
drwxr-xr-x - hduser supergroup 0 2012-06-19 14:24 /user/hduser/testdata
DFS is healthy...
Uploading Synthetic control data to HDFS
Warning: $HADOOP_HOME is deprecated.
Deleted hdfs://localhost:54310/user/hduser/testdata
Warning: $HADOOP_HOME is deprecated.
Warning: $HADOOP_HOME is deprecated.
put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.
Successfully Uploaded Synthetic control data to HDFS
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
12/06/20 08:20:24 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only
12/06/20 08:20:24 INFO canopy.Job: Running with default arguments
12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output
12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0
12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030
12/06/20 08:20:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030
12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4
12/06/20 08:20:52 INFO mapred.JobClient: Job Counters
12/06/20 08:20:52 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10970
12/06/20 08:20:52 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/20 08:20:52 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/20 08:20:52 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 55.0
12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0
12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031
12/06/20 08:20:54 INFO mapred.JobClient: map 0% reduce 0%
12/06/20 08:21:17 INFO mapred.JobClient: map 0% reduce 100%
12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031
12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19
12/06/20 08:21:22 INFO mapred.JobClient: Job Counters
12/06/20 08:21:22 INFO mapred.JobClient: Launched reduce tasks=1
12/06/20 08:21:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9351
12/06/20 08:21:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/20 08:21:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/20 08:21:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7740
12/06/20 08:21:22 INFO mapred.JobClient: File Output Format Counters
12/06/20 08:21:22 INFO mapred.JobClient: Bytes Written=106
12/06/20 08:21:22 INFO mapred.JobClient: FileSystemCounters
12/06/20 08:21:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22545
12/06/20 08:21:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=106
12/06/20 08:21:22 INFO mapred.JobClient: Map-Reduce Framework
12/06/20 08:21:22 INFO mapred.JobClient: Reduce input groups=0
12/06/20 08:21:22 INFO mapred.JobClient: Combine output records=0
12/06/20 08:21:22 INFO mapred.JobClient: Reduce shuffle bytes=0
12/06/20 08:21:22 INFO mapred.JobClient: Physical memory (bytes) snapshot=40652800
12/06/20 08:21:22 INFO mapred.JobClient: Reduce output records=0
12/06/20 08:21:22 INFO mapred.JobClient: Spilled Records=0
12/06/20 08:21:22 INFO mapred.JobClient: CPU time spent (ms)=420
12/06/20 08:21:22 INFO mapred.JobClient: Total committed heap usage (bytes)=16252928
12/06/20 08:21:22 INFO mapred.JobClient: Virtual memory (bytes) snapshot=383250432
12/06/20 08:21:22 INFO mapred.JobClient: Combine input records=0
12/06/20 08:21:22 INFO mapred.JobClient: Reduce input records=0
12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0
12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032
12/06/20 08:21:24 INFO mapred.JobClient: map 0% reduce 0%
12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032
12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4
12/06/20 08:21:43 INFO mapred.JobClient: Job Counters
12/06/20 08:21:43 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9347
12/06/20 08:21:43 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/20 08:21:43 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/20 08:21:43 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters
12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 1.3067666666666666)
###############################################################################################
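A note on the failure above: curl could not connect (exit code 7), so synthetic_control.data was never downloaded, which is why the later `hadoop fs -put` reports that the file does not exist. A minimal guard along these lines would catch it before the upload step (the paths and the WORK_DIR default are illustrative, not what the script actually uses):

```shell
# Hedged sketch: before `hadoop fs -put`, verify that the curl download
# actually produced a non-empty file. The log above shows curl failing with
# error 7, after which the put step had nothing to upload.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
DATA="$WORK_DIR/synthetic_control.data"
if [ -s "$DATA" ]; then
  STATUS="ok"        # safe to run: hadoop fs -put "$DATA" testdata
else
  STATUS="missing"   # re-run the download (and check proxy settings) first
fi
echo "$STATUS"
```

Here the check reports "missing", matching the symptom in the transcript.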
How do you want to combine Mahout and Solr? Also, Solr is a web
service and can receive and supply data in several different formats.
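To illustrate the "several different formats" point: the same Solr query can return XML or JSON depending on the `wt` parameter. The host, port, and core name below are assumptions, not the poster's setup:

```shell
# Illustrative only: build the same Solr select query in two response formats.
SOLR_SELECT="http://localhost:8983/solr/collection1/select"
XML_QUERY="${SOLR_SELECT}?q=*:*&wt=xml"
JSON_QUERY="${SOLR_SELECT}?q=*:*&wt=json"
echo "$XML_QUERY"
echo "$JSON_QUERY"
# curl "$JSON_QUERY"   # would fetch JSON, if a Solr instance were running here
```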
On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:
> Regarding the errors,
> which version of Mahout are you using?
> There was some problem in cluster-reuters.sh (build-reuters.sh calls cluster-reuters.sh) which has been fixed in the last release, 0.7.
> ________________________________________
> From: Svet [svetlana.videnova <at> logica.com]
> Sent: Tuesday, June 19, 2012 2:51 PM
> To: user <at> mahout.apache.org
> Subject: several info
>
> Hi all,
>
>
> First of all I would like to thank Praveenesh Kumar for helping me with Hadoop
> and Mahout!!!
>
> Nevertheless I have several questions about Mahout.
>
> 1) I need Mahout working with Solr. Can somebody point me to a good tutorial
> for getting them started together?
>
> 2) What exactly are the possible input and output file formats of Mahout
> (especially when Mahout works with Solr; I know that the output of Solr is XML)?
>
> 3) Which of those algorithms use Hadoop? And please complete the list if I
> forgot some.
> -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
>
>
>
>
> 4) Moreover, I was trying to run "./build-reuters.sh" with kmeans
> clustering (it is the same error with fuzzykmeans).
> Can somebody help me with this error? (But look at 8)!)
> ###########################
> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:33:52 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
> ###########################
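For reference, a hedged sketch of roughly what the script invokes at this point (paths are illustrative). When `-k` is given, Mahout first samples k random seed clusters into the `-c` directory; "No clusters found. Check your -c path." means that directory was empty when the first iteration started, here presumably because an earlier download or vectorization step in the script had failed:

```shell
# Assemble (without running) an illustrative k-means invocation of the kind
# cluster-reuters.sh performs; every path here is an assumption.
WORK=/tmp/mahout-work-demo
KMEANS_CMD="bin/mahout kmeans \
  -i $WORK/reuters-vectors/tfidf-vectors \
  -c $WORK/reuters-kmeans-clusters \
  -o $WORK/reuters-kmeans \
  -k 20 -x 10 -cl"
echo "$KMEANS_CMD"
```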
>
>
> 5) There is also a problem with "./build-reuters" with lda. (But look at 8)!)
> ############################
> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalArgumentException
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>     at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>     at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>     at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>     at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:40:02 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>     at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>     at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>     at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> ############################
>
>
> 6) But I started "./build-reuters" with dirichlet clustering and it wrote
> 20 clusters without problems. (But look at 8)!)
> The result is :
> ############################
> ...
> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
> Top Terms:
> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
> Top Terms:
> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
> Top Terms:
> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
> Top Terms:
> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
> Top Terms:
> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
> Top Terms:
> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
> Top Terms:
> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
> Top Terms:
> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
> Top Terms:
> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
> Top Terms:
> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
> Top Terms:
> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
> Top Terms:
> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
> Top Terms:
> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
> Top Terms:
> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
> Top Terms:
> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
> Top Terms:
> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
> Top Terms:
> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
> Top Terms:
> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
> Top Terms:
> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
> Top Terms:
> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
> ############################
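One caveat about the dump above: it reports "Wrote 20 clusters", but every model line shows n=0, i.e. no points were assigned, so all 20 clusters are empty. A quick way to spot this in a saved clusterdump output (the two sample lines below are inlined for illustration; the second is invented):

```shell
# Count cluster-model lines that report zero assigned points (n=0).
DUMP='DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
DC-1 total= 5 model= DMC:1{n=5 c=[1.2] r=[0.3]}'
EMPTY=$(printf '%s\n' "$DUMP" | grep -c 'n=0')
echo "$EMPTY empty clusters"
```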
>
>
> 7) And finally: "./build-reuters" with minhash clustering.
> It works fine!
>
>
> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
>
> ...
>
>
>
> Thanks everybody
> Regards
>
--
Lance Norskog
goksron <at> gmail.com
Think green - keep it on the screen.
This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Hi,
I have a database (which is evolving all the time). After Solr indexing, this database is available as an XML file thanks to the Solr output.
Then I have to give this XML file to Mahout so that Mahout can classify and cluster this information. Then I have to parse the output of Mahout again in order to display the database information I need on my screen.
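That pipeline can be sketched roughly as follows. The field name, sample XML, and Mahout commands below are assumptions for illustration, not the poster's actual setup, and the sed extraction is a rough stand-in for a real XML parser:

```shell
# 1. Extract a stored text field from a (sample, inlined) Solr XML response.
RESPONSE='<response><result><doc><str name="text">some indexed content</str></doc></result></response>'
TEXT=$(printf '%s' "$RESPONSE" | sed -n 's/.*<str name="text">\([^<]*\)<\/str>.*/\1/p')
echo "$TEXT"
# 2. Mahout side (not run here): convert a directory of such text files to
#    sequence files, vectorize, cluster, then dump the clusters for display:
#    bin/mahout seqdirectory -i solr-docs/ -o seqdir/
#    bin/mahout seq2sparse   -i seqdir/    -o vectors/
#    bin/mahout kmeans -i vectors/tfidf-vectors -c seeds/ -o clusters/ -k 10 -x 10 -cl
```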
Regards
-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: Sunday, 24 June 2012 02:03
To: dev@mahout.apache.org
Subject: Re: several info
Please describe what you would like to do. What would you like to learn from your data? We cannot recommend techniques until we know this.
On Fri, Jun 22, 2012 at 5:54 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> Hi,
> Sorry, I didn't see how the source code at https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
> I'm fine with writing some code, that's no problem, but where and for what purpose? I mean, for the moment I don't even know what Hadoop/Mahout/Solr need in order to work together, and moreover what I have to add to the already existing files in order to add my own database.
>
> Thanks
>
> Regards
>
>
> -----Message d'origine-----
> De : Grant Ingersoll [mailto:gsingers@apache.org] Envoyé : vendredi 22
> juin 2012 13:41 À : dev@mahout.apache.org Objet : Re: several info
>
>
> On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
>
>> Hi Grant,
>>
>> Thank you for your fast answer.
>> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
>
> I think that link I provided shows how to get data out of Solr and into Mahout. You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields). To get things back into Solr, you'll have to write some code to do that.
>
> -Grant
>
>
>
--
Lance Norskog
goksron@gmail.com
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Apparently I had a proxy problem.
Now I run ./example/build-cluster-syntheticcontrol.sh
and after all the info and numbers I get this output =>
12/06/26 08:29:17 INFO clustering.ClusterDumper: Wrote 12 clusters
12/06/26 08:29:17 INFO driver.MahoutDriver: Program took 451592 ms (Minutes: 7.526533333333333)
Hadoop works properly.
My Hadoop version is:
#####################
/usr/local/hadoop$ ls
bin hadoop-ant-1.0.3.jar ivy README.txt
build.xml hadoop-client-1.0.3.jar ivy.xml sbin
c++ hadoop-core-1.0.3.jar lib share
CHANGES.txt hadoop-examples-1.0.3.jar libexec src
conf hadoop-minicluster-1.0.3.jar LICENSE.txt webapps
contrib hadoop-test-1.0.3.jar logs
docs hadoop-tools-1.0.3.jar NOTICE.txt
/usr/local/hadoop$ hadoop -version
Warning: $HADOOP_HOME is deprecated.
java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) Client VM (build 20.7-b02, mixed mode)
#####################
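A hedged side note on the listing above: `hadoop -version` (with a dash) is not a Hadoop subcommand, so the unrecognized flag appears to fall through to the JVM, which is why the output shows a Java version rather than Hadoop 1.0.3. The Hadoop release itself is printed by the subcommand without the dash:

```shell
# The dash-less form is the actual Hadoop subcommand.
HADOOP_VERSION_CMD="hadoop version"   # note: no leading dash
echo "$HADOOP_VERSION_CMD"
```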
$ hadoop fs -lsr
#######################################
/usr/local/hadoop$ hadoop fs -lsr
Warning: $HADOOP_HOME is deprecated.
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:05 /user/hduser/gutenberg
-rw-r--r-- 1 hduser supergroup 674566 2012-06-18 14:05 /user/hduser/gutenberg/pg20417.txt
-rw-r--r-- 1 hduser supergroup 1573150 2012-06-18 14:05 /user/hduser/gutenberg/pg4300.txt
-rw-r--r-- 1 hduser supergroup 1423801 2012-06-18 14:05 /user/hduser/gutenberg/pg5000.txt
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:07 /user/hduser/gutenberg-output
-rw-r--r-- 1 hduser supergroup 0 2012-06-18 14:07 /user/hduser/gutenberg-output/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs/history
-rw-r--r-- 1 hduser supergroup 19419 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs/history/job_201206181326_0001_1340021186120_hduser_word+count
-rw-r--r-- 1 hduser supergroup 20388 2012-06-18 14:06 /user/hduser/gutenberg-output/_logs/history/job_201206181326_0001_conf.xml
-rw-r--r-- 1 hduser supergroup 880838 2012-06-18 14:06 /user/hduser/gutenberg-output/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:28 /user/hduser/output/_policy
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:29 /user/hduser/output/clusteredPoints
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:29 /user/hduser/output/clusteredPoints/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs/history
-rw-r--r-- 1 hduser supergroup 9145 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs/history/job_201206260820_0012_1340692129661_hduser_Cluster+Classification+Driver+running+over+input%3A+
-rw-r--r-- 1 hduser supergroup 20557 2012-06-26 08:28 /user/hduser/output/clusteredPoints/_logs/history/job_201206260820_0012_conf.xml
-rw-r--r-- 1 hduser supergroup 340900 2012-06-26 08:29 /user/hduser/output/clusteredPoints/part-m-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-0
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:22 /user/hduser/output/clusters-0/_policy
-rw-r--r-- 1 hduser supergroup 1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00000
-rw-r--r-- 1 hduser supergroup 1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00001
-rw-r--r-- 1 hduser supergroup 1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00002
-rw-r--r-- 1 hduser supergroup 1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00003
-rw-r--r-- 1 hduser supergroup 1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00004
-rw-r--r-- 1 hduser supergroup 1891 2012-06-26 08:22 /user/hduser/output/clusters-0/part-00005
-rw-r--r-- 1 hduser supergroup 7331 2012-06-26 08:22 /user/hduser/output/clusters-0/part-randomSeed
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-1
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-1/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs/history
-rw-r--r-- 1 hduser supergroup 13708 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs/history/job_201206260820_0002_1340691736541_hduser_Cluster+Iterator+running+iteration+1+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:22 /user/hduser/output/clusters-1/_logs/history/job_201206260820_0002_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:22 /user/hduser/output/clusters-1/_policy
-rw-r--r-- 1 hduser supergroup 11809 2012-06-26 08:22 /user/hduser/output/clusters-1/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusters-10-final
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs/history
-rw-r--r-- 1 hduser supergroup 13723 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs/history/job_201206260820_0011_1340692090891_hduser_Cluster+Iterator+running+iteration+10+over+priorPa
-rw-r--r-- 1 hduser supergroup 20874 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_logs/history/job_201206260820_0011_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:28 /user/hduser/output/clusters-10-final/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:28 /user/hduser/output/clusters-10-final/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:23 /user/hduser/output/clusters-2
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:23 /user/hduser/output/clusters-2/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs/history
-rw-r--r-- 1 hduser supergroup 13708 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs/history/job_201206260820_0003_1340691778216_hduser_Cluster+Iterator+running+iteration+2+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:22 /user/hduser/output/clusters-2/_logs/history/job_201206260820_0003_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:23 /user/hduser/output/clusters-2/_policy
-rw-r--r-- 1 hduser supergroup 12909 2012-06-26 08:23 /user/hduser/output/clusters-2/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-3
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-3/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs/history
-rw-r--r-- 1 hduser supergroup 13722 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs/history/job_201206260820_0004_1340691817118_hduser_Cluster+Iterator+running+iteration+3+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:23 /user/hduser/output/clusters-3/_logs/history/job_201206260820_0004_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:24 /user/hduser/output/clusters-3/_policy
-rw-r--r-- 1 hduser supergroup 13449 2012-06-26 08:24 /user/hduser/output/clusters-3/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-4
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-4/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs/history
-rw-r--r-- 1 hduser supergroup 13722 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs/history/job_201206260820_0005_1340691855706_hduser_Cluster+Iterator+running+iteration+4+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:24 /user/hduser/output/clusters-4/_logs/history/job_201206260820_0005_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:24 /user/hduser/output/clusters-4/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:24 /user/hduser/output/clusters-4/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:25 /user/hduser/output/clusters-5
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:25 /user/hduser/output/clusters-5/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs/history
-rw-r--r-- 1 hduser supergroup 13706 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs/history/job_201206260820_0006_1340691895472_hduser_Cluster+Iterator+running+iteration+5+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:24 /user/hduser/output/clusters-5/_logs/history/job_201206260820_0006_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:25 /user/hduser/output/clusters-5/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:25 /user/hduser/output/clusters-5/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-6
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-6/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs/history
-rw-r--r-- 1 hduser supergroup 13722 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs/history/job_201206260820_0007_1340691934345_hduser_Cluster+Iterator+running+iteration+6+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:25 /user/hduser/output/clusters-6/_logs/history/job_201206260820_0007_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:26 /user/hduser/output/clusters-6/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:26 /user/hduser/output/clusters-6/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-7
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-7/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs/history
-rw-r--r-- 1 hduser supergroup 13722 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs/history/job_201206260820_0008_1340691973801_hduser_Cluster+Iterator+running+iteration+7+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:26 /user/hduser/output/clusters-7/_logs/history/job_201206260820_0008_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:26 /user/hduser/output/clusters-7/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:26 /user/hduser/output/clusters-7/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:27 /user/hduser/output/clusters-8
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:27 /user/hduser/output/clusters-8/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs/history
-rw-r--r-- 1 hduser supergroup 13722 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs/history/job_201206260820_0009_1340692013041_hduser_Cluster+Iterator+running+iteration+8+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:26 /user/hduser/output/clusters-8/_logs/history/job_201206260820_0009_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:27 /user/hduser/output/clusters-8/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:27 /user/hduser/output/clusters-8/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusters-9
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:28 /user/hduser/output/clusters-9/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs/history
-rw-r--r-- 1 hduser supergroup 13724 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs/history/job_201206260820_0010_1340692051563_hduser_Cluster+Iterator+running+iteration+9+over+priorPat
-rw-r--r-- 1 hduser supergroup 20872 2012-06-26 08:27 /user/hduser/output/clusters-9/_logs/history/job_201206260820_0010_conf.xml
-rw-r--r-- 1 hduser supergroup 194 2012-06-26 08:28 /user/hduser/output/clusters-9/_policy
-rw-r--r-- 1 hduser supergroup 13989 2012-06-26 08:27 /user/hduser/output/clusters-9/part-r-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/data
-rw-r--r-- 1 hduser supergroup 0 2012-06-26 08:22 /user/hduser/output/data/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:21 /user/hduser/output/data/_logs
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:21 /user/hduser/output/data/_logs/history
-rw-r--r-- 1 hduser supergroup 9125 2012-06-26 08:21 /user/hduser/output/data/_logs/history/job_201206260820_0001_1340691708676_hduser_Input+Driver+running+over+input%3A+testdata
-rw-r--r-- 1 hduser supergroup 20267 2012-06-26 08:21 /user/hduser/output/data/_logs/history/job_201206260820_0001_conf.xml
-rw-r--r-- 1 hduser supergroup 335470 2012-06-26 08:22 /user/hduser/output/data/part-m-00000
drwxr-xr-x - hduser supergroup 0 2012-06-26 08:21 /user/hduser/testdata
-rw-r--r-- 1 hduser supergroup 288374 2012-06-26 08:21 /user/hduser/testdata/synthetic_control.data
##################################
-----Original Message-----
From: shaposhnik@gmail.com [mailto:shaposhnik@gmail.com] On behalf of Roman Shaposhnik
Sent: Monday, June 25, 2012 18:00
To: dev@mahout.apache.org
Subject: Re: several info
On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> % Total % Received % Xferd Average Speed Time Time Time Current
> Dload Upload Total Spent Left Speed
> 0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0
>
>
> curl: (7) couldn't connect to host
This is suspect. You sure your host has the type of network connectivity that allows it to connect to the outside world?
Also, what version of Hadoop are you using and how was it installed?
Finally, can you make sure that the basic stuff like:
hadoop fs -lsr .
works?
Thanks,
Roman.
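Those basic checks can be run together; a minimal sketch (the mirror URL below is only an arbitrary public host chosen to test outside-world connectivity, not the one the example script downloads from):

```shell
# Quick health checks before re-running the Mahout example scripts.
# The URL is an arbitrary public host used only as a connectivity probe.
if curl -sI http://archive.apache.org/ >/dev/null 2>&1; then
  net_status="network OK"
else
  net_status="network unreachable"
fi
echo "$net_status"

if command -v hadoop >/dev/null 2>&1; then
  hadoop version | head -1            # which Hadoop, and is it on PATH
  hadoop fs -lsr . || echo "HDFS listing failed"
else
  echo "hadoop not on PATH"
fi
```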
Think green - keep it on the screen.
This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Please can you help me with this error?
############################################
hduser:/usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
Please call cluster-reuters.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. dirichlet clustering
4. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-hduser
Downloading Reuters-21578
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7959k 100 7959k 0 0 72556 0 0:01:52 0:01:52 --:--:-- 192k
Extracting...
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
12/06/26 08:53:50 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in /tmp/mahout-work-hduser/reuters-out-tmp
12/06/26 08:53:56 INFO driver.MahoutDriver: Program took 5613 ms (Minutes: 0.09355)
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 1 more
Warning: $HADOOP_HOME is deprecated.
rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
Warning: $HADOOP_HOME is deprecated.
put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.
###############################################################
-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Tuesday, June 26, 2012 09:46
To: dev@mahout.apache.org
Subject: Re: several info
That is just a message from Hadoop, which you can ignore.
On Tue, Jun 26, 2012 at 8:43 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> Warning: $HADOOP_HOME is deprecated : is this caused because I set HADOOP_HOME=/usr/local/hadoop?
Re: several info
Posted by Sean Owen <sr...@gmail.com>.
That is just a message from Hadoop, which you can ignore.
On Tue, Jun 26, 2012 at 8:43 AM, Videnova, Svetlana
<sv...@logica.com> wrote:
> Warning: $HADOOP_HOME is deprecated : is this caused because I set HADOOP_HOME=/usr/local/hadoop?
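If the warning is still bothersome, Hadoop 1.x honours a couple of environment switches for it; a sketch, assuming the /usr/local/hadoop install from earlier in this thread:

```shell
# Hadoop 1.x prints the deprecation warning whenever HADOOP_HOME is set.
# Either switch to HADOOP_PREFIX, or suppress the warning outright.
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
echo "HADOOP_PREFIX=$HADOOP_PREFIX"
```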
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
"Warning: $HADOOP_HOME is deprecated": is this caused by my setting HADOOP_HOME=/usr/local/hadoop?
-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: Tuesday, June 26, 2012 05:16
To: dev@mahout.apache.org
Subject: Re: several info
After you get your network connection problems sorted, it will be easier if you remove your HADOOP environment variables. Mahout includes its own Hadoop. Mahout will run in local pseudo-distributed mode if you do not have HADOOP_* environment variables set.
On Mon, Jun 25, 2012 at 9:00 AM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana
> <sv...@logica.com> wrote:
>> % Total % Received % Xferd Average Speed Time Time Time Current
>> Dload Upload Total Spent Left Speed
>> 0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0
>>
>>
>> curl: (7) couldn't connect to host
>
> This is suspect. You sure your host has the type of network
> connectivity that allows it to connect to the outside world?
>
> Also, what version of Hadoop are you using and how was it installed?
>
> Finally, can you make sure that the basic stuff like:
> hadoop fs -lsr .
>
> works?
>
> Thanks,
> Roman.
--
Lance Norskog
goksron@gmail.com
Re: several info
Posted by Lance Norskog <go...@gmail.com>.
After you get your network connection problems sorted, it will be
easier if you remove your HADOOP environment variables. Mahout
includes its own Hadoop. Mahout will run in local pseudo-distributed
mode if you do not have HADOOP_* environment variables set.
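Concretely, something like this in the shell before invoking the example scripts (a sketch; per bin/mahout, MAHOUT_LOCAL just needs to be non-empty):

```shell
# Clear the Hadoop environment so Mahout falls back to its bundled Hadoop
# classes and runs locally instead of against the (unreachable) cluster.
unset HADOOP_HOME HADOOP_CONF_DIR
export MAHOUT_LOCAL=true
echo "MAHOUT_LOCAL=$MAHOUT_LOCAL"
# then: ./cluster-reuters.sh
```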
On Mon, Jun 25, 2012 at 9:00 AM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana
> <sv...@logica.com> wrote:
>> % Total % Received % Xferd Average Speed Time Time Time Current
>> Dload Upload Total Spent Left Speed
>> 0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0
>>
>>
>> curl: (7) couldn't connect to host
>
> This is suspect. You sure your host has the type of network connectivity that
> allows it to connect to the outside world?
>
> Also, what version of Hadoop are you using and how was it installed?
>
> Finally, can you make sure that the basic stuff like:
> hadoop fs -lsr .
>
> works?
>
> Thanks,
> Roman.
--
Lance Norskog
goksron@gmail.com
Re: several info
Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jun 25, 2012 at 7:48 AM, Videnova, Svetlana
<sv...@logica.com> wrote:
> % Total % Received % Xferd Average Speed Time Time Time Current
> Dload Upload Total Spent Left Speed
> 0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0
>
>
> curl: (7) couldn't connect to host
This is suspect. You sure your host has the type of network connectivity that
allows it to connect to the outside world?
Also, what version of Hadoop are you using and how was it installed?
Finally, can you make sure that the basic stuff like:
hadoop fs -lsr .
works?
Thanks,
Roman.
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Also I tried to run the example, but=>
/usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
Please call cluster-reuters.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. dirichlet clustering
4. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-hduser
Downloading Reuters-21578
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0curl: (7) couldn't connect to host
Extracting...
tar (child): /tmp/mahout-work-hduser/reuters21578.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
12/06/25 15:54:53 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in /tmp/mahout-work-hduser/reuters-out-tmp
No .sgm files in /tmp/mahout-work-hduser/reuters-sgm
12/06/25 15:54:53 INFO driver.MahoutDriver: Program took 3 ms (Minutes: 6.666666666666667E-5)
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.7/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 1 more
Warning: $HADOOP_HOME is deprecated.
12/06/25 15:55:01 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 0 time(s).
12/06/25 15:55:02 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 1 time(s).
12/06/25 15:55:03 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 2 time(s).
12/06/25 15:55:04 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 3 time(s).
12/06/25 15:55:05 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 4 time(s).
12/06/25 15:55:06 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 5 time(s).
12/06/25 15:55:07 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 6 time(s).
12/06/25 15:55:08 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 7 time(s).
12/06/25 15:55:09 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 8 time(s).
12/06/25 15:55:10 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/10.84.30.51:54310 failed on connection exception: java.net.ConnectException: Connection refused
Warning: $HADOOP_HOME is deprecated.
12/06/25 15:55:12 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 0 time(s).
12/06/25 15:55:13 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 1 time(s).
12/06/25 15:55:14 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 2 time(s).
12/06/25 15:55:15 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 3 time(s).
12/06/25 15:55:16 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 4 time(s).
12/06/25 15:55:17 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 5 time(s).
12/06/25 15:55:18 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 6 time(s).
12/06/25 15:55:19 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 7 time(s).
12/06/25 15:55:20 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 8 time(s).
12/06/25 15:55:21 INFO ipc.Client: Retrying connect to server: localhost/10.84.30.51:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/10.84.30.51:54310 failed on connection exception: java.net.ConnectException: Connection refused
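The ConnectException above means nothing is listening on fs.default.name (localhost:54310 in this log). A quick way to check before restarting HDFS, sketched with bash's /dev/tcp pseudo-device; the port is taken from the log, and start-dfs.sh is the usual remedy if the port is closed:

```shell
# Probe the NameNode RPC port from the log above; if it is closed,
# HDFS is down and /usr/local/hadoop/bin/start-dfs.sh needs to run first.
if (exec 3<>/dev/tcp/localhost/54310) 2>/dev/null; then
  nn_status="namenode port open"
else
  nn_status="namenode port closed"
fi
echo "$nn_status"
```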
-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Monday, June 25, 2012 16:33
To: dev@mahout.apache.org
Subject: Re: several info
Either you have a typo, or you are not looking at the right setting.
Is your system out of RAM with no swap or something?
On Mon, Jun 25, 2012 at 3:28 PM, Videnova, Svetlana <sv...@logica.com> wrote:
> I have 4GB RAM. I set 2GB...
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Where should I look?
I don't think I have any problems with my system...
-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Monday, June 25, 2012 16:33
To: dev@mahout.apache.org
Subject: Re: several info
Either you have a typo, or you are not looking at the right setting.
Is your system out of RAM with no swap or something?
On Mon, Jun 25, 2012 at 3:28 PM, Videnova, Svetlana <sv...@logica.com> wrote:
> I have 4GB RAM. I set 2GB...
Re: several info
Posted by Sean Owen <sr...@gmail.com>.
Either you have a typo, or you are not looking at the right setting.
Is your system out of RAM with no swap or something?
On Mon, Jun 25, 2012 at 3:28 PM, Videnova, Svetlana
<sv...@logica.com> wrote:
> I have 4GB RAM. I set 2GB...
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
I have 4GB RAM. I set 2GB...
-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com]
Sent: Monday, June 25, 2012 16:23
To: dev@mahout.apache.org
Subject: Re: several info
This isn't specific to Mahout:
Error occurred during initialization of VM
Could not reserve enough space for object heap
This means that you set a heap size that is too big for the machine.
For example, maybe you requested a 4GB heap on a 32-bit machine.
On Mon, Jun 25, 2012 at 2:41 PM, Videnova, Svetlana <sv...@logica.com> wrote:
> Please can somebody help me with this error?
>
>
> Im using mahout 0.7
>
>
> /usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
> Please call cluster-reuters.sh directly next time. This file is going away.
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. fuzzykmeans clustering
> 3. dirichlet clustering
> 4. minhash clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> creating work directory at /tmp/mahout-work-hduser
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
>
> MAHOUT_LOCAL is set, running locally
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> Warning: $HADOOP_HOME is deprecated.
>
> rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
> Warning: $HADOOP_HOME is deprecated.
>
> put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.
>
Re: several info
Posted by Sean Owen <sr...@gmail.com>.
This isn't specific to Mahout:
Error occurred during initialization of VM
Could not reserve enough space for object heap
This means that you set a heap size that is too big for the machine.
For example, maybe you requested a 4GB heap on a 32-bit machine.
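One way to shrink it, assuming the heap is being requested through Mahout's launcher (in bin/mahout of this vintage, MAHOUT_HEAPSIZE is a megabyte count that becomes the JVM's -Xmx flag):

```shell
# Request a heap that a 32-bit JVM on a 4GB machine can actually reserve.
export MAHOUT_HEAPSIZE=1024            # megabytes
# bin/mahout turns this into the JVM flag below:
JAVA_HEAP_MAX="-Xmx${MAHOUT_HEAPSIZE}m"
echo "$JAVA_HEAP_MAX"
```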
On Mon, Jun 25, 2012 at 2:41 PM, Videnova, Svetlana
<sv...@logica.com> wrote:
> Please can somebody help me with this error?
>
>
> Im using mahout 0.7
>
>
> /usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
> Please call cluster-reuters.sh directly next time. This file is going away.
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. fuzzykmeans clustering
> 3. dirichlet clustering
> 4. minhash clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> creating work directory at /tmp/mahout-work-hduser
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
>
> MAHOUT_LOCAL is set, running locally
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> Warning: $HADOOP_HOME is deprecated.
>
> rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
> Warning: $HADOOP_HOME is deprecated.
>
> put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.
>
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Please can somebody help me with this error?
I'm using Mahout 0.7
/usr/local/mahout-distribution-0.7/examples/bin$ ./build-reuters.sh
Please call cluster-reuters.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. dirichlet clustering
4. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-hduser
MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
MAHOUT_LOCAL is set, running locally
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Warning: $HADOOP_HOME is deprecated.
rmr: cannot remove /tmp/mahout-work-hduser/reuters-out-seqdir: No such file or directory.
Warning: $HADOOP_HOME is deprecated.
put: File /tmp/mahout-work-hduser/reuters-out-seqdir does not exist.
-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: Sunday, June 24, 2012 02:03
To: dev@mahout.apache.org
Subject: Re: several info
Please describe what you would like to do. What would you like to learn from your data? We cannot recommend techniques until we know this.
On Fri, Jun 22, 2012 at 5:54 AM, Videnova, Svetlana <sv...@logica.com> wrote:
> HI,
> Sorry, I didn't find how the source code at this link https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
> I'm fine with writing some code, that's no problem, but where and for what purpose? I mean, for the moment I don't even know what Hadoop/Mahout/Solr need in order to work together, nor what I have to add to the already existing files in order to plug in my own database.
>
> Thanks
>
> Regards
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Friday, June 22, 2012 13:41
> To: dev@mahout.apache.org
> Subject: Re: several info
>
>
> On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
>
>> Hi Grant,
>>
>> Thank you for your fast answer.
>> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
>
> I think that link I provided shows how to get data out of Solr and into Mahout. You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields). To get things back into Solr, you'll have to write some code to do that.
>
> -Grant
>
>
>
--
Lance Norskog
goksron@gmail.com
Re: several info
Posted by Lance Norskog <go...@gmail.com>.
Please describe what you would like to do. What would you like to
learn from your data? We cannot recommend techniques until we know
this.
On Fri, Jun 22, 2012 at 5:54 AM, Videnova, Svetlana
<sv...@logica.com> wrote:
> HI,
> Sorry, I didn't find how the source code at this link https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
> I'm fine with writing some code, that's no problem, but where and for what purpose? I mean, for the moment I don't even know what Hadoop/Mahout/Solr need in order to work together, nor what I have to add to the already existing files in order to plug in my own database.
>
> Thanks
>
> Regards
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Friday, June 22, 2012 13:41
> To: dev@mahout.apache.org
> Subject: Re: several info
>
>
> On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
>
>> Hi Grant,
>>
>> Thank you for your fast answer.
>> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
>
> I think that link I provided shows how to get data out of Solr and into Mahout. You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields). To get things back into Solr, you'll have to write some code to do that.
>
> -Grant
>
>
>
--
Lance Norskog
goksron@gmail.com
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
HI,
Sorry, I didn't find how the source code at this link https://github.com/gsingers/ApacheCon2010 can help me; maybe I missed some information...
I'm fine with writing some code, that's no problem, but where and for what purpose? I mean, for the moment I don't even know what Hadoop/Mahout/Solr need in order to work together, nor what I have to add to the already existing files in order to plug in my own database.
Thanks
Regards
-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Friday, June 22, 2012 13:41
To: dev@mahout.apache.org
Subject: Re: several info
On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
> Hi Grant,
>
> Thank you for your fast answer.
> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
I think that link I provided shows how to get data out of Solr and into Mahout. You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields). To get things back into Solr, you'll have to write some code to do that.
-Grant
Re: several info
Posted by Grant Ingersoll <gs...@apache.org>.
On Jun 22, 2012, at 2:30 AM, Videnova, Svetlana wrote:
> Hi Grant,
>
> Thank you for your fast answer.
> My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
I think that link I provided shows how to get data out of Solr and into Mahout. You need term vectors (or see https://issues.apache.org/jira/browse/MAHOUT-944 for using it off of stored fields). To get things back into Solr, you'll have to write some code to do that.
-Grant
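For the "get data out of Solr" step, the Mahout 0.7 distribution exposes a lucene.vector driver (from the integration utilities) that reads term vectors straight out of a Lucene/Solr index; a sketch, where the index path and the field/output names are placeholders rather than anything from this thread:

```shell
# Turn stored Lucene/Solr term vectors into a Mahout SequenceFile of vectors.
# All paths and field names below are illustrative placeholders.
MAHOUT=/usr/local/mahout-distribution-0.7/bin/mahout
if [ -x "$MAHOUT" ]; then
  "$MAHOUT" lucene.vector \
    --dir /path/to/solr/data/index \
    --field text \
    --idField id \
    --dictOut /tmp/solr-dictionary.txt \
    --output /tmp/solr-vectors.seq
  result="ran lucene.vector"
else
  result="mahout launcher not found"
fi
echo "$result"
```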
RE: several info
Posted by "Videnova, Svetlana" <sv...@logica.com>.
Hi Grant,
Thank you for your fast answer.
My question was: where can I get the output file from Mahout, how can I vectorize the indexed Solr files, and where do I put them in Mahout?
I'll try to find some info there.
I'm sorry for the confusion; I'll post my next questions on user@mahout.apache.org.
Regards
-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Thursday, June 21, 2012 21:14
To: dev@mahout.apache.org
Subject: Re: several info
Hi Svetlana,
I'm not sure I understand what question you are asking. Perhaps if you can back up and tell us the problem you are trying to solve we can point you in the right direction. Mahout is a library of tools and can integrate with Solr in a variety of ways, almost none of which are out of the box at the moment.
It's a little dated, but perhaps this helps: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ (someday I will finish II and III of that series)
There are also various other sources on the web and I've given some talks on it in the past as well as put up some code at https://github.com/gsingers/ApacheCon2010 (which is also outdated)
Finally, this type of question is best asked on user@mahout.apache.org, just for future reference.
-Grant
On Jun 20, 2012, at 3:36 AM, Videnova, Svetlana wrote:
> How do you want to combine Mahout and Solr? => that's was my question
>
> I was using mahout0.6 but from yesterday Mahout0.7.
>
> So I was trying to run (just for test and making sure that everything works properly)
>
>
>
> ###############################################################################################################
>
> :/usr/local/mahout-distribution-0.7/examples/bin$ ./build-cluster-syntheticcontrol.sh
>
> Please call cluster-syntheticcontrol.sh directly next time. This file is going away.
>
> Please select a number to choose the corresponding clustering algorithm
>
> 1. canopy clustering
>
> 2. kmeans clustering
>
> 3. fuzzykmeans clustering
>
> 4. dirichlet clustering
>
> 5. meanshift clustering
>
> Enter your choice : 1
>
> ok. You chose 1 and we'll use canopy Clustering
>
> creating work directory at /tmp/mahout-work-hduser
>
> Downloading Synthetic control data
>
> % Total % Received % Xferd Average Speed Time Time Time Current
>
> Dload Upload Total Spent Left Speed
>
> 0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0curl: (7) couldn't connect to host
>
> Checking the health of DFS...
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
> Found 4 items
>
> drwxr-xr-x - hduser supergroup 0 2012-06-18 14:05 /user/hduser/gutenberg
>
> drwxr-xr-x - hduser supergroup 0 2012-06-18 14:07 /user/hduser/gutenberg-output
>
> drwxr-xr-x - hduser supergroup 0 2012-06-18 15:35 /user/hduser/output
>
> drwxr-xr-x - hduser supergroup 0 2012-06-19 14:24 /user/hduser/testdata
>
> DFS is healthy...
>
> Uploading Synthetic control data to HDFS
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
> Deleted hdfs://localhost:54310/user/hduser/testdata
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
> put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.
>
> Successfully Uploaded Synthetic control data to HDFS
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
>
> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
> 12/06/20 08:20:24 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only
>
> 12/06/20 08:20:24 INFO canopy.Job: Running with default arguments
>
> 12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output
>
> 12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>
> 12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0
>
> 12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030
>
> 12/06/20 08:20:29 INFO mapred.JobClient: map 0% reduce 0%
>
> 12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030
>
> 12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4
>
> 12/06/20 08:20:52 INFO mapred.JobClient: Job Counters
>
> 12/06/20 08:20:52 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10970
>
> 12/06/20 08:20:52 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>
> 12/06/20 08:20:52 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>
> 12/06/20 08:20:52 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>
> 12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 55.0
>
> 12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>
> 12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0
>
> 12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031
>
> 12/06/20 08:20:54 INFO mapred.JobClient: map 0% reduce 0%
>
> 12/06/20 08:21:17 INFO mapred.JobClient: map 0% reduce 100%
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Job Counters
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Launched reduce tasks=1
>
> 12/06/20 08:21:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9351
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7740
>
> 12/06/20 08:21:22 INFO mapred.JobClient: File Output Format Counters
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Bytes Written=106
>
> 12/06/20 08:21:22 INFO mapred.JobClient: FileSystemCounters
>
> 12/06/20 08:21:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22545
>
> 12/06/20 08:21:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=106
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Map-Reduce Framework
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Reduce input groups=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Combine output records=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Reduce shuffle bytes=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Physical memory (bytes) snapshot=40652800
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Reduce output records=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Spilled Records=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: CPU time spent (ms)=420
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Total committed heap usage (bytes)=16252928
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Virtual memory (bytes) snapshot=383250432
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Combine input records=0
>
> 12/06/20 08:21:22 INFO mapred.JobClient: Reduce input records=0
>
> 12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>
> 12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0
>
> 12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032
>
> 12/06/20 08:21:24 INFO mapred.JobClient: map 0% reduce 0%
>
> 12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032
>
> 12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4
>
> 12/06/20 08:21:43 INFO mapred.JobClient: Job Counters
>
> 12/06/20 08:21:43 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9347
>
> 12/06/20 08:21:43 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>
> 12/06/20 08:21:43 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>
> 12/06/20 08:21:43 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>
> 12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters
>
> 12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 1.3067666666666666)
>
> ###############################################################################################
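(Editor's note: the actual root cause is buried in the log above — `curl: (7) couldn't connect to host`, so `synthetic_control.data` was never downloaded, the `hadoop fs -put` failed, every job then saw "Total input paths to process : 0", and the run ended with "Wrote 0 clusters" despite the misleading "Successfully Uploaded" message. One possible workaround is to fetch and upload the data by hand. The sketch below assumes 2012-era Hadoop 1.x shell commands and the script's default paths — both may differ in your setup:)

```shell
# Download the dataset manually (this is the UCI URL the example script
# tries to use; verify it is reachable from behind your proxy/firewall).
curl -L -o /tmp/mahout-work-hduser/synthetic_control.data \
  "http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"

# Refuse to continue with an empty file -- an empty upload reproduces the
# "Total input paths to process : 0" symptom.
[ -s /tmp/mahout-work-hduser/synthetic_control.data ] || { echo "download failed" >&2; exit 1; }

# Re-create the HDFS input directory the example expects and upload
# (Hadoop 1.x FsShell syntax).
hadoop fs -rmr /user/hduser/testdata
hadoop fs -mkdir /user/hduser/testdata
hadoop fs -put /tmp/mahout-work-hduser/synthetic_control.data /user/hduser/testdata
```

After a successful upload, re-running cluster-syntheticcontrol.sh should report a non-zero number of input paths.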
>
> How do you want to combine Mahout and Solr? Also, Solr is a web
>
> service and can receive and supply data in several different formats.
>
>
>
> On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:
>
>> Regarding the errors,
>
>> which version of Mahout are you using?
>
>> There was some problem in cluster-reuters.sh ( build-reuters.sh calls cluster-reuters.sh ) which has
>
>> been fixed in the last release, 0.7.
>
>> ________________________________________
>
>> From: Svet [svetlana.videnova <at> logica.com]
>
>> Sent: Tuesday, June 19, 2012 2:51 PM
>
>> To: user <at> mahout.apache.org
>
>> Subject: several info
>
>>
>
>> Hi all,
>
>>
>
>>
>
>> First of all I would like to thank Praveenesh Kumar for helping me with Hadoop
>
>> and Mahout!
>
>>
>
>> Nevertheless I have several questions about Mahout.
>
>>
>
>> 1) I need Mahout working with Solr. Can somebody point me to a good tutorial for
>
>> getting them working together?
>
>>
>
>> 2) What exactly are the possible input and output formats for Mahout (especially
>
>> when Mahout works with Solr; I know that Solr's output format is XML)?
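(Editor's note on question 2: Solr is an HTTP service, and the response format is chosen per request with the `wt` parameter — XML is merely the default; JSON and CSV are also available out of the box. Below is a minimal sketch of pulling field values out of the default XML response. The canned file stands in for what e.g. `curl 'http://localhost:8983/solr/select?q=*:*&wt=xml'` would return; the URL and field names are illustrative only.)

```shell
# Canned stand-in for a Solr XML response; normally this would be fetched
# over HTTP, e.g.: curl 'http://localhost:8983/solr/select?q=*:*&wt=xml'
cat > /tmp/solr-response.xml <<'EOF'
<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">doc1</str></doc>
    <doc><str name="id">doc2</str></doc>
  </result>
</response>
EOF

# Extract the id field values with a simple sed pattern
# (assumes one <str> field per line, as above).
sed -n 's/.*<str name="id">\([^<]*\)<\/str>.*/\1/p' /tmp/solr-response.xml
```

This prints `doc1` and `doc2`, one per line; any downstream tool (including a Mahout preprocessing step) can consume that.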
>
>>
>
>> 3) Which of those algorithms use Hadoop? Please complete the list if I
>
>> forgot some.
>
>> -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
>
>>
>
>>
>
>>
>
>>
>
>> 4) Moreover, I was trying to run "./build-reuters.sh" and chose kmeans
>
>> clustering (it is the same error with fuzzykmeans).
>
>> Can somebody help me with this error? (But see 8 below!)
>
>> ###########################
>
>> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
>
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>
>> at
>
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
>> 12/06/19 13:33:52 INFO mapred.JobClient: map 0% reduce 0%
>
>> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
>
>> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
>
>> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration
>
>> failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-
>
>> randomSeed
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:
>
>> 371)
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.ja
>
>> va:316)
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java
>
>> :239)
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>
>> at
>
>> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>> at
>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
>> at
>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
>
>> a:43)
>
>> at java.lang.reflect.Method.invoke(Method.java:601)
>
>> at
>
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.jav
>
>> a:68)
>
>> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
>>
>
>> ###########################
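(Editor's note on the error in 4: "No clusters found. Check your -c path." means the K-Means job started with an empty initial-clusters directory — the 0.6 Reuters script could hit this, which is the cluster-reuters.sh problem Paritosh mentions as fixed in 0.7. When driving kmeans directly, passing `-k` tells Mahout to generate k random initial centroids into the `-c` path itself. The invocation below is a sketch modeled on the script's work directory; all paths are illustrative:)

```shell
# With -k given, KMeansDriver first runs RandomSeedGenerator, which writes
# k randomly sampled initial centroids into the -c directory -- so the
# "No clusters found" check passes. Paths here are examples only.
bin/mahout kmeans \
  -i /tmp/mahout-work-hduser/reuters-out-seqdir-sparse-kmeans/tfidf-vectors \
  -c /tmp/mahout-work-hduser/reuters-kmeans-clusters \
  -o /tmp/mahout-work-hduser/reuters-kmeans \
  -k 20 -x 10 -ow \
  -dm org.apache.mahout.common.distance.CosineDistanceMeasure
```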
>
>>
>
>>
>
>> 5) There is also a problem with "./build-reuters" and lda (but see 8 below!)
>
>> ############################
>
>> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
>
>> java.lang.IllegalArgumentException
>
>> at
>
>> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>
>> at
>
>> org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>
>> at
>
>> org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>
>> at
>
>> org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper
>
>> .java:96)
>
>> at
>
>> org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.jav
>
>> a:102)
>
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>
>> at
>
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
>> 12/06/19 13:40:02 INFO mapred.JobClient: map 0% reduce 0%
>
>> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
>
>> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
>
>> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed
>
>> processing /tmp/mahout-work-hduser/reuters-lda/state-0
>
>> at
>
>> org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>
>> at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>
>> at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>
>> at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>> at
>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
>> at
>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
>
>> a:43)
>
>> at java.lang.reflect.Method.invoke(Method.java:601)
>
>> at
>
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.jav
>
>> a:68)
>
>> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
>> ############################
>
>>
>
>>
>
>> 6) But I ran "./build-reuters" with dirichlet clustering and it wrote
>
>> 20 clusters without problems (but see 8 below!).
>
>> The result is:
>
>> ############################
>
>> ...
>
>> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes:
>
>> 2.3768166666666666)
>
>> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
>
>> MAHOUT_LOCAL is set, running locally
>
>> SLF4J: Class path contains multiple SLF4J bindings.
>
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-
>
>> examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-
>
>> jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
>> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-
>
>> log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>
>> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--
>
>> dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-
>
>> dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --
>
>> distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasur
>
>> e, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --
>
>> seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --
>
>> startPhase=0, --substring=100, --tempDir=temp}
>
>> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
>
>> Top Terms:
>
>> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
>
>> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes:
>
>> 0.01315)
>
>> ############################
>
>>
>
>>
>
>> 7) And finally: "./build-reuters" with minhash clustering.
>
>> Works fine!
>
>>
>
>>
>
>> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
>
>>
>
>> ...
>
>>
>
>>
>
>>
>
>> Thanks everybody
>
>> Regards
>
>>
>
>
>
> --
>
> Lance Norskog
>
> goksron <at> gmail.com
>
>
>
>
>
> Think green - keep it on the screen.
>
> This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
>
Re: several info
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Svetlana,
I'm not sure I understand what question you are asking. Perhaps if you can back up and tell us the problem you are trying to solve we can point you in the right direction. Mahout is a library of tools and can integrate with Solr in a variety of ways, almost none of which are out of the box at the moment.
It's a little dated, but perhaps this helps: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ (someday I will finish II and III of that series)
There are also various other sources on the web and I've given some talks on it in the past as well as put up some code at https://github.com/gsingers/ApacheCon2010 (which is also outdated)
Finally, this type of question is best asked on user@mahout.apache.org, just for future reference.
-Grant
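(Editor's note: one concrete integration path worth sketching here — a Solr core is backed by a Lucene index, and Mahout ships a `lucene.vector` utility that exports term vectors from such an index into Mahout's SequenceFile vector format, which the clustering jobs consume. The index path and field names below are placeholders, and the field must be indexed with `termVectors="true"` in the Solr schema; treat this as a sketch under those assumptions, not a verified recipe.)

```shell
# Export term vectors from the Lucene index behind a Solr core into a
# Mahout vector SequenceFile (paths and field names are illustrative).
bin/mahout lucene.vector \
  --dir /path/to/solr/data/index \
  --field text \
  --idField id \
  --dictOut /tmp/solr-dict.txt \
  --output /tmp/solr-vectors.seq

# The resulting vectors can then be fed to e.g. bin/mahout kmeans.
```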
On Jun 20, 2012, at 3:36 AM, Videnova, Svetlana wrote:
> [quoted message trimmed; it repeats the full text above verbatim]
--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com