Posted to dev@mahout.apache.org by "Elmer Garduno (JIRA)" <ji...@apache.org> on 2012/10/31 06:45:11 UTC
[jira] [Created] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Elmer Garduno created MAHOUT-1108:
-------------------------------------
Summary: cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true
Key: MAHOUT-1108
URL: https://issues.apache.org/jira/browse/MAHOUT-1108
Project: Mahout
Issue Type: Bug
Affects Versions: 0.7
Reporter: Elmer Garduno
Priority: Minor
Got the following exception when running the command with HADOOP_CONF and HADOOP_CONF_DIR
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 1 more
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489178#comment-13489178 ]
Lance Norskog commented on MAHOUT-1108:
---------------------------------------
I don't use a Hadoop cluster, I just run these jobs locally. Please do not require a cluster to run the example programs.
> cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true
> ---------------------------------------------------------------
>
> Key: MAHOUT-1108
> URL: https://issues.apache.org/jira/browse/MAHOUT-1108
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.7
> Reporter: Elmer Garduno
> Priority: Minor
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Got the following exception when running the command with HADOOP_CONF and HADOOP_CONF_DIR
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> ... 1 more
[jira] [Comment Edited] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488613#comment-13488613 ]
Paritosh Ranjan edited comment on MAHOUT-1108 at 11/1/12 11:21 AM:
-------------------------------------------------------------------
I tend to agree with you.
I don't see the point of extracting it locally and then putting the files in HDFS, if it can be put directly into HDFS. From what I can see, nothing else is done sequentially (locally) in this script. So, MAHOUT_LOCAL seems to be redundant to me.
Still, I think that the first mapreduce call should be after this check
HADOOP="$HADOOP_HOME/bin/hadoop"
if [ ! -e "$HADOOP" ]; then
  echo "Can't find hadoop in $HADOOP, exiting"
  exit 1
fi
so that the user is warned with a proper message.
Since I am not the creator of this script, and I am not sure about the use of MAHOUT_LOCAL, I would like to wait for someone to clarify the doubts regarding MAHOUT_LOCAL. Then, I think we can go ahead with this change with some modifications (like putting the mapreduce call after the check of hadoop's existence).
was (Author: paritoshranjan):
I tend to agree with you.
I don't see the point of extracting it locally and the putting the files in hdfs. From what I can see, nothing else is done sequentially (locally) in this script. So, MAHOUT_LOCAL seems to be redundant to me.
Still, I think that the first mapreduce call be after
HADOOP="$HADOOP_HOME/bin/hadoop"
if [ ! -e $HADOOP ]; then
echo "Can't find hadoop in $HADOOP, exiting"
exit 1
fi
so that the user is warned with a proper message.
Since I am not the creator of this script, and I am not sure about the use of MAHOUT_LOCAL, I would like to wait for someone to clarify the doubts regarding MAHOUT_LOCAL. Then, I think we can go ahead this change with some modifications ( like putting the mapreduce call after the check of hadoop's existence).
[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487603#comment-13487603 ]
Paritosh Ranjan commented on MAHOUT-1108:
-----------------------------------------
The code tries to put reuters-out-seqdir directly in hdfs. It will hamper the execution of cluster-reuters in local mode. The code also removes the hadoop cluster statu check and cleanup and directly puts everything in hdfs, which I think is not desired.
I think the problem faced while execution in distributed environment happened due to absence of HADOOP_HOME property. Can you retry with HADOOP_HOME set?
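A retry with HADOOP_HOME set could be checked along these lines. This is only a sketch: require_hadoop is a hypothetical helper name, the install path in the trailing comment is an assumption, and it tests for an executable launcher (-x) rather than mere existence.

```shell
#!/bin/sh
# Hypothetical helper (not part of cluster-reuters.sh): succeeds only when
# HADOOP_HOME is set and $HADOOP_HOME/bin/hadoop is an executable file.
require_hadoop() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "Can't find hadoop in $HADOOP_HOME/bin/hadoop" >&2
    return 1
  fi
  return 0
}

# Example usage (assumed install location and script path, adjust as needed):
#   export HADOOP_HOME=/usr/local/hadoop
#   require_hadoop && ./examples/bin/cluster-reuters.sh
```

If the helper fails, the message says whether the variable is missing or the launcher is missing, which is easier to act on than the NoClassDefFoundError.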
[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488613#comment-13488613 ]
Paritosh Ranjan commented on MAHOUT-1108:
-----------------------------------------
I tend to agree with you.
I don't see the point of extracting it locally and then putting the files in HDFS. From what I can see, nothing else is done sequentially (locally) in this script. So, MAHOUT_LOCAL seems to be redundant to me.
Still, I think that the first mapreduce call be after
HADOOP="$HADOOP_HOME/bin/hadoop"
if [ ! -e "$HADOOP" ]; then
  echo "Can't find hadoop in $HADOOP, exiting"
  exit 1
fi
so that the user is warned with a proper message.
Since I am not the creator of this script, and I am not sure about the use of MAHOUT_LOCAL, I would like to wait for someone to clarify the doubts regarding MAHOUT_LOCAL. Then, I think we can go ahead with this change with some modifications (like putting the mapreduce call after the check of hadoop's existence).
[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488425#comment-13488425 ]
Elmer Garduno commented on MAHOUT-1108:
---------------------------------------
Yes, the HADOOP_HOME variable is set. I added a more balanced approach that copies the reuters-sgm files to the cluster from the beginning and then runs all the operations on the cluster.
I have updated the request.
https://github.com/apache/mahout/pull/9
Thanks
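For readers without the pull request at hand, copying the reuters-sgm files to the cluster up front might look roughly like the sketch below. It is not the code from the pull request: upload_to_cluster is a hypothetical name, HADOOP is assumed to point at the hadoop launcher, WORK_DIR is assumed to hold the extracted files, and the parent directory is assumed to exist on HDFS already.

```shell
#!/bin/sh
# Hypothetical helper: upload a local directory to HDFS once, before any job runs.
# Assumes $HADOOP is the path to the hadoop launcher.
upload_to_cluster() {
  src="$1"    # local directory, e.g. "$WORK_DIR/reuters-sgm"
  dest="$2"   # destination path on HDFS
  # Skip the upload when the destination directory is already on HDFS.
  if "$HADOOP" fs -test -d "$dest"; then
    echo "$dest already on HDFS, skipping upload"
    return 0
  fi
  "$HADOOP" fs -put "$src" "$dest"
}

# Example usage (WORK_DIR is an assumed variable):
#   upload_to_cluster "$WORK_DIR/reuters-sgm" "$WORK_DIR/reuters-sgm"
```

Once the data is on HDFS, the later steps can all run on the cluster without a local extraction pass in between.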
[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487561#comment-13487561 ]
Elmer Garduno commented on MAHOUT-1108:
---------------------------------------
Added a pull request on github.
https://github.com/apache/mahout/pull/8
[jira] [Comment Edited] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487603#comment-13487603 ]
Paritosh Ranjan edited comment on MAHOUT-1108 at 10/31/12 7:42 AM:
-------------------------------------------------------------------
The code tries to put reuters-out-seqdir directly in hdfs. It will hamper the execution of cluster-reuters in local mode. The code also removes the hadoop cluster status check and cleanup and directly puts everything in hdfs, which I think is not desired.
I think the problem faced while execution in distributed environment happened due to absence of HADOOP_HOME property. Can you retry with HADOOP_HOME set?
was (Author: paritoshranjan):
The code tries to put reuters-out-seqdir directly in hdfs. It will hamper the execution of cluster-reuters in local mode. The code also removes the hadoop cluster statu check and cleanup and directly puts everything in hdfs, which I think is not desired.
I think the problem faced while execution in distributed environment happened due to absence of HADOOP_HOME property. Can you retry with HADOOP_HOME set?
[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489189#comment-13489189 ]
Elmer Garduno commented on MAHOUT-1108:
---------------------------------------
@Paritosh I have updated the pull request to do the check before the first call, and also made sure the directory is available on HDFS or locally if that's the case.
@Lance I think this change doesn't affect the way you currently use it, as the rest of the commands were already invoked without using MAHOUT_LOCAL.
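As a rough illustration of the "available on HDFS or locally" check, here is a minimal sketch. It is not the pull request's code: input_dir_available is a hypothetical name, HADOOP is assumed to hold the launcher path, and a non-empty MAHOUT_LOCAL is assumed to mean local mode.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the directory exists where the job will read it.
# Local filesystem when MAHOUT_LOCAL is non-empty (assumption), HDFS otherwise.
input_dir_available() {
  dir="$1"
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    # Local mode: plain filesystem test, no Hadoop needed.
    [ -d "$dir" ]
  else
    # Cluster mode: ask HDFS whether the directory exists.
    "$HADOOP" fs -test -d "$dir"
  fi
}

# Example usage (WORK_DIR is an assumed variable):
#   input_dir_available "$WORK_DIR/reuters-out" || echo "input directory missing"
```

Keeping the local branch free of any hadoop call means the example can still run on a machine with no cluster.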
[jira] [Comment Edited] (MAHOUT-1108) cluster-reuters.sh executes
seqdirectory with MAHOUT_LOCAL=true
Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488425#comment-13488425 ]
Elmer Garduno edited comment on MAHOUT-1108 at 11/1/12 2:38 AM:
----------------------------------------------------------------
Yes, the HADOOP_HOME variable is set. I added a more balanced approach that copies the reuters-sgm files to the cluster from the beginning and then runs all the operations on the cluster.
I have updated the request.
https://github.com/apache/mahout/pull/9
What do you think?
Thanks
was (Author: elmer.garduno):
Yes the HADOOP_HOME variable is set. I added a more balanced approach that copies the reuters-sgm files to the cluster from the beginning and then runs all the operations on the cluster.
I have updated the request.
https://github.com/apache/mahout/pull/9
Thanks