Posted to dev@mahout.apache.org by "Elmer Garduno (JIRA)" <ji...@apache.org> on 2012/10/31 06:45:11 UTC

[jira] [Created] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Elmer Garduno created MAHOUT-1108:
-------------------------------------

             Summary: cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true
                 Key: MAHOUT-1108
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1108
             Project: Mahout
          Issue Type: Bug
    Affects Versions: 0.7
            Reporter: Elmer Garduno
            Priority: Minor


Got the following exception when running the command with HADOOP_CONF and HADOOP_CONF_DIR set:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 1 more



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489178#comment-13489178 ] 

Lance Norskog commented on MAHOUT-1108:
---------------------------------------

I don't use a Hadoop cluster; I just run these jobs locally. Please do not require a cluster to run the example programs.
                
> cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true
> ---------------------------------------------------------------
>
>                 Key: MAHOUT-1108
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1108
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: Elmer Garduno
>            Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Got the following exception when running the command with HADOOP_CONF and  HADOOP_CONF_DIR
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:96)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver
> 	at java.net.URLClassLoader$1.run(Unknown Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(Unknown Source)
> 	at java.lang.ClassLoader.loadClass(Unknown Source)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> 	at java.lang.ClassLoader.loadClass(Unknown Source)
> 	... 1 more


[jira] [Comment Edited] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488613#comment-13488613 ] 

Paritosh Ranjan edited comment on MAHOUT-1108 at 11/1/12 11:21 AM:
-------------------------------------------------------------------

I tend to agree with you.

I don't see the point of extracting it locally and then putting the files in HDFS if they can be put directly into HDFS. From what I can see, nothing else is done sequentially (locally) in this script, so MAHOUT_LOCAL seems redundant to me.

Still, I think that the first MapReduce call should come after this check:

  HADOOP="$HADOOP_HOME/bin/hadoop"
  if [ ! -e "$HADOOP" ]; then
    echo "Can't find hadoop in $HADOOP, exiting"
    exit 1
  fi

so that the user is warned with a proper message.

Since I am not the creator of this script, and I am not sure about the use of MAHOUT_LOCAL, I would like to wait for someone to clarify it. Then I think we can go ahead with this change with some modifications (like putting the MapReduce call after the check for Hadoop's existence).
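
The ordering suggested in this comment, verifying the launcher before the first MapReduce call, can be sketched as a small function. This is only an illustration: the name `check_hadoop` is hypothetical, and skipping the check when MAHOUT_LOCAL is non-empty is an assumption about how local runs should be treated, not something taken from cluster-reuters.sh.

```shell
# Hypothetical helper (not from cluster-reuters.sh).
# Returns 0 if jobs can be launched, 1 with a message on stderr otherwise.
check_hadoop() {
  # Assumption: a non-empty MAHOUT_LOCAL means "run locally",
  # so no hadoop launcher is required.
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    return 0
  fi
  hadoop_bin="${HADOOP_HOME:-}/bin/hadoop"
  if [ ! -e "$hadoop_bin" ]; then
    echo "Can't find hadoop in $hadoop_bin, exiting" >&2
    return 1
  fi
}
```

A script would call `check_hadoop || exit 1` once, before its first job, so that a missing launcher produces this message instead of a Java stack trace.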


                
      was (Author: paritoshranjan):
    I tend to agree with you.

I don't see the point of extracting it locally and the putting the files in hdfs. From what I can see, nothing else is done sequentially (locally) in this script. So, MAHOUT_LOCAL seems to be redundant to me.

Still, I think that the first mapreduce call be after 

 HADOOP="$HADOOP_HOME/bin/hadoop"
  if [ ! -e $HADOOP ]; then
    echo "Can't find hadoop in $HADOOP, exiting"
    exit 1
  fi

so that the user is warned with a proper message.

Since I am not the creator of this script, and I am not sure about the use of MAHOUT_LOCAL, I would like to wait for someone to clarify the doubts regarding MAHOUT_LOCAL. Then, I think we can go ahead this change with some modifications ( like putting the mapreduce call after the check of hadoop's existence).


                  

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487603#comment-13487603 ] 

Paritosh Ranjan commented on MAHOUT-1108:
-----------------------------------------

The code tries to put reuters-out-seqdir directly in HDFS, which will hamper the execution of cluster-reuters in local mode. The code also removes the Hadoop cluster status check and cleanup and puts everything directly in HDFS, which I think is not desired.

I think the problem you faced while executing in a distributed environment was caused by the absence of the HADOOP_HOME property. Can you retry with HADOOP_HOME set?
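
The question in this comment comes down to which environment variables the shell running the script actually sees. A small diagnostic along the following lines prints their state before retrying. The function name `describe_env` is hypothetical, and the list of variables is simply the set mentioned in this thread, not something read from the mahout launcher.

```shell
# Hypothetical diagnostic: report the state of the variables discussed
# in this thread. The variable list is an assumption based on the discussion.
describe_env() {
  for name in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_LOCAL; do
    # Indirect lookup of a fixed, known variable name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "$name is unset or empty"
    fi
  done
}
```

Running `describe_env` in the same shell that launches cluster-reuters.sh shows whether HADOOP_HOME is really visible to the script.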
                

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488613#comment-13488613 ] 

Paritosh Ranjan commented on MAHOUT-1108:
-----------------------------------------

I tend to agree with you.

I don't see the point of extracting it locally and then putting the files in HDFS. From what I can see, nothing else is done sequentially (locally) in this script, so MAHOUT_LOCAL seems redundant to me.

Still, I think that the first MapReduce call should come after

  HADOOP="$HADOOP_HOME/bin/hadoop"
  if [ ! -e "$HADOOP" ]; then
    echo "Can't find hadoop in $HADOOP, exiting"
    exit 1
  fi

so that the user is warned with a proper message.

Since I am not the creator of this script, and I am not sure about the use of MAHOUT_LOCAL, I would like to wait for someone to clarify it. Then I think we can go ahead with this change with some modifications (like putting the MapReduce call after the check for Hadoop's existence).


                

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488425#comment-13488425 ] 

Elmer Garduno commented on MAHOUT-1108:
---------------------------------------

Yes, the HADOOP_HOME variable is set. I added a more balanced approach that copies the reuters-sgm files to the cluster from the beginning and then runs all the operations on the cluster.

I have updated the request.

https://github.com/apache/mahout/pull/9

Thanks
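
The approach described in this comment, staging the raw input on the cluster before any job runs, could look roughly like the sketch below. It is not the code from the pull request. The function name `stage_input` and both path arguments are hypothetical, and the sketch assumes that `$HADOOP_HOME/bin/hadoop` exists and that a non-empty MAHOUT_LOCAL means the data should stay on the local file system.

```shell
# Hypothetical helper, not taken from the pull request.
# Copies a local input directory to HDFS with "hadoop fs -put",
# unless MAHOUT_LOCAL is set, in which case nothing is copied.
stage_input() {
  local_dir="$1"
  hdfs_dir="$2"
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "MAHOUT_LOCAL is set; leaving $local_dir on the local file system"
    return 0
  fi
  "$HADOOP_HOME/bin/hadoop" fs -put "$local_dir" "$hdfs_dir"
}
```

Keeping the local branch in the same function means users who run without a cluster are not forced through HDFS.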
                

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487561#comment-13487561 ] 

Elmer Garduno commented on MAHOUT-1108:
---------------------------------------

Added a pull request on GitHub.

https://github.com/apache/mahout/pull/8
                

[jira] [Comment Edited] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487603#comment-13487603 ] 

Paritosh Ranjan edited comment on MAHOUT-1108 at 10/31/12 7:42 AM:
-------------------------------------------------------------------

The code tries to put reuters-out-seqdir directly in HDFS, which will hamper the execution of cluster-reuters in local mode. The code also removes the Hadoop cluster status check and cleanup and puts everything directly in HDFS, which I think is not desired.

I think the problem you faced while executing in a distributed environment was caused by the absence of the HADOOP_HOME property. Can you retry with HADOOP_HOME set?
                
      was (Author: paritoshranjan):
    The code tries to put reuters-out-seqdir directly in hdfs. It will hamper the execution of cluster-reuters in local mode. The code also removes the hadoop cluster statu check and cleanup and directly puts everything in hdfs, which I think is not desired.

I think the problem faced while execution in distributed environment happened due to absence of HADOOP_HOME property. Can you retry with HADOOP_HOME set?
                  

[jira] [Commented] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489189#comment-13489189 ] 

Elmer Garduno commented on MAHOUT-1108:
---------------------------------------

@Paritosh I have updated the pull request to do the check before the first call, and also made sure the directory is available on HDFS or locally if that's the case.

@Lance I think this change doesn't affect the way you currently use it, as the rest of the commands were already invoked without using MAHOUT_LOCAL.
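
Making sure the working directory exists "on HDFS or locally" can be sketched as one function with two branches. This is an illustration under stated assumptions, not the code in the pull request: the name `ensure_work_dir` is hypothetical, a non-empty MAHOUT_LOCAL is assumed to select the local file system, and plain `hadoop fs -mkdir` is assumed to be sufficient (newer Hadoop releases need `-p` to create missing parent directories).

```shell
# Hypothetical helper, not taken from the pull request.
# Creates the working directory locally when MAHOUT_LOCAL is set,
# otherwise asks the hadoop launcher to create it on HDFS.
ensure_work_dir() {
  dir="$1"
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    mkdir -p "$dir"
  else
    # Assumption: the hadoop launcher has already been verified to exist.
    "$HADOOP_HOME/bin/hadoop" fs -mkdir "$dir"
  fi
}
```

Called after the launcher check and before the first job, this keeps the local and cluster code paths symmetrical.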
                

[jira] [Comment Edited] (MAHOUT-1108) cluster-reuters.sh executes seqdirectory with MAHOUT_LOCAL=true

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488425#comment-13488425 ] 

Elmer Garduno edited comment on MAHOUT-1108 at 11/1/12 2:38 AM:
----------------------------------------------------------------

Yes, the HADOOP_HOME variable is set. I added a more balanced approach that copies the reuters-sgm files to the cluster from the beginning and then runs all the operations on the cluster.

I have updated the request.

https://github.com/apache/mahout/pull/9

What do you think?

Thanks
                
      was (Author: elmer.garduno):
    Yes the HADOOP_HOME variable is set. I added a more balanced approach that copies the reuters-sgm files to the cluster from the beginning and then runs all the operations on the cluster.

I have updated the request.

https://github.com/apache/mahout/pull/9

Thanks
                  