Posted to dev@tuscany.apache.org by "Chris Trezzo (JIRA)" <de...@tuscany.apache.org> on 2008/07/09 21:11:31 UTC

[jira] Created: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.
---------------------------------------------------------------------------------------------------------------------------

                 Key: TUSCANY-2471
                 URL: https://issues.apache.org/jira/browse/TUSCANY-2471
             Project: Tuscany
          Issue Type: New Feature
         Environment: Mac OS 10.5.2
            Reporter: Chris Trezzo
            Priority: Minor


The Test class submits an MR job using the runJar API. (This does essentially the same thing as the hadoop shell script.)

The Test2 class submits an MR job without calling the main or run method in org.apache.hadoop.examples.WordCount, but the job is still submitted using a JAR file.

The services package includes incomplete code for a Java SCA component that will do the same thing as Test2 through Tuscany.

For these methods to work, Hadoop must be running (I have it running in pseudo-distributed mode). The Hadoop library and the Hadoop/conf directory must also be included in the classpath.
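
For readers unfamiliar with the "run a jar programmatically" pattern mentioned for the Test class, here is a minimal sketch using only the JDK. It is not the code from the patch and not Hadoop's own implementation: the class name JarLauncher and its methods are hypothetical, and the real runJar API likely does more work (for example, preparing the jar's contents before loading them). It only shows the core idea of reading the Main-Class from a jar's manifest and invoking main reflectively.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for illustration; not Hadoop's RunJar and not part of the patch.
class JarLauncher {

    // Returns the Main-Class manifest attribute, or null if the jar does not declare one.
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue("Main-Class");
        }
    }

    // Loads mainClass from the jar and calls its public static main(String[]).
    static void launch(File jar, String mainClass, String[] args) throws Exception {
        URL[] urls = { jar.toURI().toURL() };
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try (URLClassLoader loader = new URLClassLoader(urls, JarLauncher.class.getClassLoader())) {
            // Code inside the jar may look up resources through the context class loader.
            current.setContextClassLoader(loader);
            Class<?> cls = Class.forName(mainClass, true, loader);
            Method main = cls.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
        } finally {
            // Restore the caller's class loader once the launched program returns.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] argv) throws Exception {
        if (argv.length < 1) {
            System.err.println("usage: JarLauncher <jar> [args...]");
            return;
        }
        File jar = new File(argv[0]);
        String mainClass = mainClassOf(jar);
        if (mainClass == null) {
            System.err.println("No Main-Class in manifest of " + jar);
            return;
        }
        launch(jar, mainClass, Arrays.copyOfRange(argv, 1, argv.length));
    }
}
```

The launched class must be public for the reflective call to succeed. Any job submission itself would still happen inside the launched program, which is why Hadoop and its configuration need to be reachable from the same classpath.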

Attached are the patch and wordcount.jar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Posted by "Jean-Sebastien Delfino (JIRA)" <de...@tuscany.apache.org>.
    [ https://issues.apache.org/jira/browse/TUSCANY-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613672#action_12613672 ] 

Jean-Sebastien Delfino commented on TUSCANY-2471:
-------------------------------------------------

Your patch has been applied in SVN r676988. Thanks!




[jira] Updated: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Posted by "Chris Trezzo (JIRA)" <de...@tuscany.apache.org>.
     [ https://issues.apache.org/jira/browse/TUSCANY-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated TUSCANY-2471:
----------------------------------

    Attachment: patch-JIRA-2471-pomUpdate

Please apply patch-JIRA-2471-pomUpdate in the following directory:
http://svn.apache.org/repos/asf/tuscany/sandbox/ctrezzo

In order for the test functions to run properly, the configuration directory (i.e. ../hadoop-0.15.3/conf) that includes hadoop-default.xml and hadoop-site.xml must be on the classpath.

The following JAR files must also be on the classpath:
hadoop-0.15.3-core.jar
commons-cli-2.0-SNAPSHOT.jar
commons-codec-1.3.jar
commons-httpclient-3.0.1.jar
commons-logging-1.0.4.jar
commons-logging-api-1.0.4.jar
jets3t-0.5.0.jar
jetty-5.1.4.jar
junit-3.8.1.jar
kfs-0.1.jar
log4j-1.2.13.jar
servlet-api.jar
xmlenc-0.52.jar

All of these files are listed as dependencies in the pom.xml file and, with the exception of kfs-0.1.jar, will be downloaded to your Maven repository automatically. (kfs-0.1.jar can be found in the /lib directory of the Hadoop distribution and must be installed manually for the time being.)
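
A quick way to confirm the classpath requirements above before running the test functions is a small check like the following sketch. The class ClasspathCheck and its methods are hypothetical and not part of the patch. The sketch also assumes that org.apache.hadoop.mapred.JobConf is a reasonable class to probe for in the Hadoop core jar; substitute any class you know your Hadoop version provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the patch.
class ClasspathCheck {

    // Returns the resource names that the given class loader cannot find.
    static List<String> missingResources(ClassLoader loader, List<String> names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Returns true if the named class can be located without initializing it.
    static boolean classPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        // The two configuration files named above should resolve if the conf directory is on the classpath.
        List<String> missing = missingResources(
                loader, Arrays.asList("hadoop-default.xml", "hadoop-site.xml"));
        System.out.println("Missing configuration files: " + missing);
        // Assumption: JobConf is used here only as a probe for the Hadoop core jar.
        System.out.println("Hadoop core visible: "
                + classPresent("org.apache.hadoop.mapred.JobConf"));
    }
}
```

Run it with the same classpath you intend to use for the test functions; an empty list of missing files and a "true" for the core jar suggest the setup matches the description above.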

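Finally, the org.apache.hadoop.myExamples package must be put into a JAR file using the following command (note the trailing "." after the directory, which tells jar to add everything under that directory):
jar -cvf wordcount.jar -C directory-the-classes-are-in/ .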
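
When a jar is built with -C, entry names are relative to the given directory, so the classes should end up under org/apache/hadoop/myExamples/ inside wordcount.jar. The following sketch lists the class entries under a package path so you can check that the jar has the expected layout. JarContents and classesUnder are hypothetical names introduced for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of the patch.
class JarContents {

    // Returns the names of all .class entries stored under packagePath (e.g. "org/apache/hadoop/myExamples").
    static List<String> classesUnder(File jar, String packagePath) throws IOException {
        String prefix = packagePath.endsWith("/") ? packagePath : packagePath + "/";
        List<String> found = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".class")) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JarContents <jar>");
            return;
        }
        List<String> classes = classesUnder(new File(args[0]), "org/apache/hadoop/myExamples");
        if (classes.isEmpty()) {
            System.out.println("No classes found under org/apache/hadoop/myExamples; check the -C directory.");
        } else {
            for (String name : classes) {
                System.out.println(name);
            }
        }
    }
}
```

An empty result usually means the jar was created from the wrong directory level, in which case the package path inside the jar will not match the package name of the classes.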

Remember, HDFS must be running in order for these functions to work.

Thanks!




[jira] Commented: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Posted by "Chris Trezzo (JIRA)" <de...@tuscany.apache.org>.
    [ https://issues.apache.org/jira/browse/TUSCANY-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612743#action_12612743 ] 

Chris Trezzo commented on TUSCANY-2471:
---------------------------------------

I have been searching through Hadoop's mailing list and found an open JIRA issue [1] for publishing hadoop-core to the Apache repository.

Someone has been publishing hadoop-core on their own [2], but it is not an official repository.

[1] https://issues.apache.org/jira/browse/HADOOP-3305
[2] http://people.apache.org/~kalle/mahout/maven2/org/apache/hadoop/core/0.17.0-SNAPSHOT/




[jira] Commented: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Posted by "Jean-Sebastien Delfino (JIRA)" <de...@tuscany.apache.org>.
    [ https://issues.apache.org/jira/browse/TUSCANY-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612633#action_12612633 ] 

Jean-Sebastien Delfino commented on TUSCANY-2471:
-------------------------------------------------

That looks like a good starting point for experimenting with Hadoop.

To help others in the project try it, could you please provide a Maven pom.xml for that code?

I guess the Maven pom should build the few classes you have in your patch and the Wordcount jar, and reference the required dependencies from Hadoop. The Hadoop JARs do not seem to be available in the Apache Maven repos yet (unless I missed them), so I suggest the following:

1. Ask the Hadoop project (on their dev list) whether they already have their JARs in a public repository and, if not, whether they could please publish them.
2. In the meantime, install the Hadoop JARs manually in your local Maven repository and write the instructions for doing so in this JIRA or, better, in a README file (for others who will want to try your code).

Thanks!






[jira] Assigned: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Posted by "Jean-Sebastien Delfino (JIRA)" <de...@tuscany.apache.org>.
     [ https://issues.apache.org/jira/browse/TUSCANY-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Sebastien Delfino reassigned TUSCANY-2471:
-----------------------------------------------

    Assignee: Jean-Sebastien Delfino




[jira] Updated: (TUSCANY-2471) Two small test functions that submit a MR job without the console, and the start of a Java Component that submits a MR job.

Posted by "Chris Trezzo (JIRA)" <de...@tuscany.apache.org>.
     [ https://issues.apache.org/jira/browse/TUSCANY-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated TUSCANY-2471:
----------------------------------

    Attachment: wordcount.jar
                patch-JIRA-2471

The patch should be applied in the following directory:
http://svn.apache.org/repos/asf/tuscany/sandbox/ctrezzo


