Posted to dev@crunch.apache.org by "Roman Shaposhnik (JIRA)" <ji...@apache.org> on 2012/09/19 18:45:07 UTC

[jira] [Created] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Roman Shaposhnik created CRUNCH-68:
--------------------------------------

             Summary: Crunch examples don't accept generic tool arguments
                 Key: CRUNCH-68
                 URL: https://issues.apache.org/jira/browse/CRUNCH-68
             Project: Crunch
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.3.0
            Reporter: Roman Shaposhnik
            Assignee: Josh Wills
             Fix For: 0.4.0


Currently all crunch examples have the following code:

{noformat}
    if (args.length != 3) {
      System.err.println();
      System.err.println("Usage: " + this.getClass().getName() + " [generic options] input output");
      System.err.println();
      GenericOptionsParser.printGenericCommandUsage(System.err);
      return 1;
    }
{noformat}

This is incorrect, since run() gets to see all arguments, even generic ones, and thus you can't predict the value of 
args.length.
Unfortunately, this is also a major blocker for using Crunch with Hadoop 2 because of MAPREDUCE-4068. 

Essentially, at this point the combination of MAPREDUCE-4068 and the inability to pass -libjars makes the Crunch examples DOA for Hadoop 2 clusters.
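
As a rough illustration of the concern, the sketch below applies the same fixed-length check to two hypothetical argument arrays. It assumes, as the report states, that generic options arrive in args unparsed. The class name ArgCountSketch, the helper passesFixedCheck, and the sample arguments are invented for illustration and are not taken from the Crunch sources.

```java
public class ArgCountSketch {
    // Mirrors the quoted check: accept only when exactly three arguments arrive.
    public static boolean passesFixedCheck(String[] args) {
        return args.length == 3;
    }

    public static void main(String[] args) {
        // Hypothetical invocation without any generic options.
        String[] plain = {"WordCount", "input", "output"};
        // The same invocation with a generic option still present in the array
        // (an assumption of this sketch, following the report).
        String[] withLibjars = {"-libjars", "extra.jar", "WordCount", "input", "output"};

        System.out.println("plain passes: " + passesFixedCheck(plain));
        System.out.println("with -libjars passes: " + passesFixedCheck(withLibjars));
    }
}
```

Under that assumption, every extra option changes the array length, so a hard-coded count cannot hold for both invocations.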

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459719#comment-13459719 ] 

Matthias Friedrich commented on CRUNCH-68:
------------------------------------------

Hi Brock, thanks a lot for helping us out here!

Building multiple example JARs seems excessive; let's just remove the mainClass from the manifest and then fix things according to 2) and 3).
                

[jira] [Resolved] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Friedrich resolved CRUNCH-68.
--------------------------------------

    Resolution: Fixed
      Assignee: Matthias Friedrich  (was: Josh Wills)

OK, committed.
                

[jira] [Reopened] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Friedrich reopened CRUNCH-68:
--------------------------------------


Not quite done, reopening ...
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458969#comment-13458969 ] 

Roman Shaposhnik commented on CRUNCH-68:
----------------------------------------

bq. Hmm, looking at AverageBytesByIP and TotalBytesByIP, I think they haven't worked in a while

Perhaps this has nothing to do with generic tool arguments then. Is it worth pursuing bringing them back to life?
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459253#comment-13459253 ] 

Roman Shaposhnik commented on CRUNCH-68:
----------------------------------------

Matthias, that was the issue I was alluding to -- sorry for not being clear enough (I'm just getting into the Crunch code base from the Bigtop integration side of things!).
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Brock Noland (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461284#comment-13461284 ] 

Brock Noland commented on CRUNCH-68:
------------------------------------

+1, thanks for taking care of this.
                

[jira] [Resolved] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Friedrich resolved CRUNCH-68.
--------------------------------------

    Resolution: Fixed

Thanks, Brock!
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459011#comment-13459011 ] 

Matthias Friedrich commented on CRUNCH-68:
------------------------------------------

AverageBytesByIP and TotalBytesByIP expected two arguments while they actually need three (class name, input, output); the patch fixes this. I'm not aware of any other problems.
                

[jira] [Updated] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Friedrich updated CRUNCH-68:
-------------------------------------

    Attachment: CRUNCH-68-v2.patch

I think this should do it.
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Josh Wills (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459130#comment-13459130 ] 

Josh Wills commented on CRUNCH-68:
----------------------------------

+1-- let's commit it.
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Brock Noland (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459672#comment-13459672 ] 

Brock Noland commented on CRUNCH-68:
------------------------------------

Alright, here is what I have uncovered:

1) The main and run methods are getting the class name because the jar manifest already specifies the main class:

{noformat}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar not.a.class.name wordcount/input wordcount/output-2
12/09/20 10:21:44 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(wordcount/input)+S0+Aggregate.count+GBK+combine+asText+Text(wordcount/output-2)"
{noformat}

Note that not.a.class.name is only required because the run() method is looking for 3 args.

2) Due to #1, it's actually not possible to run the other examples:

{noformat}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar org.apache.crunch.examples.TotalBytesByIP access_log/input access_log/output
12/09/20 10:20:14 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(access_log/input)+S0+Aggregate.count+GBK+combine+asText+Text(access_log/output)"
{noformat}

3) All examples use ToolRunner, which in both 1.X and 2.X already parses the args with GenericOptionsParser and passes the remaining args to the run() method:

https://github.com/apache/hadoop-common/blob/release-1.0.3/src/core/org/apache/hadoop/util/ToolRunner.java#L59
https://github.com/apache/hadoop-common/blob/release-2.0.1-alpha/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java#L64
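
Findings 1) and 2) can be modeled with a small sketch of how a launcher might choose the main class. This is a simplified model of the behavior observed above, not Hadoop's launcher code. LaunchSketch, Launch, and resolve are hypothetical names, and the sketch assumes the manifest names org.apache.crunch.examples.WordCount, as the log output suggests.

```java
import java.util.Arrays;

public class LaunchSketch {
    /** The class that would be run plus the arguments handed to its main(). */
    public static final class Launch {
        public final String mainClass;
        public final String[] args;

        Launch(String mainClass, String[] args) {
            this.mainClass = mainClass;
            this.args = args;
        }
    }

    // manifestMainClass is null when the jar manifest names no main class.
    // argsAfterJar holds the command-line words that follow the jar path.
    public static Launch resolve(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            // The manifest entry wins, so every word is passed through as an argument.
            return new Launch(manifestMainClass, argsAfterJar);
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no main class given");
        }
        // No manifest entry: the first word names the class, the rest are arguments.
        return new Launch(argsAfterJar[0],
            Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length));
    }

    public static void main(String[] args) {
        String[] cmd = {
            "org.apache.crunch.examples.TotalBytesByIP", "access_log/input", "access_log/output"
        };
        Launch withManifest = resolve("org.apache.crunch.examples.WordCount", cmd);
        System.out.println(withManifest.mainClass + " gets " + withManifest.args.length + " args");

        Launch withoutManifest = resolve(null, cmd);
        System.out.println(withoutManifest.mainClass + " gets " + withoutManifest.args.length + " args");
    }
}
```

With the manifest entry present, the command line resolves to WordCount with three arguments, which matches the log above. Without it, the same command line resolves to TotalBytesByIP with two.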


Points of action:
1) Either a jar should be generated for all examples or we should remove the mainClass from the jar manifest.
2) All examples should take 2 args. The class is specified either in the jar manifest or on the command line and will never be passed to the run() method unless you have it both in the manifest and on the command line.
3) The examples should not use GenericOptionsParser in the run() method.

Let me know if you agree and I can open JIRAs for said items.
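
Action items 2) and 3) can be sketched without Hadoop on the classpath. The code below is a deliberately simplified stand-in, not the ToolRunner or GenericOptionsParser implementation: GenericArgsSketch, stripGenericOptions, and the option subset it recognizes are assumptions made for illustration. It shows a parsing step that removes generic options first, followed by a run() that only checks for the two remaining arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GenericArgsSketch {
    // Generic options that take a separate value; an assumed, simplified subset.
    private static final Set<String> VALUE_OPTIONS = new HashSet<>(
        Arrays.asList("-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives"));

    // Stand-in for the parsing step: drop generic options and their values,
    // keep everything else in its original order.
    public static String[] stripGenericOptions(String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (VALUE_OPTIONS.contains(arg)) {
                i++; // skip the option's value as well
            } else if (arg.startsWith("-D") && arg.contains("=")) {
                // "-Dkey=value" is a single word, so there is nothing more to skip
            } else {
                remaining.add(arg);
            }
        }
        return remaining.toArray(new String[0]);
    }

    // Stand-in for an example's run(): expects exactly input and output,
    // and does no option parsing of its own.
    public static int run(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: [generic options] input output");
            return 1;
        }
        System.out.println("input=" + args[0] + " output=" + args[1]);
        return 0;
    }

    public static void main(String[] args) {
        String[] cmd = {"-libjars", "extra.jar", "-D", "some.key=value", "input", "output"};
        int exitCode = run(stripGenericOptions(cmd));
        System.out.println("exit code: " + exitCode);
    }
}
```

Keeping the count check at two and leaving option handling to the caller means the check no longer depends on how many generic options were passed.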
                

[jira] [Updated] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Friedrich updated CRUNCH-68:
-------------------------------------

    Attachment: CRUNCH-68-Fix-command-line-parser-for-examples.patch

Hmm, looking at AverageBytesByIP and TotalBytesByIP, I think they haven't worked in a while.

Anyway, this should be easy to fix with GenericOptionsParser. I've also added correct exit codes; you might need these.
                

[jira] [Commented] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Brock Noland (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459547#comment-13459547 ] 

Brock Noland commented on CRUNCH-68:
------------------------------------

Hmm, something is amiss here. They shouldn't need the class name, and ToolRunner already does the generic options stuff:

https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java#L59

Since I wrote these and have done *nothing* on Crunch since it was on GitHub, I will look at this later today.
                

[jira] [Comment Edited] (CRUNCH-68) Crunch examples don't accept generic tool arguments

Posted by "Brock Noland (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459672#comment-13459672 ] 

Brock Noland edited comment on CRUNCH-68 at 9/21/12 2:31 AM:
-------------------------------------------------------------

Alright, here is what I have uncovered:

1) The main and run methods are getting the class name because the jar manifest already specifies the main class:

{code}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar not.a.class.name wordcount/input wordcount/output-2
12/09/20 10:21:44 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(wordcount/input)+S0+Aggregate.count+GBK+combine+asText+Text(wordcount/output-2)"
{code}

Note that not.a.class.name is only required because the run() method is looking for 3 args.

2) Due to #1, it's actually not possible to run the other examples:

{code}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar org.apache.crunch.examples.TotalBytesByIP access_log/input access_log/output
12/09/20 10:20:14 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(access_log/input)+S0+Aggregate.count+GBK+combine+asText+Text(access_log/output)"
{code}

3) All examples use ToolRunner, which in both 1.X and 2.X already parses the args with GenericOptionsParser and passes the remaining args to the run() method:

https://github.com/apache/hadoop-common/blob/release-1.0.3/src/core/org/apache/hadoop/util/ToolRunner.java#L59
https://github.com/apache/hadoop-common/blob/release-2.0.1-alpha/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java#L64


Points of action:
1) Either a jar should be generated for all examples or we should remove the mainClass from the jar manifest.
2) All examples should take 2 args. The class is specified either in the jar manifest or on the command line and will never be passed to the run() method unless you have it both in the manifest and on the command line.
3) The examples should not use GenericOptionsParser in the run() method.

Let me know if you agree and I can open JIRAs for said items.
                