Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2009/02/09 18:20:59 UTC

[jira] Created: (PIG-660) Integration with Hadoop 0.20

Integration with Hadoop 0.20
----------------------------

                 Key: PIG-660
                 URL: https://issues.apache.org/jira/browse/PIG-660
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
         Environment: Hadoop 0.20
            Reporter: Santhosh Srinivasan
             Fix For: 0.1.0


With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.

1. Hadoop should return objects instead of strings when exceptions are thrown
2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.
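
As an illustration of item 1, here is a minimal Java sketch of why a typed failure object is easier to act on than a diagnostic string. TaskFailure, isIoFailure and looksLikeIoFailure are hypothetical names made up for this example; they are not Hadoop or Pig APIs.

```java
import java.io.EOFException;
import java.io.IOException;

public class ErrorReportSketch {
    /** Hypothetical typed failure report; NOT a Hadoop class. */
    public static final class TaskFailure {
        private final String taskId;
        private final Throwable cause;

        public TaskFailure(String taskId, Throwable cause) {
            this.taskId = taskId;
            this.cause = cause;
        }

        public String getTaskId() { return taskId; }
        public Throwable getCause() { return cause; }
    }

    /** With an object, classifying the failure is a plain type check. */
    public static boolean isIoFailure(TaskFailure failure) {
        return failure.getCause() instanceof IOException;
    }

    /** With only a string, the caller falls back to fragile text matching. */
    public static boolean looksLikeIoFailure(String diagnostic) {
        return diagnostic.contains("IOException");
    }

    public static void main(String[] args) {
        // "task_0001" is an invented id used only for this example.
        TaskFailure failure = new TaskFailure("task_0001", new EOFException());
        // EOFException is a subclass of IOException, so the type check succeeds.
        System.out.println(isIoFailure(failure));                        // prints "true"
        // The same failure reported as text does not contain "IOException".
        System.out.println(looksLikeIoFailure("java.io.EOFException"));  // prints "false"
    }
}
```

The string check misses the EOFException case because it only sees the class name as text, while the object version can use the exception hierarchy.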

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740241#action_12740241 ] 

Dmitriy V. Ryaboy commented on PIG-660:
---------------------------------------

The shim patch posted above doesn't work as cleanly as desired; the current build.xml has junit.hadoop.conf pointing to a directory in ${user.home}.

This has an undesired effect: a Hadoop config file gets created the first time you run ant, and among other things it sets which class implements the FileSystem interface. When ant is re-run with a different Hadoop version, 'ant clean' does not clean out this file, so an incorrect fs class name gets used. Deleting the directory that junit.hadoop.conf points to before rerunning fixes the problem; so does making the value of junit.hadoop.conf relative to ${build.dir} instead of ${user.home}.

Since I am not sure how the Y! developers use the pigconf directories that this property references, I do not know the appropriate way to proceed. Comments?

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch, pig_660_shims_2.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Nate Murray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Murray updated PIG-660:
----------------------------


After applying this patch to r801032 and setting hadoop.version = 20, I'm still getting problems when trying to use Pig against Hadoop 0.20.

Error and trace:

2009-08-07 00:45:48,549 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://x.x.x.x:54310
2009-08-07 00:45:48,834 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
2009-08-07 00:45:48,834 [main] ERROR org.apache.pig.Main - java.lang.RuntimeException: Failed to create DataStorage
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:198)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:137)
        at org.apache.pig.impl.PigContext.connect(PigContext.java:180)
        at org.apache.pig.PigServer.<init>(PigServer.java:169)
        at org.apache.pig.PigServer.<init>(PigServer.java:158)
        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:347)
Caused by: java.io.IOException: Call failed on local exception
        at org.apache.hadoop.ipc.Client.call(Client.java:718)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:173)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
        ... 8 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:499)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:441)
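
For reading a trace like the one above, it can help to walk the "Caused by" chain down to the innermost exception (here java.io.EOFException). A minimal sketch in plain Java follows; RootCause is a made-up helper, and the exception chain in main is rebuilt by hand from the trace rather than produced by Hadoop.

```java
import java.io.EOFException;
import java.io.IOException;

public class RootCause {
    /** Follows getCause() links until the innermost exception is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Hand-built chain mirroring the three levels shown in the trace.
        Throwable chain = new RuntimeException("Failed to create DataStorage",
                new IOException("Call failed on local exception",
                        new EOFException()));
        System.out.println(rootCause(chain).getClass().getName()); // prints "java.io.EOFException"
    }
}
```

One possible reading of the trace, offered as a guess rather than a diagnosis: the frames under org.apache.hadoop.dfs suggest an older client jar is still on the classpath, since Hadoop 0.20 ships DistributedFileSystem under org.apache.hadoop.hdfs.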

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch


[jira] Assigned: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan reassigned PIG-660:
---------------------------------------

    Assignee: Santhosh Srinivasan

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-660:
------------------------------------

    Attachment: PIG-660_3.patch

Latest patch in synchrony with PIG trunk.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 1.0.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-660:
-------------------------------

    Fix Version/s:     (was: 0.1.0)
                   0.4.0

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-660:
----------------------------------

    Attachment: PIG-660_5.patch

Updating the patch to set PIG_HADOOP_VERSION to 20 by default.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-660:
------------------------------------

    Attachment: PIG-660_2.patch

Updated patch in synchrony with the latest sources.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-660:
-------------------------------

    Attachment:     (was: PIG-660_6.patch)

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch


[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736264#action_12736264 ] 

Raghu Angadi commented on PIG-660:
----------------------------------

Currently, the Hadoop jar for 0.18 under lib/ is called hadoop18.jar. Should we change build.xml to use hadoop20.jar instead of hadoop18.jar?

I can file a jira to commit hadoop20.jar. This might be replaced by an updated jar when this jira is committed.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-660:
------------------------------------

    Attachment:     (was: PIG-660_2.patch)

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-660:
----------------------------------

    Attachment: pig_660_shims_3.patch

The attached patch fixes the junit.hadoop.conf issue mentioned earlier by setting it to ${build.dir}/conf.
This can be overridden in build.properties if individual contributors want to revert to the old behavior.

Also added a compatibility shim for hadoop19 (from PIG-573).
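
For readers unfamiliar with the shim idea, here is a minimal Java sketch of the general pattern: version-specific details sit behind one interface, and the rest of the code only talks to that interface. This is not the code in the attached patch. All names below (ShimSketch, HadoopShims as declared here, Hadoop18Shims, Hadoop20Shims, forVersion) are hypothetical, and the sketch picks the implementation at runtime from a version string, which is an assumption made to keep the example self-contained and not necessarily how the patch selects its shim.

```java
public class ShimSketch {
    /** Hypothetical shim interface: one method per version-dependent detail. */
    public interface HadoopShims {
        String fileSystemClassName();
    }

    /** Hadoop 0.18 keeps the HDFS client under org.apache.hadoop.dfs (as in the trace in this thread). */
    public static final class Hadoop18Shims implements HadoopShims {
        public String fileSystemClassName() {
            return "org.apache.hadoop.dfs.DistributedFileSystem";
        }
    }

    /** Hadoop 0.20 ships the same class under org.apache.hadoop.hdfs. */
    public static final class Hadoop20Shims implements HadoopShims {
        public String fileSystemClassName() {
            return "org.apache.hadoop.hdfs.DistributedFileSystem";
        }
    }

    /** Picks a shim from a version string such as "18" or "20" (assumed format). */
    public static HadoopShims forVersion(String version) {
        if ("18".equals(version)) {
            return new Hadoop18Shims();
        }
        if ("20".equals(version)) {
            return new Hadoop20Shims();
        }
        throw new IllegalArgumentException("No shim for Hadoop version: " + version);
    }

    public static void main(String[] args) {
        // Callers depend only on the interface, never on a specific Hadoop version.
        HadoopShims shims = forVersion("20");
        System.out.println(shims.fileSystemClassName());
    }
}
```

Supporting another Hadoop version then means adding one more implementation, without touching the callers.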

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch


[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739348#action_12739348 ] 

Daniel Dai commented on PIG-660:
--------------------------------

Hi, Dmitriy, 
I like your idea. One comment: in src/20/java/org/apache/pig/shims/HadoopShims.java, the package line is "org.apache.hadoop.hive.shims". I guess it is a typo, right?

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch


[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736313#action_12736313 ] 

Dmitriy V. Ryaboy commented on PIG-660:
---------------------------------------

Santhosh and Olga -- could you document the differences between the version of Hadoop 0.20 that Pig can use and the one in the Hadoop release? Links to the necessary patches, etc.?

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_6.patch


[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736283#action_12736283 ] 

Santhosh Srinivasan commented on PIG-660:
-----------------------------------------

The build.xml in the patch(es) has the reference to hadoop20.jar. The missing part is the hadoop20.jar that Pig can use to build its sources. Pig cannot use the hadoop20.jar coming from the Hadoop release.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-660:
-------------------------------

    Attachment: PIG-660-for-branch-0.3.patch

Attached a patch for "branch-0.3" based on PIG-660_5.patch. The only difference is that a couple of files (HConfiguration.java and HDataStorage.java) need Ctrl-M (carriage return) line endings for the patch to apply correctly to branch-0.3.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-660:
-------------------------------

    Attachment: PIG-660_4.patch

Updated patch:

(1) Applies without warnings to the current trunk.
(2) Resolves TestCounter failures. Thanks, Arun, for help with this.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-660:
------------------------------------

    Attachment: PIG-660_1.patch

New patch with TestHBaseStorage.java excluded from unit testing.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-660:
----------------------------------

    Attachment: pig_660_shims.patch

The attached patch, pig_660_shims.patch, introduces a compatibility layer similar to the one in https://issues.apache.org/jira/browse/HIVE-487 . HadoopShims.java contains wrappers that hide interface differences between Hadoop 18 and 20; when an interface change affects Pig, a shim is added to this class and used by Pig.

Separate versions of the shims are maintained for different Hadoop versions.

This way, Pig users can compile against either Hadoop 18 or Hadoop 20 by simply changing an ant property, either via the -D flag, or build.properties, instead of having to go through the process of patching.

There has been discussion of officially moving Pig to 0.20; this way, we sidestep the whole question, and only need to worry about version compatibility when using specific Hadoop APIs.

I propose that we use this mechanism until Pig is moved to use the new, future-proofed API.  

Pig compiled against 18 won't be able to use some of the newest features, such as Zebra storage. Ant can be configured not to build Zebra if the Hadoop version is < 20.
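
A minimal sketch of the shim idea, for illustration only. It is not the code in pig_660_shims.patch: the names ShimSketch, HadoopShim, Hadoop18Shim, Hadoop20Shim and forVersion are hypothetical, and the sketch picks the shim at run time from a string, whereas the patch selects the shim sources at build time through an ant property. The two file system class names are assumptions based on the hadoop.dfs versus hdfs package difference between Hadoop 18 and 20.

```java
public class ShimSketch {

    /** One method per Hadoop API difference that affects the caller (hypothetical). */
    interface HadoopShim {
        String fileSystemClassName();
    }

    /** Answers for Hadoop 0.18, where the DFS client is assumed to live in "dfs". */
    static final class Hadoop18Shim implements HadoopShim {
        @Override
        public String fileSystemClassName() {
            return "org.apache.hadoop.dfs.DistributedFileSystem";
        }
    }

    /** Answers for Hadoop 0.20, where the package is assumed to be "hdfs". */
    static final class Hadoop20Shim implements HadoopShim {
        @Override
        public String fileSystemClassName() {
            return "org.apache.hadoop.hdfs.DistributedFileSystem";
        }
    }

    /** Maps a version string such as "18" or "20" to the matching shim. */
    static HadoopShim forVersion(String version) {
        switch (version) {
            case "18":
                return new Hadoop18Shim();
            case "20":
                return new Hadoop20Shim();
            default:
                throw new IllegalArgumentException("Unsupported Hadoop version: " + version);
        }
    }

    public static void main(String[] args) {
        // Mirrors the -Dhadoop.version=20 style of switch; defaults to "20" here.
        String version = System.getProperty("hadoop.version", "20");
        System.out.println(forVersion(version).fileSystemClassName());
    }
}
```

Callers depend only on HadoopShim, so version-specific details stay in one place and the rest of the code does not need to be patched per Hadoop release.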


> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740339#action_12740339 ] 

Dmitriy V. Ryaboy commented on PIG-660:
---------------------------------------

Nate,
Your stack trace shows hadoop.dfs calls (as opposed to hdfs), which tells me it's looking for -- and finding -- Hadoop 18 classes.

Can you do this:

export PIG_HADOOP_VERSION=20
ant clean; ant -Dhadoop.version=20

and try again?

Just to be sure, try moving hadoop1* out of the lib directory (so that it for sure fails if it's trying to look for 18).
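
One way to see which Hadoop classes are visible is a small classpath probe. The sketch below is illustrative and not part of Pig: HadoopClasspathProbe, isLoadable and describe are hypothetical names, and the two class names are assumptions taken from the hadoop.dfs versus hdfs package difference described above. Run it with the same classpath that Pig uses.

```java
import java.util.function.Predicate;

public class HadoopClasspathProbe {

    // Assumed class locations: "dfs" package for 0.18, "hdfs" package for 0.20.
    static final String DFS_18 = "org.apache.hadoop.dfs.DistributedFileSystem";
    static final String DFS_20 = "org.apache.hadoop.hdfs.DistributedFileSystem";

    /** Returns true if the named class can be found, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, HadoopClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Summarizes which of the two class names the given check reports as present. */
    static String describe(Predicate<String> loadable) {
        boolean has18 = loadable.test(DFS_18);
        boolean has20 = loadable.test(DFS_20);
        if (has18 && has20) {
            return "both hadoop.dfs and hadoop.hdfs classes found";
        }
        if (has18) {
            return "only hadoop.dfs classes found (0.18-style)";
        }
        if (has20) {
            return "only hadoop.hdfs classes found (0.20-style)";
        }
        return "no Hadoop DFS classes found";
    }

    public static void main(String[] args) {
        System.out.println(describe(HadoopClasspathProbe::isLoadable));
    }
}
```

If the probe reports 0.18-style classes, or both, after a build with -Dhadoop.version=20, an older Hadoop jar is still on the classpath.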

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-660:
-------------------------------

    Affects Version/s:     (was: 0.2.0)
                       0.5.0

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.5.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: hadoop20.jar.gz, PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_trunk.patch, PIG-660_trunk_2.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-660:
---------------------------

    Attachment: PIG-660_trunk_2.patch

Resync with trunk.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: hadoop20.jar.gz, PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_trunk.patch, PIG-660_trunk_2.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Resolved: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-660.
--------------------------------

    Resolution: Fixed

patch was committed a while back

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.4.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.5.0
>
>         Attachments: hadoop20.jar.gz, PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_trunk.patch, PIG-660_trunk_2.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated PIG-660:
-----------------------------

    Attachment: PIG-660_6.patch

Updated patch fixes two minor conflicts with the current pig trunk.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_6.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-660:
-------------------------------

    Affects Version/s:     (was: 0.5.0)
                       0.4.0
        Fix Version/s: 0.5.0

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.4.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.5.0
>
>         Attachments: hadoop20.jar.gz, PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_trunk.patch, PIG-660_trunk_2.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Issue Comment Edited: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680718#action_12680718 ] 

Santhosh Srinivasan edited comment on PIG-660 at 3/11/09 10:25 AM:
-------------------------------------------------------------------

Latest patch in synchrony with PIG trunk. Also has a fix for the number of reducers for order by when parallel is not used in the script.

      was (Author: sms):
    Latest patch in synchrony with PIG trunk.
  
> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 1.0.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736319#action_12736319 ] 

Olga Natkovich commented on PIG-660:
------------------------------------

Removed the latest attachment. I think there is a bit of confusion: we don't need a new patch, just a separate Hadoop jar that works with the official Hadoop 20 release.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736286#action_12736286 ] 

Olga Natkovich commented on PIG-660:
------------------------------------

Raghu, please add the hadoop20.jar that Zebra is using. We can commit it with the understanding that we will overwrite it once we commit Hadoop 20 support into Pig.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671936#action_12671936 ] 

Santhosh Srinivasan commented on PIG-660:
-----------------------------------------

JIRAs in Hadoop corresponding to items 1 and 2.

1. https://issues.apache.org/jira/browse/HADOOP-5201
2. https://issues.apache.org/jira/browse/HADOOP-5202

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-660:
------------------------------------

    Attachment: PIG-660_2.patch

New patch that ensures that Pig-specific properties are picked up and that reasonable error messages are returned when backend errors (that cannot be parsed) occur.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-660:
------------------------------------

    Attachment: PIG-660.patch

Patch to integrate PIG with Hadoop 20. This patch switches off the deprecation warnings and fixes an NPE.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.1.0
>
>         Attachments: PIG-660.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Nate Murray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Murray updated PIG-660:
----------------------------


Dmitriy, thanks for the feedback. I did an ant clean and ant -Dhadoop.version=20 as you suggested. That alone did not work; however, when I deleted hadoop17.jar and hadoop18.jar, *then* it worked perfectly.

Thanks again,

Nate

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Commented: (PIG-660) Integration with Hadoop 0.20

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736297#action_12736297 ] 

Raghu Angadi commented on PIG-660:
----------------------------------

Thanks, Olga and Santhosh.

build.xml change is already in the patch. Thanks.

I will attach a hadoop20.jar that works with PIG. This is useful for anyone who wants to try out the patch. It will also be used by Zebra (PIG-833). Please commit the jar file to PIG trunk. It could be updated later with a newer version of the hadoop-0.20 branch.

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.



[jira] Updated: (PIG-660) Integration with Hadoop 0.20

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-660:
----------------------------------

    Attachment: pig_660_shims_2.patch

Sure is.. uploading a patch with the fixed package name. 

> Integration with Hadoop 0.20
> ----------------------------
>
>                 Key: PIG-660
>                 URL: https://issues.apache.org/jira/browse/PIG-660
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 0.20
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: 0.4.0
>
>         Attachments: PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, pig_660_shims.patch, pig_660_shims_2.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs.
