Posted to common-dev@hadoop.apache.org by "Benjamin Reed (JIRA)" <ji...@apache.org> on 2007/01/25 20:47:49 UTC

[jira] Created: (HADOOP-933) Application defined InputSplits do not work

Application defined InputSplits do not work
-------------------------------------------

                 Key: HADOOP-933
                 URL: https://issues.apache.org/jira/browse/HADOOP-933
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.10.1
            Reporter: Benjamin Reed


If an application defines its own InputSplit, the TaskTracker fails while deserializing the MapTasks it receives from the JobTracker, because it cannot deserialize the InputSplit they contain. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until the task is running in the context of the child process, where the InputSplit class can be resolved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Work started: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-933 started by Owen O'Malley.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>         Attachments: JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467956 ] 

Benjamin Reed commented on HADOOP-933:
--------------------------------------

I found another place that assumed FileSplit. See attached patch. Our test cases pass now, so that should be the end of it. (At least for our use of InputSplit.)

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated HADOOP-933:
---------------------------------

    Attachment: JobInProgress.patch

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Attachment: client-split.patch

This patch fixes this bug and HADOOP-867.

The JobClient creates an InputFormat object and generates the list of InputSplits using getSplits. The list of InputSplits is written to a DFS file next to the job.xml. (More precisely, a list of RawInputSplits is written to the file. Each RawInputSplit consists of the serialized InputSplit, the class name, and the list of locations.) When the JobTracker initializes the JobInProgress, it just has to read the serialized InputSplits and pass them down to the TaskTrackers and to the Task. When the MapTask starts, it deserializes the InputSplit and uses it to create the RecordReader. This has the advantage that non-FileSplit InputSplits work (since the class is recorded) and that the user code is never loaded by the JobTracker.
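
To make the record layout concrete, here is a minimal sketch of what a RawInputSplit-style record could look like. It is not the code from client-split.patch: the names RawSplitSketch, RawSplit, writeSplitFile and readSplitFile are hypothetical, the field order is an assumption, and a byte array stands in for the DFS file. The point it illustrates is that the reader treats the class name as a plain string and the split as opaque bytes, so no user class has to be loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RawSplitSketch {

    /** Hypothetical stand-in for the RawInputSplit described above. */
    static final class RawSplit {
        final String className;   // recorded so non-FileSplit types can be rebuilt later
        final byte[] bytes;       // the serialized InputSplit, kept opaque here
        final String[] locations; // hosts that hold the split's data

        RawSplit(String className, byte[] bytes, String[] locations) {
            this.className = className;
            this.bytes = bytes;
            this.locations = locations;
        }

        void write(DataOutput out) throws IOException {
            out.writeUTF(className);
            out.writeInt(bytes.length);
            out.write(bytes);
            out.writeInt(locations.length);
            for (String location : locations) {
                out.writeUTF(location);
            }
        }

        static RawSplit read(DataInput in) throws IOException {
            String className = in.readUTF(); // only a string; the class is never loaded
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            String[] locations = new String[in.readInt()];
            for (int i = 0; i < locations.length; i++) {
                locations[i] = in.readUTF();
            }
            return new RawSplit(className, bytes, locations);
        }
    }

    /** Client side: write a count followed by each record. */
    static byte[] writeSplitFile(List<RawSplit> splits) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(splits.size());
        for (RawSplit split : splits) {
            split.write(out);
        }
        out.flush();
        return buffer.toByteArray();
    }

    /** Tracker side: read the records back without touching user code. */
    static List<RawSplit> readSplitFile(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int count = in.readInt();
        List<RawSplit> splits = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            splits.add(RawSplit.read(in));
        }
        return splits;
    }
}
```

In this sketch the class name "com.example.MySplit" could be passed through even if no such class exists on the reader's classpath, which is the property the JobTracker needs.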

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467711 ] 

Benjamin Reed commented on HADOOP-933:
--------------------------------------

Unfortunately, the workaround wouldn't work either. The MapTask does not write out the name of the InputSplit class, so readFields() assumes that it is reading a FileSplit.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Fix Version/s: 0.12.0
           Status: Patch Available  (was: In Progress)

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467889 ] 

Owen O'Malley commented on HADOOP-933:
--------------------------------------

For a fix that doesn't depend on HADOOP-867, I would propose:
  1. Replace the InputSplit in MapTask with:
      private byte[] inputSplit;
      private String inputSplitClassname;
  2. The MapTask constructor still takes an InputSplit and serializes it to set inputSplit and inputSplitClassname.
  3. The MapTask.run method uses the bytes and classname to reconstruct the InputSplit as a local variable in run.
  4. I don't think the change to set map.input.* properties to default values in the non-FileSplit case is reasonable. 
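
Steps 1 to 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual MapTask: Split is a hypothetical stand-in for the write/readFields contract, RangeSplit is a made-up application-defined split, and MapTaskSketch and the jobClassLoader parameter are invented names. The class is looked up by name only inside run, which is where the job jar would be on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class DeferredSplitSketch {

    /** Hypothetical stand-in for the serialization contract an InputSplit follows. */
    interface Split {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Made-up application-defined split that is not a FileSplit. */
    public static class RangeSplit implements Split {
        long start;
        long end;

        public RangeSplit() { } // no-arg constructor so it can be created reflectively

        RangeSplit(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(start);
            out.writeLong(end);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            start = in.readLong();
            end = in.readLong();
        }
    }

    /** Hypothetical task holder mirroring steps 1 to 3 of the proposal. */
    static class MapTaskSketch {
        private final byte[] inputSplit;          // step 1: opaque bytes
        private final String inputSplitClassname; // step 1: class name as a string

        // Step 2: the constructor still takes a split and serializes it.
        MapTaskSketch(Split split) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            split.write(new DataOutputStream(buffer));
            this.inputSplit = buffer.toByteArray();
            this.inputSplitClassname = split.getClass().getName();
        }

        // Step 3: rebuild the split as a local value, resolving the class only here.
        Split run(ClassLoader jobClassLoader) throws Exception {
            Class<?> splitClass = Class.forName(inputSplitClassname, true, jobClassLoader);
            Split split = (Split) splitClass.getDeclaredConstructor().newInstance();
            split.readFields(new DataInputStream(new ByteArrayInputStream(inputSplit)));
            return split;
        }
    }
}
```

Because the holder carries only a byte array and a string, any process that merely forwards it never needs the split class; only the caller of run does.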

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-933:
--------------------------------

    Fix Version/s:     (was: 0.10.1)
         Assignee: Owen O'Malley

0.10.1 has already been released, so this cannot be targeted for that release.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>         Attachments: JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Attachment:     (was: client-split.patch)

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467651 ] 

Benjamin Reed commented on HADOOP-933:
--------------------------------------

HADOOP-867 might be a better solution, but it is an enhancement. This is a bug. It would be nice to fix the bug.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Status: Open  (was: Patch Available)

I think a better solution would be to do HADOOP-867, which will have the client write all of the input splits to disk. Then the MapTask would only need to keep the Path to the split file and an offset. The only places the splits would be instantiated would be the submitting program and the task JVM.
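
A rough sketch of the path-plus-offset idea, under loud assumptions: a local file accessed through RandomAccessFile stands in for the DFS split file, and SplitRef, writeSplits and readPayload are hypothetical names with an invented record layout (class name, length, bytes). It is not the HADOOP-867 implementation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

public class SplitFileOffsetSketch {

    /** All a task would need to carry: the split file and where its record starts. */
    static final class SplitRef {
        final String path;
        final long offset;

        SplitRef(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }
    }

    /** Client side: write each record and remember the offset at which it begins. */
    static SplitRef[] writeSplits(Path file, String[] classNames, byte[][] payloads)
            throws IOException {
        SplitRef[] refs = new SplitRef[payloads.length];
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.setLength(0); // start from an empty file
            for (int i = 0; i < payloads.length; i++) {
                refs[i] = new SplitRef(file.toString(), out.getFilePointer());
                out.writeUTF(classNames[i]);
                out.writeInt(payloads[i].length);
                out.write(payloads[i]);
            }
        }
        return refs;
    }

    /** Task side: seek to the offset and read only this task's record. */
    static byte[] readPayload(SplitRef ref) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(ref.path, "r")) {
            in.seek(ref.offset);
            in.readUTF(); // class name; a real task would use it to instantiate the split
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        }
    }
}
```

With this shape, nothing between the submitting program and the task process has to hold or interpret the split bytes at all.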

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated HADOOP-933:
---------------------------------

    Fix Version/s: 0.10.1
           Status: Patch Available  (was: Open)

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468716 ] 

Runping Qi commented on HADOOP-933:
-----------------------------------


What is the status of this issue?


> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-933:
--------------------------------

    Status: Open  (was: Patch Available)

This latest patch doesn't compile.  Also, the unit test leaves a directory named "test_mini_mr_local" in $CWD.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471885 ] 

Hadoop QA commented on HADOOP-933:
----------------------------------

-1, because 3 attempts failed to build and test the latest attachment (http://issues.apache.org/jira/secure/attachment/12350806/client-split.patch) against trunk revision r505557. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Status: Patch Available  (was: Open)

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split-3.patch, client-split-fixed.patch, client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Attachment: client-split.patch

This patch fixes the conflicts from Nigel's patch for moving test files out of /tmp.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-933:
-------------------------------

    Attachment: client-split-fixed.patch

Latest patch doesn't compile :p
Attaching one that does.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split-fixed.patch, client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Attachment:     (was: client-split.patch)

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



Re: [jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by Nigel Daley <nd...@yahoo-inc.com>.
The patch couldn't be applied by the automated process because the filename in the patch did not start with src/java/org/...


On Jan 25, 2007, at 11:51 AM, Hadoop QA (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467513 ]
>
> Hadoop QA commented on HADOOP-933:
> ----------------------------------
>
> -1, because the patch command could not apply the latest attachment (http://issues.apache.org/jira/secure/attachment/12349621/MapTask.patch) as a patch to trunk revision r499156. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.
>
>> Application defined InputSplits do not work
>> -------------------------------------------
>>
>>                 Key: HADOOP-933
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>>             Project: Hadoop
>>          Issue Type: Bug
>>          Components: mapred
>>    Affects Versions: 0.10.1
>>            Reporter: Benjamin Reed
>>             Fix For: 0.10.1
>>
>>         Attachments: MapTask.patch
>>
>>
>> If an application defines its own InputSplit, the task tracker  
>> chokes when it cannot deserialize the InputSplit when it  
>> deserializes MapTasks it receives from the JobTracker. This is  
>> because the TaskTracker does not resolve classes from the job jar  
>> file. The attached patch delays resolution of the InputSplit until  
>> it is running in the context of the child process where it can  
>> resolve the InputSplit class.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>


[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467513 ] 

Hadoop QA commented on HADOOP-933:
----------------------------------

-1, because the patch command could not apply the latest attachment (http://issues.apache.org/jira/secure/attachment/12349621/MapTask.patch) as a patch to trunk revision r499156. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated HADOOP-933:
---------------------------------

    Attachment: MapTask.patch

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471872 ] 

Owen O'Malley commented on HADOOP-933:
--------------------------------------

This patch also fixes HADOOP-338.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Attachment: client-split.patch

Doug wanted the split file to have a fixed ASCII header, so I added "SPL".
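
As a rough illustration of a fixed ASCII header, the sketch below writes the three bytes "SPL" at the start of a stream and rejects any stream that does not begin with them. Only the header text comes from the comment above; the class name SplitFileHeader, the helper methods, and the integer payload are hypothetical and are not taken from client-split.patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitFileHeader {
    // Only the text "SPL" is from the thread; the rest of the layout is assumed.
    private static final byte[] HEADER = "SPL".getBytes(StandardCharsets.US_ASCII);

    /** Writes the fixed header at the current position of the stream. */
    static void writeHeader(DataOutputStream out) throws IOException {
        out.write(HEADER);
    }

    /** Reads the header bytes and fails if they are not exactly "SPL". */
    static void readHeader(DataInputStream in) throws IOException {
        byte[] actual = new byte[HEADER.length];
        in.readFully(actual);
        if (!Arrays.equals(HEADER, actual)) {
            throw new IOException("Not a split file: unexpected header");
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeHeader(out);
        out.writeInt(3); // hypothetical payload, e.g. a split count
        out.flush();

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        readHeader(in); // throws if the stream does not start with "SPL"
        System.out.println("payload after header: " + in.readInt());
    }
}
```

A fixed header like this lets a reader fail fast on a file that is not a split file instead of misinterpreting arbitrary bytes.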

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split.patch, client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:
---------------------------------

    Attachment: client-split-3.patch

This patch fixes the problems mentioned previously, including moving all of the temporary directories into build/test. It also adds a section to the task tracker jsp that lists the tasks that aren't running but are still being stored on the task tracker. (This can help in figuring out why the mini/mr cluster gets stuck.) I also changed one of the mini/mr flags to volatile, since it is accessed without synchronization.
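
For readers unfamiliar with the volatile change, here is a minimal sketch of the general pattern: a flag written by one thread and read by another without a lock. The class and field names (VolatileFlagSketch, Runner, shouldRun) are hypothetical and do not refer to the actual mini/mr code.

```java
public class VolatileFlagSketch {
    /** Hypothetical worker whose loop is controlled by a flag set from another thread. */
    static class Runner implements Runnable {
        // volatile makes the write in shutdown() visible to the loop in run()
        // even though neither method takes a lock.
        private volatile boolean shouldRun = true;

        @Override
        public void run() {
            while (shouldRun) {
                // busy loop standing in for real work
            }
        }

        void shutdown() {
            shouldRun = false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runner runner = new Runner();
        Thread worker = new Thread(runner);
        worker.start();
        Thread.sleep(50);
        runner.shutdown();
        worker.join(5000);
        System.out.println("worker stopped: " + !worker.isAlive());
    }
}
```

Without volatile (or some other synchronization), the Java memory model does not guarantee that the loop ever observes the updated flag.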

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split-3.patch, client-split-fixed.patch, client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Commented: (HADOOP-933) Application defined InputSplits do not work

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467669 ] 

Doug Cutting commented on HADOOP-933:
-------------------------------------

> 867 might be a better solution, but it is an enhancement. This is a bug.

Actually, it's a new feature that doesn't yet work!

The workaround is to put your split implementations on the classpath of your tasktrackers.  That may or may not be practical for you...
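
The class-loading problem behind this workaround can be sketched in plain Java. The example below is an illustration only, not the code in any of the attached patches: it carries a split as a class name plus raw bytes and instantiates it only when given a class loader that can see the class. SimpleWritable, MySplit, and RawSplit are hypothetical names, and a bootstrap-only loader stands in for a process that does not have the job jar on its classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class DeferredSplitSketch {
    /** Hypothetical stand-in for a serializable split interface. */
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical application-defined split, normally shipped in the job jar. */
    public static class MySplit implements SimpleWritable {
        public long start;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(start);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            start = in.readLong();
        }
    }

    /** Carries a split as opaque bytes so intermediaries never load its class. */
    static final class RawSplit {
        final String className;
        final byte[] bytes;

        RawSplit(String className, byte[] bytes) {
            this.className = className;
            this.bytes = bytes;
        }

        static RawSplit of(SimpleWritable split) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            split.write(new DataOutputStream(buf));
            return new RawSplit(split.getClass().getName(), buf.toByteArray());
        }

        /** Resolves the class only here, with a loader that can see it. */
        SimpleWritable resolve(ClassLoader loader)
                throws IOException, ReflectiveOperationException {
            Class<?> cls = Class.forName(className, true, loader);
            SimpleWritable split =
                (SimpleWritable) cls.getDeclaredConstructor().newInstance();
            split.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return split;
        }
    }

    public static void main(String[] args) throws Exception {
        MySplit original = new MySplit();
        original.start = 42L;
        RawSplit raw = RawSplit.of(original);

        // A loader without the application classes, standing in for a process
        // that lacks the job jar.
        ClassLoader withoutJobJar = new ClassLoader(null) {};
        try {
            raw.resolve(withoutJobJar);
        } catch (ClassNotFoundException e) {
            System.out.println("cannot resolve without the job jar: " + e.getMessage());
        }

        // A loader that does have the class, standing in for the child process.
        MySplit restored = (MySplit) raw.resolve(MySplit.class.getClassLoader());
        System.out.println("restored start = " + restored.start);
    }
}
```

Putting the split classes on the tasktracker classpath corresponds to making the first loader able to find the class; deferring resolution corresponds to calling resolve only where the second loader is available.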


> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>             Fix For: 0.10.1
>
>         Attachments: MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.



[jira] Updated: (HADOOP-933) Application defined InputSplits do not work

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-933:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

This has been fixed by HADOOP-867.

> Application defined InputSplits do not work
> -------------------------------------------
>
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>         Attachments: client-split-3.patch, client-split-fixed.patch, client-split.patch, JobInProgress.patch, MapTask.patch
>
>
> If an application defines its own InputSplit, the task tracker chokes when it cannot deserialize the InputSplit when it deserializes MapTasks it receives from the JobTracker. This is because the TaskTracker does not resolve classes from the job jar file. The attached patch delays resolution of the InputSplit until it is running in the context of the child process where it can resolve the InputSplit class.
