Posted to common-dev@hadoop.apache.org by "Chris K Wensel (JIRA)" <ji...@apache.org> on 2008/10/24 02:31:44 UTC

[jira] Created: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

FileOutputFormat protects getTaskOutputPath
-------------------------------------------

                 Key: HADOOP-4510
                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.19.0
            Reporter: Chris K Wensel
            Priority: Blocker


org.apache.hadoop.mapred.FileOutputFormat#getTaskOutputPath() is protected.

Having access to the task output directory used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize data in the same way the output collector does.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643712#action_12643712 ] 

Chris K Wensel commented on HADOOP-4510:
----------------------------------------

I see this is marked as fixed for 0.20. Is there any way it can be included in the 0.19.x release?

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Updated: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4510:
------------------------------------

    Release Note:   (was: Changes the static method to public)

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Updated: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel updated HADOOP-4510:
-----------------------------------

    Attachment: hadoop-4510.patch

changes protected to public

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Priority: Blocker
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642372#action_12642372 ] 

Amareshwari Sriramadasu commented on HADOOP-4510:
-------------------------------------------------

Users can get the task's temporary output path from the API FileOutputFormat.getWorkOutputPath(), where they can create side files and so on.
The method FileOutputFormat.getTaskOutputPath() is used internally by the RecordWriters. It is protected so that OutputFormats extending FileOutputFormat can use it in their RecordWriters. I don't see a reason why it should be made public.

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Priority: Blocker
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644299#action_12644299 ] 

Hudson commented on HADOOP-4510:
--------------------------------

Integrated in Hadoop-trunk #648 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/648/])
    Move  to 0.19.0


> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Updated: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4510:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
         Assignee: Chris K Wensel
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Chris!

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642926#action_12642926 ] 

Hudson commented on HADOOP-4510:
--------------------------------

Integrated in Hadoop-trunk #644 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/644/])
    

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642351#action_12642351 ] 

Chris K Wensel commented on HADOOP-4510:
----------------------------------------

The simple solution is to make the method public.

The alternative, not having RecordWriters write to this magic directory, seems more daunting, considering that this part of Hadoop changes every release.

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Priority: Blocker
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Updated: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel updated HADOOP-4510:
-----------------------------------

    Release Note: Changes the static method to public
          Status: Patch Available  (was: Open)

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Priority: Blocker
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644181#action_12644181 ] 

Chris K Wensel commented on HADOOP-4510:
----------------------------------------

Great news, thanks!

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Updated: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-4510:
--------------------------------

    Fix Version/s:     (was: 0.20.0)
                   0.19.0

Now that this has been pushed to the 0.19 branch, JIRA should reflect the change.

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.



[jira] Commented: (HADOOP-4510) FileOutputFormat protects getTaskOutputPath

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642383#action_12642383 ] 

Chris K Wensel commented on HADOOP-4510:
----------------------------------------

We prefer it public because we write through the FileOutputFormat class via a RecordWriter, which internally (magically) inserts the temp path and task id path at the end of the intended path.

This is done so that speculative execution will succeed. We would like to keep benefiting from this behavior, so we aren't asking for it to change.

The side effect is that we have no way of finding the actual location of the data written and then moving it to where it was intended to be written.

Since we don't have multiple (named) output collectors, we must emulate the behavior through our own API.
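
For readers unfamiliar with the behavior described above, here is a minimal, self-contained Java sketch of the idea. It does not use Hadoop at all and only imitates the pattern with java.nio.file. The directory layout (_temporary/_<attempt id>), the class name TaskOutputSketch, and the helpers taskOutputPath, writeAttempt, promote, read and newOutputDir are assumptions made for illustration, not Hadoop's actual implementation or API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TaskOutputSketch {

    // Assumed layout: <outputDir>/_temporary/_<attemptId>/<fileName>.
    // Each attempt gets its own directory, so speculative attempts never collide.
    static Path taskOutputPath(Path outputDir, String attemptId, String fileName) {
        return outputDir.resolve("_temporary").resolve("_" + attemptId).resolve(fileName);
    }

    // Writes data to the attempt-private location and returns where it actually landed.
    static Path writeAttempt(Path outputDir, String attemptId, String fileName, String data) {
        Path target = taskOutputPath(outputDir, attemptId, fileName);
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, data.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    // Moves one attempt's file to the location the caller originally intended.
    static Path promote(Path outputDir, String attemptId, String fileName) {
        Path source = taskOutputPath(outputDir, attemptId, fileName);
        Path destination = outputDir.resolve(fileName);
        try {
            return Files.move(source, destination, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a file back as a UTF-8 string.
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a scratch directory standing in for the job output directory.
    static Path newOutputDir() {
        try {
            return Files.createTempDirectory("job-output");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path outputDir = newOutputDir();
        // Two speculative attempts write the same logical file without clashing.
        writeAttempt(outputDir, "attempt_0", "part-00000", "data from attempt 0");
        writeAttempt(outputDir, "attempt_1", "part-00000", "data from attempt 1");
        // Only the attempt chosen as the winner is moved to the intended path.
        Path finalFile = promote(outputDir, "attempt_0", "part-00000");
        System.out.println(finalFile + " -> " + read(finalFile));
    }
}
```

The point of the sketch is that promote can only work if the caller is able to compute the same path the writer used, which is what a publicly accessible getTaskOutputPath() would provide.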

> FileOutputFormat protects getTaskOutputPath
> -------------------------------------------
>
>                 Key: HADOOP-4510
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4510
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Chris K Wensel
>            Priority: Blocker
>         Attachments: hadoop-4510.patch
>
>
> o.a.h.m.FileOutputFormat#getTaskOutputPath() is protected. 
> Having access to a task output directory as used internally by RecordWriters is quite handy. This is especially true if the user is attempting to serialize out data in a similar fashion as the output collector.
