Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2008/07/03 00:02:45 UTC

[jira] Created: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

The data_join should allow the user to implement a customer cloning function
----------------------------------------------------------------------------

                 Key: HADOOP-3684
                 URL: https://issues.apache.org/jira/browse/HADOOP-3684
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Runping Qi



Currently, the framework uses serialization/deserialization to clone the values passed to the reduce function.
This amounts to a very heavyweight deep copy of the value objects, which is far too expensive.
Although serialization is a generic approach that works for all possible value classes, and is thus a good default,
the framework should allow the user to implement an application-specific, more efficient cloning function.
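To make the cost difference concrete, here is a minimal, self-contained Java sketch (no Hadoop dependency) contrasting the two cloning strategies. SimpleWritable, PointValue, fastCopy and cloneBySerialization are hypothetical names: SimpleWritable only mimics the write/readFields shape of Hadoop's Writable, and cloneBySerialization illustrates the serialize-then-deserialize idea described above rather than reproducing the framework's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for Hadoop's Writable: an object that writes its
// fields to a binary stream and reads them back.
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical value class used only for illustration.
class PointValue implements SimpleWritable {
  int x;
  int y;

  public void write(DataOutput out) throws IOException {
    out.writeInt(x);
    out.writeInt(y);
  }

  public void readFields(DataInput in) throws IOException {
    x = in.readInt();
    y = in.readInt();
  }

  // Application-specific copy: plain field assignment, no byte buffers.
  PointValue fastCopy() {
    PointValue copy = new PointValue();
    copy.x = x;
    copy.y = y;
    return copy;
  }
}

final class SerializationClone {
  private SerializationClone() {}

  // Generic deep copy: serialize into a byte buffer, then deserialize into a
  // fresh instance. Works for any SimpleWritable, but allocates buffers and
  // encodes/decodes every field on each call.
  static <T extends SimpleWritable> T cloneBySerialization(T orig, Supplier<T> factory)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    orig.write(out);
    out.flush();
    T copy = factory.get();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    copy.readFields(in);
    return copy;
  }
}
```

Both paths yield an independent copy; the generic one pays for a buffer allocation plus a full encode/decode per value, which is the overhead the issue describes.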
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624794#action_12624794 ] 

Hudson commented on HADOOP-3684:
--------------------------------

Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3684:
----------------------------------

    Issue Type: Improvement  (was: Bug)


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-3684:
-------------------------------

    Attachment:     (was: H-3684.txt)


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3684:
------------------------------------

    Release Note: Allowed users to override the clone function in a subclass of the TaggedMapOutput class.  (was: make it possible for the user to overwrite clone function in a subclass of TaggedMapOutput class)


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-3684:
-------------------------------

    Fix Version/s: 0.19.0
     Release Note: make it possible for the user to overwrite clone function in a subclass of TaggedMapOutput class
           Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3684:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Runping.


[jira] Commented: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610127#action_12610127 ] 

Hadoop QA commented on HADOOP-3684:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12385161/H-3684.txt
  against trunk revision 673517.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2789/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2789/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2789/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2789/console

This message is automatically generated.


[jira] Commented: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610093#action_12610093 ] 

Hadoop QA commented on HADOOP-3684:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12385153/H-3684.txt
  against trunk revision 673517.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2786/console

This message is automatically generated.


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-3684:
-------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3684:
----------------------------------

     Description: 
Currently, the framework uses serialization/deserialization to clone the values passed to the reduce function.
This amounts to a very heavyweight deep copy of the value objects, which is far too expensive.
Although serialization is a generic approach that works for all possible value classes, and is thus a good default,
the framework should allow the user to implement an application-specific, more efficient cloning function.
 

  was:

Currently, the framework uses serialization/deserialization to clone the values passed to the reduce function.
This amounts to a very heavyweight deep copy of the value objects, which is far too expensive.
Although serialization is a generic approach that works for all possible value classes, and is thus a good default,
the framework should allow the user to implement an application-specific, more efficient cloning function.
 

        Assignee: Runping Qi
    Hadoop Flags: [Reviewed]

+1


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-3684:
-------------------------------

    Attachment: H-3684.txt


Attaching a simple patch.

This patch allows the user to override the clone(JobConf job)
method in a subclass of the TaggedMapOutput class.
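A sketch of the pattern the patch enables, assuming the base class keeps a serialization round trip as its default clone and lets subclasses override it. TaggedValueSketch and TextTaggedValue are hypothetical stand-ins for TaggedMapOutput and a user subclass, and the Map parameter is a placeholder for the JobConf argument; none of this is the actual data_join code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical stand-in for TaggedMapOutput: a tag plus a payload that can
// serialize itself.
abstract class TaggedValueSketch {
  protected String tag = "";

  abstract void write(DataOutput out) throws IOException;
  abstract void readFields(DataInput in) throws IOException;
  abstract TaggedValueSketch newInstance();

  // Default clone: round-trip through a byte buffer. The Map parameter is a
  // placeholder for the JobConf argument of the real method.
  public TaggedValueSketch clone(Map<String, String> conf) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      write(out);
      out.flush();
      TaggedValueSketch copy = newInstance();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

// Hypothetical user subclass whose payload is a single string.
class TextTaggedValue extends TaggedValueSketch {
  String data = "";

  void write(DataOutput out) throws IOException {
    out.writeUTF(tag);
    out.writeUTF(data);
  }

  void readFields(DataInput in) throws IOException {
    tag = in.readUTF();
    data = in.readUTF();
  }

  TaggedValueSketch newInstance() {
    return new TextTaggedValue();
  }

  // Override: String is immutable, so copying the references gives an
  // independent copy without any serialization.
  @Override
  public TaggedValueSketch clone(Map<String, String> conf) {
    TextTaggedValue copy = new TextTaggedValue();
    copy.tag = tag;
    copy.data = data;
    return copy;
  }
}
```

Overriding is only safe when the result is genuinely independent of the original; here that holds because String is immutable, whereas mutable fields would need to be copied explicitly.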


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-3684:
-------------------------------

    Status: Patch Available  (was: Open)


Regenerated the patch.


[jira] Updated: (HADOOP-3684) The data_join should allow the user to implement a customer cloning function

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-3684:
-------------------------------

    Attachment: H-3684.txt
