Posted to common-dev@hadoop.apache.org by "Jingkei Ly (JIRA)" <ji...@apache.org> on 2009/03/27 13:16:50 UTC

[jira] Created: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

TupleWritable: Lift implicit limit on the number of values that can be stored
-----------------------------------------------------------------------------

                 Key: HADOOP-5589
                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.21.0
            Reporter: Jingkei Ly


TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.

An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.

At the very least, the 64-value limit should be documented in TupleWritable.
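
For illustration, here is a minimal sketch, not the actual TupleWritable code, of how a long bit mask tracks written positions and why position 64 misbehaves: Java masks a long's shift distance to its low six bits, so {{1L << 64}} silently equals {{1L << 0}} and position 64 aliases position 0.

{code}
// Hypothetical long-mask position tracking (names assumed, not the real class).
public class LongMaskDemo {
  private long written = 0L;

  void setWritten(int i) {
    written |= 1L << i;             // for i >= 64, Java shifts by i % 64
  }

  boolean has(int i) {
    return 0 != (written & (1L << i));
  }

  public static void main(String[] args) {
    LongMaskDemo d = new LongMaskDemo();
    d.setWritten(64);               // intends position 64...
    System.out.println(d.has(0));   // ...but prints true: 1L << 64 == 1L
  }
}
{code}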



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Status: In Progress  (was: Patch Available)

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702761#action_12702761 ] 

Hudson commented on HADOOP-5589:
--------------------------------

Integrated in Hadoop-trunk #817 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/817/])
HADOOP-5589. Eliminate source limit of 64 for map-side joins imposed by
TupleWritable encoding. Contributed by Jingkei Ly.


> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689884#action_12689884 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

The example below demonstrates the current 64-value limit in TupleWritable:
{code}
Text emptyText = new Text("Should not be written");

// Passes: index 63 is the last position a 64-bit mask can represent.
public void testTupleBoundarySuccess() throws Exception {
    Writable[] values = new Writable[64];
    Arrays.fill(values, emptyText);
    values[63] = new Text("Should be the only value written");

    TupleWritable tuple = new TupleWritable(values);
    tuple.setWritten(63);

    for (int pos = 0; pos < tuple.size(); pos++) {
      boolean has = tuple.has(pos);
      if (pos == 63) {
        assertTrue(has);
      } else {
        assertFalse("Tuple position is incorrectly labelled as set: " + pos, has);
      }
    }
}

// Fails: setWritten(64) wraps around in the 64-bit mask and sets position 0
// as well, tripping the assertFalse for pos 0 below.
public void testTupleBoundaryFailure() throws Exception {
    Writable[] values = new Writable[65];
    Arrays.fill(values, emptyText);
    values[64] = new Text("Should be the only value written");

    TupleWritable tuple = new TupleWritable(values);
    tuple.setWritten(64);

    for (int pos = 0; pos < tuple.size(); pos++) {
      boolean has = tuple.has(pos);
      if (pos == 64) {
        assertTrue(has);
      } else {
        assertFalse("Tuple position is incorrectly labelled as set: " + pos, has);
      }
    }
}
{code}

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699648#action_12699648 ] 

Hadoop QA commented on HADOOP-5589:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404067/HADOOP-5589-3.patch
  against trunk revision 765427.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/200/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/200/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/200/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/200/console

This message is automatically generated.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Status: Patch Available  (was: In Progress)

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Attachment: HADOOP-5589-4.patch

Applied changes to the patch as per Chris's comments (I particularly like the simplification of my overcomplicated for-loop in #readBitSet).

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Affects Version/s:     (was: 0.21.0)
                       0.19.1
               Status: Patch Available  (was: In Progress)

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698722#action_12698722 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

I think I'm probably going to be hitting this problem soon. Has anybody got any comments on the proposed solution and the patch I submitted?

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701458#action_12701458 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

The test that failed appears to be unrelated to this patch, and passes when run locally on my box.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689885#action_12689885 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

My suggestion would be to replace the long field, "written", with a java.util.BitSet. This should provide more than enough space to do the job of the written field in a more scalable fashion, and it is relatively compact. I'm not sure what the performance implications are, though, if any.
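
A minimal sketch of the idea (hypothetical; the method names mirror TupleWritable's, but this is not the patch itself):

{code}
import java.util.BitSet;

// Hypothetical BitSet-based tracking of written positions; no 64-value cap.
public class BitSetTrackingDemo {
  private final BitSet written = new BitSet();

  void setWritten(int i)   { written.set(i); }
  void clearWritten(int i) { written.clear(i); }
  boolean has(int i)       { return written.get(i); }

  public static void main(String[] args) {
    BitSetTrackingDemo d = new BitSetTrackingDemo();
    d.setWritten(200);                // well past the old 64-value limit
    System.out.println(d.has(200));   // true
    System.out.println(d.has(0));     // false; no aliasing this time
  }
}
{code}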

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Attachment: HADOOP-5589-4.patch

I've attached a new patch that supports reading TupleWritables written by previous versions of the class. As discussed, it will fail to read older TupleWritables if they held more than 64 values.

The new patch contains additional backward-compatibility tests and attempts to fix the Findbugs warning.
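
To make the lifted limit concrete, here is a minimal round-trip sketch in the style of the earlier tests (hypothetical, not taken from the patch; it assumes the patched class and, since setWritten is package-private, a test living in the same package):

{code}
public void testRoundTripOver64Values() throws Exception {
    Writable[] values = new Writable[100];
    Arrays.fill(values, new Text("v"));
    TupleWritable original = new TupleWritable(values);
    for (int i = 0; i < values.length; i++) {
      original.setWritten(i);
    }

    // Serialize and deserialize through an in-memory buffer.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytes));
    TupleWritable copy = new TupleWritable();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

    // Position 99 was not representable under the old 64-bit mask.
    assertTrue(copy.has(99));
    assertEquals(100, copy.size());
}
{code}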

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Status: In Progress  (was: Patch Available)

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717752#action_12717752 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

It looks like you would probably need to backport HADOOP-3721 as well, since it incorporated a change to CompositeRecordReader.JoinCollector that removed the limitation imposed by using longs.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699395#action_12699395 ] 

Jingkei Ly commented on HADOOP-5589:
------------------------------------

The patch could be backward-compatible if the bit set were written to the stream as VLongs (essentially what was implemented in HADOOP-5589-2.patch, minus the bug with sparse bit sets), as the bytes written to the stream would be exactly the same in both implementations as long as there were fewer than 64 values.

However, because we can't read an old TupleWritable containing over 64 values without throwing an EOFException, it won't be "fully" backward-compatible.

While I would be tempted to argue that TupleWritable never supported more than 64 values in a tuple anyway, is there still a need to support users who were storing tuples of more than 64 values, albeit with incorrect results?



> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Attachment: HADOOP-5589-3.patch

This is another change to the patch - this time making the serialized bit set more compact by writing it to the stream as bytes instead of variable-length longs. It also addresses a problem with writing sparse bit sets.
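
A sketch of the serialization scheme this describes, folded together with the backward-compatibility idea discussed elsewhere in the thread (hypothetical code; the exact wire layout, the use of a VLong for the first word, and the bit order within each byte are assumptions, not the committed patch):

{code}
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;
import org.apache.hadoop.io.WritableUtils;

class BitSetSerializationSketch {
  // One VLong for bits 0-63 (byte-identical to the old single-long format
  // for small tuples), then one byte per further 8 bits; the reader is
  // assumed to learn nbits from the tuple's size field.
  static void writeBitSet(DataOutput out, int nbits, BitSet bits)
      throws IOException {
    long firstWord = 0L;
    for (int i = bits.nextSetBit(0); i >= 0 && i < Long.SIZE;
         i = bits.nextSetBit(i + 1)) {
      firstWord |= 1L << i;
    }
    WritableUtils.writeVLong(out, firstWord);
    for (int offset = Long.SIZE; offset < nbits; offset += Byte.SIZE) {
      byte b = 0;
      for (int i = 0; i < Byte.SIZE && offset + i < nbits; i++) {
        if (bits.get(offset + i)) {
          b |= 1 << i;
        }
      }
      out.writeByte(b);
    }
  }
}
{code}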

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701442#action_12701442 ] 

Hadoop QA commented on HADOOP-5589:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12405766/HADOOP-5589-4.patch
  against trunk revision 767331.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/223/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/223/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/223/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/223/console

This message is automatically generated.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701826#action_12701826 ] 

Chris Douglas commented on HADOOP-5589:
---------------------------------------

Just a few minor nits:
* Instead of using {{BitSet::set(i, false)}}, {{BitSet::clear(i)}} should accomplish the same thing in {{TupleWritable::clearWritten}}; in the iterator, using {{clear}} is probably more readable than {{flip}}, though they're obviously equivalent.
* The size check in {{equals}} can be dropped, since {{BitSet}} already performs it in its own {{equals}} implementation.
* The loop in readBitSet would be easier to read as (see the sketch after this list):
{code}
    for (int offset = Long.SIZE; offset < nbits; offset += Byte.SIZE) {
{code}
* I'd replace {{setBit}} with the equivalent single line of code in {{writeBitSet}}.
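
For reference, a hypothetical sketch of {{readBitSet}} with that loop shape, mirroring the write-side sketch earlier in the thread (names and wire layout are assumptions, not the committed code):

{code}
import java.io.DataInput;
import java.io.IOException;
import java.util.BitSet;
import org.apache.hadoop.io.WritableUtils;

class BitSetDeserializationSketch {
  // Reads the layout assumed above: one VLong for bits 0-63, then one
  // byte per further 8 bits, with nbits supplied by the caller.
  static BitSet readBitSet(DataInput in, int nbits) throws IOException {
    BitSet bits = new BitSet(nbits);
    long firstWord = WritableUtils.readVLong(in);
    for (int i = 0; i < Long.SIZE && i < nbits; i++) {
      if (0 != (firstWord & (1L << i))) {
        bits.set(i);
      }
    }
    for (int offset = Long.SIZE; offset < nbits; offset += Byte.SIZE) {
      byte b = in.readByte();
      for (int i = 0; i < Byte.SIZE && offset + i < nbits; i++) {
        if (0 != (b & (1 << i))) {
          bits.set(offset + i);
        }
      }
    }
    return bits;
  }
}
{code}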

The rest of the changes look good.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Attachment: HADOOP-5589-1.patch

I've attached my shot at implementing this with BitSets.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Jingkei Ly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingkei Ly updated HADOOP-5589:
-------------------------------

    Attachment: HADOOP-5589-2.patch

I've noticed a bug in my original patch where writeBitSet() incorrectly detected when to write out a new long to the stream - this new patch should fix it.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699378#action_12699378 ] 

Chris Douglas commented on HADOOP-5589:
---------------------------------------

Unfortunately, this would be an incompatible change to TupleWritable; one could no longer read tuples written using an older version of Hadoop, right?

I see two possible approaches:
# Define backwards-compatible semantics for TupleWritable, so readFields will read both old and new
# Subclass or create a new TupleWritable2 (or whatever) and modify the join framework to use that instead

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699452#action_12699452 ] 

Chris Douglas commented on HADOOP-5589:
---------------------------------------

bq. The patch could be backward-compatible if the bit set were written to the stream as VLongs (essentially what was implemented in HADOOP-5589-2.patch

This seems like a reasonable approach to me. As long as this can read old data, it's OK if it writes records that cannot be read by the old implementation.

bq. is there still a need to support users who were storing tuples of more than 64 values, albeit with incorrect results?

Such data is already corrupt, so I don't think anything can be done for it.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702134#action_12702134 ] 

Chris Douglas commented on HADOOP-5589:
---------------------------------------

+1

{noformat}
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{noformat}

The relevant tests, TestDatamerge and TestTupleWritable, both pass.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5589:
----------------------------------

    Assignee: Jingkei Ly
      Status: Patch Available  (was: Open)

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "jv ning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717744#action_12717744 ] 

jv ning commented on HADOOP-5589:
---------------------------------

How does CompositeRecordReader change with this patch? I have been backporting to 18.2, and it looks like CompositeRecordReader also uses a long to hold this information.

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5589:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I committed this. Thanks, Jingkei

> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.1
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch, HADOOP-5589-4.patch, HADOOP-5589-4.patch
>
>
> TupleWritable uses an instance field of the primitive type long, presumably so that it can quickly determine whether a position in its array of Writables has been written to (via bit-shifting operations on the long field). The problem is that this imposes an implicit maximum of 64 values that can be stored in a TupleWritable.
> An example of a use case where I think this would be a problem: if you had two MR jobs with over 64 reduce tasks and you wanted to join their outputs with CompositeInputFormat, this would probably cause unexpected results under the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.