You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Julien Le Dem <ju...@ledem.net> on 2012/07/03 00:50:21 UTC

Re: Review Request: SchemaTuple in Pig

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8812
-----------------------------------------------------------


Great work!
some minor comments.
This is getting really good!


trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/4651/#comment18632>

    longer term we should probably have a DistributedCacheManager to centralize those things (not now)



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
<https://reviews.apache.org/r/4651/#comment18634>

    maybe this instead:
    SchemaTupleBackend.initialize(job, pigContext.getExecType());
    and check inside
    



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
<https://reviews.apache.org/r/4651/#comment18633>

    is this needed ?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment18657>

    whitespace



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18663>

    if you override get, you should really override containsKey as well, or this could hide some hard to debug side effects.
    



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18665>

    any reason you decided to extend HashMap as opposed to just convert when inserting?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18658>

    factor this out in a method.
    
    Is there a case when this is already a SchemaTuple?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18667>

    why not just convert the tuple here, instead of extending ArrayList?
    It would seem a little more obvious.
    If you want a strategy pattern, it does not have to be in List.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
<https://reviews.apache.org/r/4651/#comment18668>

    are there cases where the tuple is already a SchemaTuple?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
<https://reviews.apache.org/r/4651/#comment18669>

    what is this for ?



trunk/src/org/apache/pig/builtin/mock/Storage.java
<https://reviews.apache.org/r/4651/#comment18670>

    thanks



trunk/src/org/apache/pig/data/AppendableSchemaTuple.java
<https://reviews.apache.org/r/4651/#comment18671>

    remove?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment18675>

    what are those for ?
    It's unlikely we want UDFs to be dependent on SchemaTuples (or their absence)



trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
<https://reviews.apache.org/r/4651/#comment18678>

    is there some code somewhere that does this already ?



trunk/test/org/apache/pig/data/TestSchemaTuple.java
<https://reviews.apache.org/r/4651/#comment18679>

    :)


- Julien Le Dem


On June 29, 2012, 9:55 p.m., Jonathan Coveney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4651/
> -----------------------------------------------------------
> 
> (Updated June 29, 2012, 9:55 p.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> -------
> 
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (heavily heavily depends on the data, though). Need to do get/set tests, but assuming that it's on par (or even faster) than Tuple, the memory gain is huge.
> 
> Need to clean up the code and add tests.
> 
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDF's. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
> 
> Needs tests and comments, but I want the code to settle a bit.
> 
> 
> This addresses bug PIG-2632.
>     https://issues.apache.org/jira/browse/PIG-2632
> 
> 
> Diffs
> -----
> 
>   trunk/.gitignore 1355561 
>   trunk/conf/pig.properties 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1355561 
>   trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 
>   trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 
>   trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/DataByteArray.java 1355561 
>   trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PIntTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PLongTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PStringTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 
>   trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/TupleMaker.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 
>   trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/impl/PigContext.java 1355561 
>   trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 
>   trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 
>   trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
>   trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
>   trunk/test/org/apache/pig/test/TestDataBag.java 1355561 
>   trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestSchema.java 1355561 
> 
> Diff: https://reviews.apache.org/r/4651/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jonathan Coveney
> 
>


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.

> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, lines 336-368
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line336>
> >
> >     any reason you decided to extend HashMap as opposed to just convert when inserting?

I originally converted when inserting. The nice thing about this pattern is that you centralize the conversion (at insertion), instead of having to have an if-then-else at any point where you actually use the HashMap.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, line 307
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line307>
> >
> >     if you override get, you should really override containsKey as well, or this could hide some hard to debug side effects.
> >

k


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, line 224
> > <https://reviews.apache.org/r/4651/diff/10/?file=117524#file117524line224>
> >
> >     whitespace

I generated the patch to not affect the existing whitespace. We don't really have any clear guidelines in the project around fixing whitespace formatting (afaik the current consensus is "don't fix whitespace")


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java, line 66
> > <https://reviews.apache.org/r/4651/diff/10/?file=117521#file117521line66>
> >
> >     is this needed ?

confusingly (and I did not notice until now), but this is actually a different Pair than the one I implemented. That's...funny


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java, line 179
> > <https://reviews.apache.org/r/4651/diff/10/?file=117520#file117520line179>
> >
> >     maybe this instead:
> >     SchemaTupleBackend.initialize(job, pigContext.getExecType());
> >     and check inside
> >

agreed


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, line 447
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line447>
> >
> >     why not just convert the tuple here, instead of extending ArrayList?
> >     It would seem a little more obvious.
> >     If you want a strategy pattern, it does not have to be in List.

See above. Give thoughts w.r.t. that and I'll go with it.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, lines 359-365
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line359>
> >
> >     factor this out in a method.
> >     
> >     Is there a case when this is already a SchemaTuple?

It depends. In some cases, it is possible...it's hard to know since SchemaTuple could be integrated deeper in the pipeline. I factored it out and added checks.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java, line 251
> > <https://reviews.apache.org/r/4651/diff/10/?file=117526#file117526line251>
> >
> >     are there cases where the tuple is already a SchemaTuple?

see above


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java, lines 302-306
> > <https://reviews.apache.org/r/4651/diff/10/?file=117526#file117526line302>
> >
> >     what is this for ?

It's old code that can go. whoops!


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java, lines 99-117
> > <https://reviews.apache.org/r/4651/diff/10/?file=117543#file117543line99>
> >
> >     what are those for ?
> >     It's unlikely we want UDFs to be dependent on SchemaTuples (or their absence)

There is a comment above explaining...is that comment unclear? Basically, this is the mechanism by which the code generation step can communicate the context in which a given SchemaTuple should be used. Based on the contexts in which a SchemaTuple was registered, we annotate the generated class. This way, on the backend when a given piece of code "requests" a SchemaTuple for a given Schema, we can inspect the SchemaTuple that matches the Schema to see if it was intended to be used in that context (imagine the case where we have SchemaTuple turned on for merge join, but turned off for UDF's, but there is a merge join and udf tuple that both have the same Schema. This is the mechanism that is used to give a SchematupleFactory to the former but not the latter)


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java, lines 1079-1094
> > <https://reviews.apache.org/r/4651/diff/10/?file=117558#file117558line1079>
> >
> >     is there some code somewhere that does this already ?

I didn't find any, but I didn't look too hard.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/test/org/apache/pig/data/TestSchemaTuple.java, line 95
> > <https://reviews.apache.org/r/4651/diff/10/?file=117562#file117562line95>
> >
> >     :)

TDD!


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8812
-----------------------------------------------------------


On June 29, 2012, 9:55 p.m., Jonathan Coveney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4651/
> -----------------------------------------------------------
> 
> (Updated June 29, 2012, 9:55 p.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> -------
> 
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (heavily heavily depends on the data, though). Need to do get/set tests, but assuming that it's on par (or even faster) than Tuple, the memory gain is huge.
> 
> Need to clean up the code and add tests.
> 
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDF's. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
> 
> Needs tests and comments, but I want the code to settle a bit.
> 
> 
> This addresses bug PIG-2632.
>     https://issues.apache.org/jira/browse/PIG-2632
> 
> 
> Diffs
> -----
> 
>   trunk/.gitignore 1355561 
>   trunk/conf/pig.properties 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1355561 
>   trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 
>   trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 
>   trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/DataByteArray.java 1355561 
>   trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PIntTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PLongTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PStringTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 
>   trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/TupleMaker.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 
>   trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/impl/PigContext.java 1355561 
>   trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 
>   trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 
>   trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
>   trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
>   trunk/test/org/apache/pig/test/TestDataBag.java 1355561 
>   trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestSchema.java 1355561 
> 
> Diff: https://reviews.apache.org/r/4651/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jonathan Coveney
> 
>


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.

> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > Great work!
> > some minor comments.
> > This is getting really good!

Thanks Julien!


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, line 447
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line447>
> >
> >     why not just convert the tuple here, instead of extending ArrayList?
> >     It would seem a little more obvious.
> >     If you want a strategy pattern, it does not have to be in List.
> 
> Jonathan Coveney wrote:
>     See above. Give thoughts w.r.t. that and I'll go with it.

In this vein, I could create a side interface (kind of like TupleMaker) that would encapsulate the proper datatype, and not have the potential pitfalls of "oh the containsKey was or wasn't implement" or something like that?


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8812
-----------------------------------------------------------


On June 29, 2012, 9:55 p.m., Jonathan Coveney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4651/
> -----------------------------------------------------------
> 
> (Updated June 29, 2012, 9:55 p.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> -------
> 
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (heavily heavily depends on the data, though). Need to do get/set tests, but assuming that it's on par (or even faster) than Tuple, the memory gain is huge.
> 
> Need to clean up the code and add tests.
> 
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDF's. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
> 
> Needs tests and comments, but I want the code to settle a bit.
> 
> 
> This addresses bug PIG-2632.
>     https://issues.apache.org/jira/browse/PIG-2632
> 
> 
> Diffs
> -----
> 
>   trunk/.gitignore 1355561 
>   trunk/conf/pig.properties 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1355561 
>   trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 
>   trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 
>   trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/DataByteArray.java 1355561 
>   trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PIntTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PLongTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PStringTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 
>   trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/TupleMaker.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 
>   trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/impl/PigContext.java 1355561 
>   trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 
>   trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 
>   trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
>   trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
>   trunk/test/org/apache/pig/test/TestDataBag.java 1355561 
>   trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestSchema.java 1355561 
> 
> Diff: https://reviews.apache.org/r/4651/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jonathan Coveney
> 
>