Posted to dev@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2012/06/18 19:22:42 UTC

Re: Review Request: SchemaTuple in Pig

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 18, 2012, 5:22 p.m.)


Review request for pig and Julien Le Dem.


Changes
-------

This is not as radical a change as it might appear, since we had been discussing it a bit on GitHub while the review site was down. Still, this is a heavily refactored version of the code, and it now includes some nascent testing!


Description
-------

This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code-generate Tuples which can be used for fun and profit. In rudimentary tests, memory efficiency is 2-4x better, and the serialized form is ~15% smaller (though this depends heavily on the data). I still need to write get/set tests, but assuming performance is on par with (or even faster than) Tuple, the memory gain is huge.
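To make the idea concrete, here is a hand-written sketch of the kind of class a generator could emit for a schema like (a:int, b:long, c:chararray). This is not the code the patch generates; the class name IntLongStringTuple and its accessor names are hypothetical. It shows an assumed layout of primitive fields plus a null bitmask instead of a list of boxed objects, which is one plausible source of the memory savings reported above.

```java
/**
 * Hypothetical sketch of a schema-specialized tuple for (a:int, b:long, c:chararray).
 * Values live in primitive fields; a bitmask records which fields are null.
 */
final class IntLongStringTuple {
    private int f0;
    private long f1;
    private String f2;
    // Bit i set means field i is null; all three fields start out null.
    private byte nullMask = 0b111;

    int size() { return 3; }

    boolean isNull(int i) {
        checkIndex(i);
        return (nullMask & (1 << i)) != 0;
    }

    // Typed accessors: no boxing on the hot path.
    void setInt0(int v) { f0 = v; nullMask &= ~1; }
    void setLong1(long v) { f1 = v; nullMask &= ~2; }
    void setString2(String v) {
        f2 = v;
        if (v == null) nullMask |= 4; else nullMask &= ~4;
    }
    int getInt0() { return f0; }
    long getLong1() { return f1; }

    // Generic Tuple-style accessors, which box and unbox.
    Object get(int i) {
        if (isNull(i)) return null;
        switch (i) {
            case 0: return f0;
            case 1: return f1;
            default: return f2;
        }
    }

    // Throws ClassCastException if v does not match the field's declared type.
    void set(int i, Object v) {
        checkIndex(i);
        if (v == null) { nullMask |= 1 << i; return; }
        switch (i) {
            case 0: setInt0((Integer) v); break;
            case 1: setLong1((Long) v); break;
            default: setString2((String) v); break;
        }
    }

    private static void checkIndex(int i) {
        if (i < 0 || i >= 3) throw new IndexOutOfBoundsException("field " + i);
    }
}
```

Typed accessors such as getInt0() avoid boxing entirely, while the generic get/set pair keeps ordinary Tuple-style access working at the cost of boxing.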

Need to clean up the code and add tests.

Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.

Needs tests and comments, but I want the code to settle a bit.


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351417 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1351417 
  trunk/src/org/apache/pig/data/DataByteArray.java 1351417 
  trunk/src/org/apache/pig/data/TupleFactory.java 1351417 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351417 
  trunk/src/org/apache/pig/impl/PigContext.java 1351417 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351417 
  trunk/src/org/apache/pig/impl/io/PigNullableWritable.java 1351417 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351417 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351417 
  trunk/test/org/apache/pig/test/TestDataBag.java 1351417 
  trunk/test/org/apache/pig/test/TestSchema.java 1351417 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.

> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, lines 336-368
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line336>
> >
> >     any reason you decided to extend HashMap as opposed to just convert when inserting?

I originally converted when inserting. The nice thing about this pattern is that the conversion is centralized (at insertion), instead of needing an if-then-else everywhere the HashMap is actually used.
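A minimal sketch of the convert-at-insertion pattern, assuming a generic converter function. ConvertingHashMap and its constructor are hypothetical names; this is not the POFRJoin code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: every value passes through a converter when it is stored,
 * so code that reads from the map never needs its own conversion logic.
 */
class ConvertingHashMap<K, V> extends HashMap<K, V> {
    private final UnaryOperator<V> converter;

    ConvertingHashMap(UnaryOperator<V> converter) {
        this.converter = converter;
    }

    @Override
    public V put(K key, V value) {
        return super.put(key, converter.apply(value));
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
        // HashMap.putAll is not specified to go through put(), so route each entry explicitly.
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }
    // NOTE: putIfAbsent, merge, compute*, replace* and Entry.setValue are not
    // overridden here, so they would bypass the converter.
}
```

The trade-off is the one raised in review: any HashMap mutator that does not route through the overridden methods silently skips the conversion.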


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, line 307
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line307>
> >
> >     if you override get, you should really override containsKey as well, or this could hide some hard-to-debug side effects.
> >

k
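To illustrate the get/containsKey concern with a hypothetical example (key normalization, not the actual POFRJoin logic): if get() transforms its argument before the lookup, containsKey() and remove() have to apply the same transformation, otherwise they quietly disagree with get(). NormalizingKeyMap is an invented name.

```java
import java.util.HashMap;
import java.util.Locale;

/**
 * Hypothetical sketch: keys are lower-cased before they are stored or looked up.
 * Because get() is overridden, containsKey() and remove() are overridden the same way.
 */
class NormalizingKeyMap<V> extends HashMap<String, V> {
    private static String normalize(Object key) {
        return key == null ? null : key.toString().toLowerCase(Locale.ROOT);
    }

    @Override
    public V put(String key, V value) {
        return super.put(normalize(key), value);
    }

    @Override
    public V get(Object key) {
        return super.get(normalize(key));
    }

    @Override
    public boolean containsKey(Object key) {
        // Without this override, containsKey("ABC") would be false even though get("ABC") finds a value.
        return super.containsKey(normalize(key));
    }

    @Override
    public V remove(Object key) {
        return super.remove(normalize(key));
    }
}
```

Other read paths, such as getOrDefault, are still not covered by this sketch, which shows how easy it is to miss one.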


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, line 224
> > <https://reviews.apache.org/r/4651/diff/10/?file=117524#file117524line224>
> >
> >     whitespace

I generated the patch so as not to affect the existing whitespace. We don't really have any clear guidelines in the project about fixing whitespace formatting (AFAIK the current consensus is "don't fix whitespace").


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java, line 66
> > <https://reviews.apache.org/r/4651/diff/10/?file=117521#file117521line66>
> >
> >     is this needed?

Confusingly (and I did not notice until now), this is actually a different Pair than the one I implemented. That's... funny.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java, line 179
> > <https://reviews.apache.org/r/4651/diff/10/?file=117520#file117520line179>
> >
> >     maybe this instead:
> >     SchemaTupleBackend.initialize(job, pigContext.getExecType());
> >     and check inside
> >

agreed
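A sketch of the suggested shape, where the caller always calls initialize and every "should we do anything?" check lives inside it. Everything here is hypothetical: SchemaTupleBackendSketch, the stand-in ExecType enum, a Map standing in for the Hadoop job configuration, and the configuration key name.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: callers invoke initialize() unconditionally; the checks live inside. */
final class SchemaTupleBackendSketch {
    enum ExecType { LOCAL, MAPREDUCE }

    // Assumed configuration key, for illustration only.
    static final String ENABLED_KEY = "example.schematuple.enabled";

    private static final AtomicBoolean initialized = new AtomicBoolean(false);

    private SchemaTupleBackendSketch() {}

    /** Returns true only if this call actually performed the initialization. */
    static boolean initialize(Map<String, String> conf, ExecType execType) {
        if (!Boolean.parseBoolean(conf.get(ENABLED_KEY))) {
            return false; // feature disabled (or key absent): nothing to do
        }
        if (!initialized.compareAndSet(false, true)) {
            return false; // already initialized in this JVM
        }
        // A real backend would locate and load the generated classes here; whether
        // that differs between LOCAL and MAPREDUCE (execType) is not covered by this sketch.
        return true;
    }

    static boolean isInitialized() {
        return initialized.get();
    }
}
```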


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, line 447
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line447>
> >
> >     why not just convert the tuple here, instead of extending ArrayList?
> >     It would seem a little more obvious.
> >     If you want a strategy pattern, it does not have to be in List.

See above. Let me know your thoughts on that and I'll go with it.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, lines 359-365
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line359>
> >
> >     factor this out into a method.
> >     
> >     Is there a case when this is already a SchemaTuple?

It depends. In some cases it is possible; it's hard to know, since SchemaTuple could be integrated deeper into the pipeline. I factored it out and added checks.
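A sketch of a factored-out conversion with the "already converted?" check, using minimal stand-in types. SimpleTuple, ListTuple, FixedTuple and toFixed are all hypothetical names, not Pig's Tuple/SchemaTuple API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; these minimal types stand in for a generic Tuple and a SchemaTuple. */
final class ConversionSketch {
    interface SimpleTuple {
        int size();
        Object get(int i);
    }

    /** Generic list-backed tuple. */
    static final class ListTuple implements SimpleTuple {
        private final List<Object> fields;
        ListTuple(Object... values) { fields = new ArrayList<>(Arrays.asList(values)); }
        public int size() { return fields.size(); }
        public Object get(int i) { return fields.get(i); }
    }

    /** Stand-in for a schema-specific tuple with a fixed number of fields. */
    static final class FixedTuple implements SimpleTuple {
        private final Object[] fields;
        FixedTuple(int arity) { fields = new Object[arity]; }
        public int size() { return fields.length; }
        public Object get(int i) { return fields[i]; }
        void set(int i, Object v) { fields[i] = v; }
    }

    /** Returns t unchanged if it is already a FixedTuple; otherwise copies it field by field. */
    static SimpleTuple toFixed(SimpleTuple t, int arity) {
        if (t.size() != arity) {
            throw new IllegalArgumentException("expected " + arity + " fields, got " + t.size());
        }
        if (t instanceof FixedTuple) {
            return t; // already converted upstream: skip the redundant copy
        }
        FixedTuple out = new FixedTuple(arity);
        for (int i = 0; i < arity; i++) {
            out.set(i, t.get(i));
        }
        return out;
    }
}
```

In real code, an instanceof test alone may not be enough: a tuple could be a SchemaTuple generated for a different schema, so the check would also need to confirm it matches the expected one. The arity comparison above is only a rough stand-in for that.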


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java, line 251
> > <https://reviews.apache.org/r/4651/diff/10/?file=117526#file117526line251>
> >
> >     are there cases where the tuple is already a SchemaTuple?

see above


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java, lines 302-306
> > <https://reviews.apache.org/r/4651/diff/10/?file=117526#file117526line302>
> >
> >     what is this for?

It's old code that can go. Whoops!


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java, lines 99-117
> > <https://reviews.apache.org/r/4651/diff/10/?file=117543#file117543line99>
> >
> >     what are those for?
> >     It's unlikely we want UDFs to be dependent on SchemaTuples (or their absence)

There is a comment above explaining this; is that comment unclear? Basically, this is the mechanism by which the code generation step communicates the context in which a given SchemaTuple should be used. Based on the contexts in which a SchemaTuple was registered, we annotate the generated class. Then, on the backend, when a piece of code "requests" a SchemaTuple for a given Schema, we can inspect the SchemaTuple that matches that Schema to see whether it was intended to be used in that context. For example, imagine SchemaTuple is turned on for merge join but turned off for UDFs, and a merge join and a UDF both use tuples with the same Schema. This mechanism is what gives a SchemaTupleFactory to the former but not the latter.
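The mechanism described above could look roughly like this, assuming the contexts are recorded as a runtime-retained annotation on each generated class. GenContext, GeneratedFor, GeneratedTuple0 and usableIn are hypothetical names; the patch's actual annotation and lookup code are not shown in this thread.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

/** Hypothetical sketch of context-tagged generated classes. */
final class ContextSketch {
    enum GenContext { UDF, MERGE_JOIN, FR_JOIN }

    /** Lists every context in which the schema behind a generated class was registered. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface GeneratedFor {
        GenContext[] value();
    }

    // What a generated class might carry when SchemaTuple is enabled for merge join
    // but disabled for UDFs: only MERGE_JOIN was registered for this schema.
    @GeneratedFor({GenContext.MERGE_JOIN})
    static final class GeneratedTuple0 {}

    /** Backend-side check: may this generated class be used in the requested context? */
    static boolean usableIn(Class<?> generated, GenContext requested) {
        GeneratedFor ann = generated.getAnnotation(GeneratedFor.class);
        return ann != null && Arrays.asList(ann.value()).contains(requested);
    }
}
```

With this shape, a merge join asking for its schema gets the generated class, while a UDF with the identical schema is told no and can fall back to the default tuple implementation.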


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java, lines 1079-1094
> > <https://reviews.apache.org/r/4651/diff/10/?file=117558#file117558line1079>
> >
> >     is there some code somewhere that does this already?

I didn't find any, but I didn't look too hard.


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/test/org/apache/pig/data/TestSchemaTuple.java, line 95
> > <https://reviews.apache.org/r/4651/diff/10/?file=117562#file117562line95>
> >
> >     :)

TDD!


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8812
-----------------------------------------------------------


On June 29, 2012, 9:55 p.m., Jonathan Coveney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4651/
> -----------------------------------------------------------
> 
> (Updated June 29, 2012, 9:55 p.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> -------
> 
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code-generate Tuples which can be used for fun and profit. In rudimentary tests, memory efficiency is 2-4x better, and the serialized form is ~15% smaller (though this depends heavily on the data). I still need to write get/set tests, but assuming performance is on par with (or even faster than) Tuple, the memory gain is huge.
> 
> Need to clean up the code and add tests.
> 
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
> 
> Needs tests and comments, but I want the code to settle a bit.
> 
> 
> This addresses bug PIG-2632.
>     https://issues.apache.org/jira/browse/PIG-2632
> 
> 
> Diffs
> -----
> 
>   trunk/.gitignore 1355561 
>   trunk/conf/pig.properties 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 1355561 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1355561 
>   trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 
>   trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 
>   trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/DataByteArray.java 1355561 
>   trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PIntTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PLongTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PStringTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 
>   trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 
>   trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TupleFactory.java 1355561 
>   trunk/src/org/apache/pig/data/TupleMaker.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 
>   trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
>   trunk/src/org/apache/pig/impl/PigContext.java 1355561 
>   trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 
>   trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java 1355561 
>   trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 
>   trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
>   trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
>   trunk/test/org/apache/pig/test/TestDataBag.java 1355561 
>   trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 
>   trunk/test/org/apache/pig/test/TestSchema.java 1355561 
> 
> Diff: https://reviews.apache.org/r/4651/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jonathan Coveney
> 
>


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.

> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > Great work!
> > Some minor comments.
> > This is getting really good!

Thanks Julien!


> On July 2, 2012, 10:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java, line 447
> > <https://reviews.apache.org/r/4651/diff/10/?file=117525#file117525line447>
> >
> >     why not just convert the tuple here, instead of extending ArrayList?
> >     It would seem a little more obvious.
> >     If you want a strategy pattern, it does not have to be in List.
> 
> Jonathan Coveney wrote:
>     See above. Give thoughts w.r.t. that and I'll go with it.

Along these lines, I could create a side interface (kind of like TupleMaker) that encapsulates the proper datatype, without the potential pitfalls of "oh, containsKey was or wasn't implemented" and the like. Thoughts?
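A minimal sketch of that alternative, assuming a small converter interface is injected into the code that builds the replicated table. ReplicatedTableSketch and TupleConverter are hypothetical names. Conversion still happens in exactly one place (insertion), but reads go through an unmodified HashMap, so get and containsKey cannot drift apart.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: composition plus a converter strategy instead of subclassing HashMap. */
final class ReplicatedTableSketch<K, T> {
    /** Side interface, in the spirit of TupleMaker: decides how stored tuples are represented. */
    interface TupleConverter<U> {
        U convert(U tuple);
    }

    private final Map<K, List<T>> table = new HashMap<>();
    private final TupleConverter<T> converter;

    ReplicatedTableSketch(TupleConverter<T> converter) {
        this.converter = converter;
    }

    /** The only insertion path, so conversion is centralized here. */
    void add(K key, T tuple) {
        table.computeIfAbsent(key, k -> new ArrayList<>()).add(converter.convert(tuple));
    }

    /** Plain HashMap semantics on read; returns an empty list for unknown keys. */
    List<T> lookup(K key) {
        return table.getOrDefault(key, Collections.emptyList());
    }

    boolean containsKey(K key) {
        return table.containsKey(key);
    }
}
```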


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8812
-----------------------------------------------------------




Re: Review Request: SchemaTuple in Pig

Posted by Julien Le Dem <ju...@ledem.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8812
-----------------------------------------------------------


Great work!
Some minor comments.
This is getting really good!


trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/4651/#comment18632>

    Longer term, we should probably have a DistributedCacheManager to centralize those things (not now).



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
<https://reviews.apache.org/r/4651/#comment18634>

    maybe this instead:
    SchemaTupleBackend.initialize(job, pigContext.getExecType());
    and check inside
    



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
<https://reviews.apache.org/r/4651/#comment18633>

    is this needed?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment18657>

    whitespace



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18663>

    if you override get, you should really override containsKey as well, or this could hide some hard-to-debug side effects.
    



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18665>

    any reason you decided to extend HashMap as opposed to just convert when inserting?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18658>

    factor this out into a method.
    
    Is there a case when this is already a SchemaTuple?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
<https://reviews.apache.org/r/4651/#comment18667>

    why not just convert the tuple here, instead of extending ArrayList?
    It would seem a little more obvious.
    If you want a strategy pattern, it does not have to be in List.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
<https://reviews.apache.org/r/4651/#comment18668>

    are there cases where the tuple is already a SchemaTuple?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
<https://reviews.apache.org/r/4651/#comment18669>

    what is this for?



trunk/src/org/apache/pig/builtin/mock/Storage.java
<https://reviews.apache.org/r/4651/#comment18670>

    thanks



trunk/src/org/apache/pig/data/AppendableSchemaTuple.java
<https://reviews.apache.org/r/4651/#comment18671>

    remove?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment18675>

    what are those for?
    It's unlikely we want UDFs to be dependent on SchemaTuples (or their absence)



trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
<https://reviews.apache.org/r/4651/#comment18678>

    is there some code somewhere that does this already?



trunk/test/org/apache/pig/data/TestSchemaTuple.java
<https://reviews.apache.org/r/4651/#comment18679>

    :)


- Julien Le Dem




Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 29, 2012, 9:55 p.m.)


Review request for pig and Julien Le Dem.


Description
-------

This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code-generate Tuples which can be used for fun and profit. In rudimentary tests, memory efficiency is 2-4x better, and the serialized form is ~15% smaller (though this depends heavily on the data). I still need to write get/set tests, but assuming performance is on par with (or even faster than) Tuple, the memory gain is huge.

Need to clean up the code and add tests.

Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.

Needs tests and comments, but I want the code to settle a bit.


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/.gitignore 1355561 
  trunk/conf/pig.properties 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1355561 
  trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 
  trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 
  trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 
  trunk/src/org/apache/pig/data/DataByteArray.java 1355561 
  trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
  trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 
  trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 
  trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 
  trunk/src/org/apache/pig/data/PIntTuple.java 1355561 
  trunk/src/org/apache/pig/data/PLongTuple.java 1355561 
  trunk/src/org/apache/pig/data/PStringTuple.java 1355561 
  trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 
  trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 
  trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/TupleFactory.java 1355561 
  trunk/src/org/apache/pig/data/TupleMaker.java PRE-CREATION 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 
  trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/impl/PigContext.java 1355561 
  trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 
  trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
  trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
  trunk/test/org/apache/pig/test/TestDataBag.java 1355561 
  trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 
  trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 
  trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 
  trunk/test/org/apache/pig/test/TestSchema.java 1355561 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 29, 2012, 9:21 p.m.)


Review request for pig and Julien Le Dem.


Changes
-------

Hopefully this is close to what will make it in. Need some eyes on the new pieces, however: it integrates with FR joins and merge joins. We ran it on a job here and saw significant memory benefits!


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/.gitignore 1355561 
  trunk/conf/pig.properties 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 1355561 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1355561 
  trunk/src/org/apache/pig/builtin/mock/Storage.java 1355561 
  trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1355561 
  trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1355561 
  trunk/src/org/apache/pig/data/DataByteArray.java 1355561 
  trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
  trunk/src/org/apache/pig/data/PBooleanTuple.java 1355561 
  trunk/src/org/apache/pig/data/PDoubleTuple.java 1355561 
  trunk/src/org/apache/pig/data/PFloatTuple.java 1355561 
  trunk/src/org/apache/pig/data/PIntTuple.java 1355561 
  trunk/src/org/apache/pig/data/PLongTuple.java 1355561 
  trunk/src/org/apache/pig/data/PStringTuple.java 1355561 
  trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1355561 
  trunk/src/org/apache/pig/data/PrimitiveTuple.java 1355561 
  trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/TupleFactory.java 1355561 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1355561 
  trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/impl/PigContext.java 1355561 
  trunk/src/org/apache/pig/impl/io/InterRecordReader.java 1355561 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalOperator.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/rules/GroupByConstParallelSetter.java 1355561 
  trunk/src/org/apache/pig/newplan/logical/rules/MergeForEach.java 1355561 
  trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
  trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
  trunk/test/org/apache/pig/test/TestDataBag.java 1355561 
  trunk/test/org/apache/pig/test/TestLogicalPlanBuilder.java 1355561 
  trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1355561 
  trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1355561 
  trunk/test/org/apache/pig/test/TestSchema.java 1355561 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 20, 2012, 4:35 a.m.)


Review request for pig and Julien Le Dem.


Changes
-------

Argh, same issue as before: I forgot to add the new files in SVN. This should work.


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/src/docs/src/documentation/content/xdocs/perf.xml 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1351931 
  trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1351931 
  trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1351931 
  trunk/src/org/apache/pig/data/DataByteArray.java 1351931 
  trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
  trunk/src/org/apache/pig/data/PBooleanTuple.java 1351931 
  trunk/src/org/apache/pig/data/PDoubleTuple.java 1351931 
  trunk/src/org/apache/pig/data/PFloatTuple.java 1351931 
  trunk/src/org/apache/pig/data/PIntTuple.java 1351931 
  trunk/src/org/apache/pig/data/PLongTuple.java 1351931 
  trunk/src/org/apache/pig/data/PStringTuple.java 1351931 
  trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1351931 
  trunk/src/org/apache/pig/data/PrimitiveTuple.java 1351931 
  trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/TupleFactory.java 1351931 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351931 
  trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/impl/PigContext.java 1351931 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351931 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351931 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351931 
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1351931 
  trunk/src/org/apache/pig/tools/pigstats/ScriptState.java 1351931 
  trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
  trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
  trunk/test/org/apache/pig/test/TestDataBag.java 1351931 
  trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1351931 
  trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1351931 
  trunk/test/org/apache/pig/test/TestSchema.java 1351931 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 20, 2012, 4:28 a.m.)


Review request for pig and Julien Le Dem.


Changes
-------

This patch now incorporates work from: https://issues.apache.org/jira/browse/PIG-2673
The goal is to leverage SchemaTuples to make merge joins more memory-efficient, since the current implementation keeps a list of tuples.

I also tried to add more tests. If possible, I'd like to get a to-do list of what needs to happen for this to be committed.


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/src/docs/src/documentation/content/xdocs/perf.xml 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351931 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1351931 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1351931 
  trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1351931 
  trunk/src/org/apache/pig/data/DataByteArray.java 1351931 
  trunk/src/org/apache/pig/data/TupleFactory.java 1351931 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351931 
  trunk/src/org/apache/pig/impl/PigContext.java 1351931 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351931 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351931 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351931 
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1351931 
  trunk/src/org/apache/pig/tools/pigstats/ScriptState.java 1351931 
  trunk/test/org/apache/pig/test/TestDataBag.java 1351931 
  trunk/test/org/apache/pig/test/TestSchema.java 1351931 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 18, 2012, 6:49 p.m.)


Review request for pig and Julien Le Dem.


Changes
-------

This is the cutting-edge diff of record. It has all the files and takes some of the review comments into account.


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351455 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351455 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351455 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351455 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351455 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351455 
  trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1351455 
  trunk/src/org/apache/pig/data/BinSedesTupleFactory.java 1351455 
  trunk/src/org/apache/pig/data/DataByteArray.java 1351455 
  trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
  trunk/src/org/apache/pig/data/PBooleanTuple.java 1351455 
  trunk/src/org/apache/pig/data/PDoubleTuple.java 1351455 
  trunk/src/org/apache/pig/data/PFloatTuple.java 1351455 
  trunk/src/org/apache/pig/data/PIntTuple.java 1351455 
  trunk/src/org/apache/pig/data/PLongTuple.java 1351455 
  trunk/src/org/apache/pig/data/PStringTuple.java 1351455 
  trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1351455 
  trunk/src/org/apache/pig/data/PrimitiveTuple.java 1351455 
  trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/TupleFactory.java 1351455 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351455 
  trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/impl/PigContext.java 1351455 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351455 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351455 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351455 
  trunk/test/org/apache/pig/test/TestDataBag.java 1351455 
  trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1351455 
  trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1351455 
  trunk/test/org/apache/pig/test/TestSchema.java 1351455 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/
-----------------------------------------------------------

(Updated June 18, 2012, 6:05 p.m.)


Review request for pig and Julien Le Dem.


Changes
-------

I had to use SVN to make the patch even though I am developing in git, so I forgot to add all the new files...


This addresses bug PIG-2632.
    https://issues.apache.org/jira/browse/PIG-2632


Diffs (updated)
-----

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351417 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351417 
  trunk/src/org/apache/pig/data/AppendableSchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/BinInterSedes.java 1351417 
  trunk/src/org/apache/pig/data/DataByteArray.java 1351417 
  trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
  trunk/src/org/apache/pig/data/PBooleanTuple.java 1351417 
  trunk/src/org/apache/pig/data/PDoubleTuple.java 1351417 
  trunk/src/org/apache/pig/data/PFloatTuple.java 1351417 
  trunk/src/org/apache/pig/data/PIntTuple.java 1351417 
  trunk/src/org/apache/pig/data/PLongTuple.java 1351417 
  trunk/src/org/apache/pig/data/PStringTuple.java 1351417 
  trunk/src/org/apache/pig/data/PrimitiveFieldTuple.java 1351417 
  trunk/src/org/apache/pig/data/PrimitiveTuple.java 1351417 
  trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleBackend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
  trunk/src/org/apache/pig/data/SchemaTupleFrontend.java PRE-CREATION 
  trunk/src/org/apache/pig/data/TupleFactory.java 1351417 
  trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351417 
  trunk/src/org/apache/pig/data/utils/BytesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/MethodHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/data/utils/StructuresHelper.java PRE-CREATION 
  trunk/src/org/apache/pig/impl/PigContext.java 1351417 
  trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351417 
  trunk/src/org/apache/pig/impl/io/PigNullableWritable.java 1351417 
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351417 
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351417 
  trunk/test/org/apache/pig/data/TestSchemaTuple.java PRE-CREATION 
  trunk/test/org/apache/pig/data/utils/TestMethodHelper.java PRE-CREATION 
  trunk/test/org/apache/pig/test/TestDataBag.java 1351417 
  trunk/test/org/apache/pig/test/TestPrimitiveFieldTuple.java 1351417 
  trunk/test/org/apache/pig/test/TestPrimitiveTuple.java 1351417 
  trunk/test/org/apache/pig/test/TestSchema.java 1351417 

Diff: https://reviews.apache.org/r/4651/diff/


Testing
-------


Thanks,

Jonathan Coveney


Re: Review Request: SchemaTuple in Pig

Posted by Jonathan Coveney <jc...@gmail.com>.

> On June 18, 2012, 6:02 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, lines 147-150
> > <https://reviews.apache.org/r/4651/diff/4/?file=111761#file111761line147>
> >
> >     remove?

I'll add a more instructive comment, but the point is that eventually we may want to use SchemaTuple for the output as well.


> On June 18, 2012, 6:02 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, line 155
> > <https://reviews.apache.org/r/4651/diff/4/?file=111761#file111761line155>
> >
> >     you could also have a method isFixedSize() in TupleFactory.

Not a bad idea, although I don't think it actually solves the problem: you can have a TupleFactory whose tuples aren't fixed-size (i.e. AppendableSchemaTuple) but whose appends don't start at index 0.
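A small sketch of the problem, under assumed names (AppendableTupleSketch is hypothetical and much simpler than AppendableSchemaTuple): the first fixedSize slots come from the schema and anything appended goes after them, so a boolean like isFixedSize() would answer "no" without telling the caller where appends begin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's AppendableSchemaTuple: the first `fixedSize`
// slots are backed by a known schema, and extra values are appended after them.
class AppendableTupleSketch {
    private final Object[] fixed;                            // schema-backed slots
    private final List<Object> appended = new ArrayList<>(); // overflow slots

    AppendableTupleSketch(int fixedSize) {
        fixed = new Object[fixedSize];
    }

    // Overwrites an existing slot; appended slots must already exist.
    void set(int i, Object v) {
        if (i < fixed.length) fixed[i] = v;
        else appended.set(i - fixed.length, v);
    }

    // Appends land after the schema fields, i.e. at index fixed.length, not 0.
    // Returns the index the value was stored at.
    int append(Object v) {
        appended.add(v);
        return fixed.length + appended.size() - 1;
    }

    Object get(int i) {
        return i < fixed.length ? fixed[i] : appended.get(i - fixed.length);
    }

    int size() {
        return fixed.length + appended.size();
    }
}
```

With two schema fields, the first append() returns index 2; a caller that assumed a non-fixed-size tuple starts empty and fills from index 0 would put its values in the wrong slots.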


> On June 18, 2012, 6:02 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, lines 194-196
> > <https://reviews.apache.org/r/4651/diff/4/?file=111761#file111761line194>
> >
> >     you could also do this at the same time you would have initialized the Schema based factory (line 144)
> >     if the assignment happens in the constructor you can even make inputTupleFactory final

Alas, the assignment doesn't happen in the constructor; it happens in instantiateFunc. That function is only called from the constructor, but I am wary of refactoring it. I could, though.
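If the refactoring were done, one way to get a final inputTupleFactory would be to have the helper return the factory and let the constructor do the assignment, since Java only allows a final field to be assigned in a constructor or initializer. The names below (UserFuncSketch, chooseFactory, TupleFactorySketch) are hypothetical stand-ins, not POUserFunc's real members.

```java
// Hypothetical stand-in for a tuple factory; not Pig's TupleFactory.
interface TupleFactorySketch {
    String describe();
}

// Hypothetical sketch of the suggested refactoring; not POUserFunc's real code.
class UserFuncSketch {
    // A final field must be definitely assigned exactly once, in a constructor
    // or initializer. Assigning it inside an ordinary method does not compile.
    private final TupleFactorySketch inputTupleFactory;

    UserFuncSketch(boolean useSchemaTuple) {
        // The helper computes the value; the constructor performs the assignment.
        this.inputTupleFactory = chooseFactory(useSchemaTuple);
    }

    private static TupleFactorySketch chooseFactory(boolean useSchemaTuple) {
        if (useSchemaTuple) {
            return () -> "schema-based factory";
        }
        return () -> "default factory";
    }

    TupleFactorySketch getInputTupleFactory() {
        return inputTupleFactory;
    }
}
```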


> On June 18, 2012, 6:02 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/data/BinInterSedes.java, line 148
> > <https://reviews.apache.org/r/4651/diff/4/?file=111762#file111762line148>
> >
> >     where is this used?

It's used in readGenericTuple (in SedesHelper, in org.apache.pig.data.utils) as well as in addColsToTuple in BinInterSedes.


> On June 18, 2012, 6:02 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/data/DataByteArray.java, lines 235-243
> > <https://reviews.apache.org/r/4651/diff/4/?file=111763#file111763line235>
> >
> >     http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Arrays.html#hashCode%28byte[]%29
> >     I'm not sure I get why these should use different primes...

Agreed
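A sketch of the suggestion, assuming a hypothetical ByteArrayKey wrapper rather than DataByteArray itself: equals and hashCode both delegate to java.util.Arrays, so there is a single, standard hash (start at 1, then 31 * h + element) and no hand-rolled loop with its own prime.

```java
import java.util.Arrays;

// Hypothetical minimal byte-array wrapper; not Pig's DataByteArray.
final class ByteArrayKey {
    private final byte[] data;

    ByteArrayKey(byte[] data) {
        this.data = data;
    }

    // Content equality, consistent with hashCode below.
    @Override
    public boolean equals(Object o) {
        return o instanceof ByteArrayKey && Arrays.equals(data, ((ByteArrayKey) o).data);
    }

    // Arrays.hashCode(byte[]) starts from 1 and folds in each element as
    // 31 * result + element, the same value List<Byte>.hashCode() would give.
    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }
}
```

For example, new ByteArrayKey(new byte[]{1, 2}).hashCode() is 31 * (31 * 1 + 1) + 2 = 994.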


> On June 18, 2012, 6:02 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/data/TupleFactory.java, line 157
> > <https://reviews.apache.org/r/4651/diff/4/?file=111764#file111764line157>
> >
> >     I prefer just Schema as a parameter here.
> >     I'll look again when you add SchemaTupleFrontend and Backend

Yeah, take a look once it is added. There's no real way around it if we want people to control when generated code is and is not used (i.e. yes to join optimizations, no to UDFs, and so on).
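A rough sketch of the kind of control meant here, with everything hypothetical: the GenContext values, FactoryChooser and TupleMakerSketch are illustrative names, not Pig's API. Each part of the plan asks for a factory by naming its context, and only contexts that have opted in get generated tuples; everything else falls back to the default factory.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical: the parts of a plan that might request generated tuples.
enum GenContext { UDF, MERGE_JOIN, FR_JOIN }

// Hypothetical stand-in for whatever produces tuples.
interface TupleMakerSketch {
    String kind();
}

final class FactoryChooser {
    private final Set<GenContext> enabled;

    FactoryChooser(Set<GenContext> enabled) {
        // Defensive copy; EnumSet.copyOf would reject an empty non-EnumSet input.
        Set<GenContext> copy = EnumSet.noneOf(GenContext.class);
        copy.addAll(enabled);
        this.enabled = copy;
    }

    // Generated tuples only for contexts that opted in; default otherwise.
    TupleMakerSketch forContext(GenContext ctx) {
        if (enabled.contains(ctx)) {
            return () -> "generated tuple for " + ctx;
        }
        return () -> "default tuple";
    }
}
```

With joins enabled and UDFs not, a merge join gets the generated factory while a UDF keeps the default one, which matches the "yes to join optimizations, no to UDFs" split.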


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8332
-----------------------------------------------------------




Re: Review Request: SchemaTuple in Pig

Posted by Julien Le Dem <ju...@ledem.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review8332
-----------------------------------------------------------



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment17954>

    do we want to get rid of those lines?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment17955>

    remove?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment17956>

    you could also have a method isFixedSize() in TupleFactory.
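    A minimal sketch of the suggestion, using hypothetical stand-in classes rather than Pig's actual TupleFactory. The base factory reports whether its tuples have a fixed number of fields, and a schema-based factory overrides that. A caller such as POUserFunc can then ask the factory instead of checking its concrete type.

```java
// Hypothetical stand-ins for illustration; these are not Pig's real classes.
abstract class SketchTupleFactory {
    /** True if every tuple created by this factory has a fixed field count. */
    boolean isFixedSize() {
        return false; // default: tuples may grow, e.g. via append()
    }
}

class DefaultSketchTupleFactory extends SketchTupleFactory {
    // Inherits the default: not fixed size.
}

class SchemaSketchTupleFactory extends SketchTupleFactory {
    @Override
    boolean isFixedSize() {
        return true; // schema-generated tuples have one slot per schema field
    }
}

class CallerSketch {
    // The caller asks the factory rather than using instanceof checks.
    static String describe(SketchTupleFactory factory) {
        return factory.isFixedSize() ? "fixed" : "appendable";
    }
}
```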



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment17958>

    You could also do this at the same time you would have initialized the Schema-based factory (line 144).
    If the assignment happens in the constructor, you can even make inputTupleFactory final.
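    A sketch of that pattern, assuming a much simplified operator class. Only the name inputTupleFactory comes from the review. UserFuncSketch and its constructor parameters are hypothetical, and a java.util.function.Supplier stands in for a tuple factory. Because the field is assigned exactly once, on every constructor path, the compiler accepts it as final.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical simplification; not Pig's POUserFunc.
class UserFuncSketch {
    // Assigned exactly once, in the constructor, so it can be declared final.
    private final Supplier<List<Object>> inputTupleFactory;

    UserFuncSketch(boolean schemaKnown, int schemaSize) {
        if (schemaKnown) {
            // Stand-in for a schema-based factory: fixed-size tuples, one slot per field.
            inputTupleFactory = () -> Arrays.asList(new Object[schemaSize]);
        } else {
            // Stand-in for the default factory: tuples that can grow.
            inputTupleFactory = ArrayList::new;
        }
    }

    List<Object> newInputTuple() {
        return inputTupleFactory.get();
    }
}
```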



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment17960>

    That's where diff's -w (ignore whitespace) flag does not do what you want :(



trunk/src/org/apache/pig/data/BinInterSedes.java
<https://reviews.apache.org/r/4651/#comment17962>

    where is this used?



trunk/src/org/apache/pig/data/DataByteArray.java
<https://reviews.apache.org/r/4651/#comment17965>

    http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Arrays.html#hashCode%28byte[]%29
    I'm not sure I get why these should use different primes...
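    For reference, the JDK specifies Arrays.hashCode(byte[]) as equal to the hashCode of a List of the corresponding Byte values. That amounts to starting at 1 and folding each element in with the prime 31, and returning 0 for a null array. The hypothetical helper below reproduces that recurrence by hand to show the equivalence. A loop with a different multiplier would give different values for the same bytes.

```java
class ByteHashSketch {
    // Same recurrence Arrays.hashCode(byte[]) is specified to produce:
    // null -> 0; otherwise start at 1, then result = 31 * result + element.
    static int manualHash(byte[] bytes) {
        if (bytes == null) {
            return 0;
        }
        int result = 1;
        for (byte b : bytes) {
            result = 31 * result + b; // b is sign-extended to int, like Byte.hashCode()
        }
        return result;
    }
}
```

    If DataByteArray's hash is meant to behave like this, delegating to Arrays.hashCode(byte[]) would sidestep the question of which prime to use.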



trunk/src/org/apache/pig/data/TupleFactory.java
<https://reviews.apache.org/r/4651/#comment17973>

    I prefer just Schema as a parameter here.
    I'll look again when you add SchemaTupleFrontend and Backend



trunk/src/org/apache/pig/impl/PigContext.java
<https://reviews.apache.org/r/4651/#comment17967>

    update javadoc



trunk/src/org/apache/pig/impl/io/PigNullableWritable.java
<https://reviews.apache.org/r/4651/#comment17969>

    Those fields have setters; use them instead.



trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java
<https://reviews.apache.org/r/4651/#comment17971>

    move up the declaration of inputSchemaToGen and reuse it


- Julien Le Dem


On June 18, 2012, 5:22 p.m., Jonathan Coveney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4651/
> -----------------------------------------------------------
> 
> (Updated June 18, 2012, 5:22 p.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> -------
> 
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (this depends heavily on the data, though). Need to do get/set tests, but assuming that performance is on par with (or even faster than) Tuple, the memory gain is huge.
> 
> Need to clean up the code and add tests.
> 
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
> 
> Needs tests and comments, but I want the code to settle a bit.
> 
> 
> This addresses bug PIG-2632.
>     https://issues.apache.org/jira/browse/PIG-2632
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1351417 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java 1351417 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1351417 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java 1351417 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1351417 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1351417 
>   trunk/src/org/apache/pig/data/BinInterSedes.java 1351417 
>   trunk/src/org/apache/pig/data/DataByteArray.java 1351417 
>   trunk/src/org/apache/pig/data/TupleFactory.java 1351417 
>   trunk/src/org/apache/pig/data/TypeAwareTuple.java 1351417 
>   trunk/src/org/apache/pig/impl/PigContext.java 1351417 
>   trunk/src/org/apache/pig/impl/io/NullableTuple.java 1351417 
>   trunk/src/org/apache/pig/impl/io/PigNullableWritable.java 1351417 
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1351417 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1351417 
>   trunk/test/org/apache/pig/test/TestDataBag.java 1351417 
>   trunk/test/org/apache/pig/test/TestSchema.java 1351417 
> 
> Diff: https://reviews.apache.org/r/4651/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jonathan Coveney
> 
>
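The quoted description's core idea is to generate a tuple class once the schema is known on the frontend. The hand-written sketch below shows what such a class might contain for a hypothetical schema (a:int, b:chararray). It is an illustration only, not the code the SchemaTupleClassGenerator in this patch emits, and all names and the layout are assumptions. Typed fields plus a small null bitmask replace an array of boxed objects, which is where a memory saving of the kind described would come from.

```java
// Hand-written illustration for a hypothetical schema (a:int, b:chararray).
// Not the class Pig generates; names and layout are assumptions.
class GeneratedTupleSketch {
    private int a;                // stored unboxed: no Integer object per value
    private String b;
    private byte nullMask = 0b11; // bit i set => field i is null; all null initially

    int size() {
        return 2; // fixed by the schema
    }

    // Generic accessor, as a Tuple-style interface would require.
    Object get(int fieldNum) {
        if ((nullMask & (1 << fieldNum)) != 0) {
            return null;
        }
        switch (fieldNum) {
            case 0: return a; // boxed only when read through the generic path
            case 1: return b;
            default: throw new IndexOutOfBoundsException("field " + fieldNum);
        }
    }

    void set(int fieldNum, Object value) {
        switch (fieldNum) {
            case 0:
                if (value == null) {
                    nullMask |= 1;
                } else {
                    setA((Integer) value);
                }
                break;
            case 1:
                setB((String) value);
                break;
            default:
                throw new IndexOutOfBoundsException("field " + fieldNum);
        }
    }

    // Typed accessors avoid boxing on the hot path.
    int getA() {
        if ((nullMask & 1) != 0) {
            throw new IllegalStateException("field a is null");
        }
        return a;
    }

    void setA(int value) {
        a = value;
        nullMask &= ~1; // clear the null bit for field 0
    }

    void setB(String value) {
        b = value;
        if (value == null) {
            nullMask |= 2;
        } else {
            nullMask &= ~2; // clear the null bit for field 1
        }
    }
}
```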