You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/09/01 03:02:16 UTC

Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

Review request for pig.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 012b908 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 452a98b 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java d2ef08d 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1ca228e 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.

> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > Have a few comments.

Thank you very much for reviewing my patch, Santhosh! Please find responses inline.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 199
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line199>
> >
> >     Why aren't float and double returning float and double respectively?

I am worried about loss of precision by converting a big positive/negative int to float/double. Do you disagree?


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 210
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line210>
> >
> >     Why aren't float and double returning float and double respectively?

Same as above.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 221
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line221>
> >
> >     Why aren't int and long returning float ?

Same as above.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 232
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line232>
> >
> >     Why aren't int and long returning double ?

Same as above.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 273
> > <https://reviews.apache.org/r/6884/diff/3/?file=154465#file154465line273>
> >
> >     Is the equality applicable all the way through?

No, this is just an enum equality check. However, that's sufficient here since mergeType() is meant to merge primitive types only.

I added a comment regarding what this method is meant for.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java, line 144
> > <https://reviews.apache.org/r/6884/diff/3/?file=154467#file154467line144>
> >
> >     Are the size of t and mProtoTuple expected to match?

No, they are not always expected to match because the size of mProtoTuple is equal to that of merged schema while the size of t is equal to that of individual input schemas.

In fact, I realized that this loop is redundant after all because the initialization of mProtoTuple is already done inside the constructor. So I removed this.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java, line 93
> > <https://reviews.apache.org/r/6884/diff/3/?file=154467#file154467line93>
> >
> >     This will append to the end of the list resulting in a list size of 2*tupleSize. Was that your intention?

No, that's not my intention.

But I believe that initializing the capability of ArrayList doesn't initialize its size. The size seems to be 0 until an entry is actually added to it.

Nevertheless, I changed it to add(i, null), so now it's more clear.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java, line 184
> > <https://reviews.apache.org/r/6884/diff/3/?file=154469#file154469line184>
> >
> >     Why isn't bytes + other non-null type = bytes ?

My reasoning is that bytes is just blob, so its type is unknown; therefore, the merged type is unknown. I wanted to be conservative in terms of type converting to avoid run-time exceptions.

But please let me know if you think otherwise.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 132
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line132>
> >
> >     Is it multiple_schema or multiple_schemas ?

It should be multiple_schemas. Fixed.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 311
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line311>
> >
> >     Any reason for removing this here and not in other places?

I put it back.


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 362
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line362>
> >
> >     The logic used here is close to the logic used in setLocation. Is it possible to refactor the code and move this to a utility method?

Their logic is very similar, but there is a difference. At a high level, they'are like this:

// setLocation
if (a) {
  do x;
  do y;
}

// getSchema
if (a) {
  do x;
}

where

a: getAllSubDirs()
x: initialize inputAvroSchema
y: call FileInputFormat.setInputPaths()

I factored out the code for initializing inputAvroSchema into a method setInputAvroSchema(), but I found it hard to refactor more than this. Please let me know if you have any suggestion.

By the way, while thinking about this, I eliminated getConcretePathFromGlob(). Originally, getConcretePathFromGlob() was called by getAvroSchema() to load the schema from the first file that matches the glob pattern. But since we build 'paths' by calling getAllSubDirs() anyway, it seems better to resue 'paths' inside getAvroSchema().


> On Sept. 17, 2012, 11:14 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 130
> > <https://reviews.apache.org/r/6884/diff/3/?file=154464#file154464line130>
> >
> >     Can these string constants be declared to make them reusable and more readable?


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review11566
-----------------------------------------------------------


On Sept. 19, 2012, 12:10 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2012, 12:10 a.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review11566
-----------------------------------------------------------


Have a few comments.


contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6884/#comment24871>

    Can these string constants be declared to make them reusable and more readable?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6884/#comment24872>

    Is it multiple_schema or multiple_schemas ?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6884/#comment25050>

    Use a descriptive variable name - path instead of p.



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6884/#comment25052>

    Any reason for removing this here and not in other places?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6884/#comment25054>

    The logic used here is close to the logic used in setLocation. Is it possible to refactor the code and move this to a utility method?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25058>

    Why aren't float and double returning float and double respectively?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25059>

    Why aren't float and double returning float and double respectively?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25060>

    Why aren't int and long returning float ?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25061>

    Why aren't int and long returning double ?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25078>

    Can you add a more descriptive formal Javadoc?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25065>

    Is the equality applicable all the way through?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25062>

    Use a descriptive variable name - xField



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25064>

    Use a descriptive variable name - yField



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25063>

    Use a descriptive variable name - entry



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25066>

    Use a descriptive name - xSchema



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25067>

    Use a descriptive name - ySchema



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25068>

    Can you change the message in the exception to "Cannot merge FIXED types with different sizes:"



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25077>

    Can you add a more descriptive formal Javadoc?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25070>

    Can the name of the method be changed to getSchemaToMergedSchemaMap ?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25071>

    Use a descriptive variable name - entry?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25072>

    Use path



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25073>

    Use schema



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
<https://reviews.apache.org/r/6884/#comment25080>

    Can you add a comment describing this data structure?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
<https://reviews.apache.org/r/6884/#comment25082>

    Can this be removed or rephrased? Since this is a public method, it can be called by any method



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
<https://reviews.apache.org/r/6884/#comment25101>

    Can you add a comment on the variable(s) ?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
<https://reviews.apache.org/r/6884/#comment25103>

    This will append to the end of the list resulting in a list size of 2*tupleSize. Was that your intention?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
<https://reviews.apache.org/r/6884/#comment25121>

    Are the size of t and mProtoTuple expected to match?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6884/#comment25122>

    Should it be testMultipleSchemas1 ?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6884/#comment25125>

    Should it be testMultipleSchemas2 ?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25134>

    This test should be split into multiple tests



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment25131>

    Why isn't bytes + other non-null type = bytes ?


- Santhosh Srinivasan


On Sept. 13, 2012, 10:46 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2012, 10:46 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.

> On Oct. 10, 2012, 9:50 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java, line 102
> > <https://reviews.apache.org/r/6884/diff/3-4/?file=154467#file154467line102>
> >
> >     Can you replace add with set since you are already initializing the capacity of the array list?
> 
> Cheolsoo Park wrote:
>     No, I can't because the size of list is still 0 even though I initialize its capacity until I add entries to it. Now set() throws an OutOfBoundException if (index < 0 || index >= size()), which is always true with any non-negative index as size() == 0.
>     
>     http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html#set(int, E)

Thanks! My concerns around the performance are addressed.


- Santhosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review12319
-----------------------------------------------------------


On Oct. 10, 2012, 11:15 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2012, 11:15 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.

> On Oct. 10, 2012, 9:50 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java, line 102
> > <https://reviews.apache.org/r/6884/diff/3-4/?file=154467#file154467line102>
> >
> >     Can you replace add with set since you are already initializing the capacity of the array list?

No, I can't because the size of list is still 0 even though I initialize its capacity until I add entries to it. Now set() throws an OutOfBoundException if (index < 0 || index >= size()), which is always true with any non-negative index as size() == 0.

http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html#set(int, E)


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review12319
-----------------------------------------------------------


On Oct. 10, 2012, 11:15 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2012, 11:15 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review12319
-----------------------------------------------------------


Couple of comments. Otherwise, the patch looks good.


contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
<https://reviews.apache.org/r/6884/#comment26091>

    Can you replace add with set since you are already initializing the capacity of the array list?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/6884/#comment26092>

    Can you add a message for the failure?


- Santhosh Srinivasan


On Sept. 28, 2012, 8:44 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Sept. 28, 2012, 8:44 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review12337
-----------------------------------------------------------

Ship it!


I will run the unit tests and commit the patch if everything passes.

- Santhosh Srinivasan


On Oct. 10, 2012, 11:15 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> -----------------------------------------------------------
> 
> (Updated Oct. 10, 2012, 11:15 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> -------
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
>     https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> -------
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

(Updated Oct. 10, 2012, 11:15 p.m.)


Review request for pig and Santhosh Srinivasan.


Changes
-------

Incorporate Santhosh's comments.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

(Updated Sept. 28, 2012, 8:44 p.m.)


Review request for pig and Santhosh Srinivasan.


Changes
-------

After discussing with Santhosh offline, I made changes to type merging as follows:
1) int + float/double = float/double
2) long + float/double = float/double

Although this may result in loss of precision, I decided to do so because:
1) This is consistent with how Pig merges numeric types.
2) Converting numeric types to strings (e.g. int + float/double = string) will result in loss of type information, which seems more significant than loss of precision particularly when data producers are different from data consumers.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

(Updated Sept. 19, 2012, 12:10 a.m.)


Review request for pig and Santhosh Srinivasan.


Changes
-------

Incorporate Santhosh's comments.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

(Updated Sept. 13, 2012, 10:46 p.m.)


Review request for pig and Santhosh Srinivasan.


Changes
-------

Rebased the patch to trunk.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java d7a004f 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java fb5cc25 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 75057f9 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1f6e581 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

(Updated Sept. 10, 2012, 11:08 p.m.)


Review request for pig and Santhosh Srinivasan.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 012b908 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 452a98b 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java d2ef08d 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1ca228e 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
-----------------------------------------------------------

(Updated Sept. 2, 2012, 3:47 a.m.)


Review request for pig.


Description
-------

Add support for multiple avro schemas to AvroStorage. This patch is based on Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
    https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 012b908 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 84280af 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 452a98b 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java d2ef08d 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1ca228e 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
-------

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park