You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/08/13 11:47:37 UTC

Review Request: PIG-2869 Add recursive record support to AvroStorage

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

Review request for pig.


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.


This addresses bug PIG-2869.
    https://issues.apache.org/jira/browse/PIG-2869


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, and another record.
2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10524
-----------------------------------------------------------


A couple of comments. I see white spaces (red color) in the diffs. Can you check if its tab or spaces. If its a tab then it has to be replaced with spaces. Other than that one minor comment.


contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22529>

    Move the comment into the assertion, i.e., 
    
    Assert.fail("Negative test to test an exception. Should not be succeeding!");
    
    This makes debugging the tests easier.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22530>

    Move the comment into the assertion, i.e., 
    
    Assert.fail("Negative test to test an exception. Should not be succeeding!");
    
    This makes debugging the tests easier.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22531>

    Move the comment into the assertion, i.e., 
    
    Assert.fail("Negative test to test an exception. Should not be succeeding!");
    
    This makes debugging the tests easier.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22532>

    Move the comment into the assertion, i.e., 
    
    Assert.fail("Negative test to test an exception. Should not be succeeding!");
    
    This makes debugging the tests easier.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22533>

    Move the comment into the assertion, i.e., 
    
    Assert.fail("Negative test to test an exception. Should not be succeeding!");
    
    This makes debugging the tests easier.


- Santhosh Srinivasan


On Aug. 16, 2012, 10:18 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2012, 10:18 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, and another record.
> 2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10526
-----------------------------------------------------------

Ship it!


+1. Will commit it if all the tests pass

- Santhosh Srinivasan


On Aug. 20, 2012, 4:50 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2012, 4:50 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, and another record.
> 2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10529
-----------------------------------------------------------

Ship it!


+1. Will run the tests again.

- Santhosh Srinivasan


On Aug. 20, 2012, 6:40 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2012, 6:40 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, and another record.
> 2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

(Updated Aug. 20, 2012, 6:40 a.m.)


Review request for pig.


Changes
-------

Fix a typo in a test case.


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.


This addresses bug PIG-2875.
    https://issues.apache.org/jira/browse/PIG-2875


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, and another record.
2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

(Updated Aug. 20, 2012, 4:50 a.m.)


Review request for pig.


Changes
-------

1) Removed trailing whitespaces.
2) Moved comments for negative tests into assertions.


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.


This addresses bug PIG-2875.
    https://issues.apache.org/jira/browse/PIG-2875


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, and another record.
2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

(Updated Aug. 16, 2012, 10:18 p.m.)


Review request for pig.


Changes
-------

Incorporate Santhosh's comments.


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.


This addresses bug PIG-2875.
    https://issues.apache.org/jira/browse/PIG-2875


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, and another record.
2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.

> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 386
> > <https://reviews.apache.org/r/6536/diff/1/?file=138628#file138628line386>
> >
> >     The no_schema_check can now appear in any position, i.e., its no longer to required for it to be the first argument?

In fact, 'no_schema_check' wasn't used with other options at all. But since I was making it possible to use it with other options, I didn't want to restrict its position. So 'no_schema_check' can be at any position except between a parameter and its value:

no_schema_check, key, value => OK
key, value, no_schema_check => OK
key, no_schema_check, value => not OK


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 389
> > <https://reviews.apache.org/r/6536/diff/1/?file=138628#file138628line389>
> >
> >     I am a little concerned with this change. For loop counters, to be conservative, I would avoid making changes in the body of the loop. Its hard to detect these changes and harder to maintain.
> >     
> >     Is there a better way of implementing this?

I agree that we should be careful about counters. But the problem is that the original loop body was implemented with an assumption that every arg is a pair of parameter/value, which is no longer true since I am allowing a parameter only arg 'no_schema_check'. So I think that I have to make some changes to the body of the loop.

Nevertheless, I made increasing counters more explicit in the new patch, so I believe that they are more visible now. Please let me know if you have a better suggestion.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 288
> > <https://reviews.apache.org/r/6536/diff/1/?file=138629#file138629line288>
> >
> >     Can this be changed to return containsGenericUnion(fs, visitedRecords) to be consistent with the other parts of the code?

No, it cannot. It is inside a loop, so we should not return until the loop is over.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, line 318
> > <https://reviews.apache.org/r/6536/diff/1/?file=138629#file138629line318>
> >
> >     Can this be changed to return containsGenericUnion(fs, visitedRecords) to be consistent with the other parts of the code?

No, it cannot. It is inside a loop, so we should not return until the loop is over.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 866
> > <https://reviews.apache.org/r/6536/diff/1/?file=138630#file138630line866>
> >
> >     Not sure if this holds true for all cases. This is an either or - i.e., if a sequence of jobs has one failure then the assertion will kick in for the first one which is actually a false alarm.

Indeed. I modified the code, so now the number of failed jobs (numOfFailedJobs) is counted. If expectedToFail is true, numOfFailedJobs must be greater than 0; otherwise, it must be equal to 0.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10347
-----------------------------------------------------------


On Aug. 16, 2012, 10:18 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2012, 10:18 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, and another record.
> 2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10347
-----------------------------------------------------------


Have a few comments.


contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
<https://reviews.apache.org/r/6536/#comment21961>

    Can you elaborate on this error message? What is the recommended course of action when the user sees this error message?
    
    Minor comment: Please use braces to future proof accidental mistakes in the future, i..e, 
    if () { ...
    }



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6536/#comment21967>

    The no_schema_check can now appear in any position, i.e., its no longer to required for it to be the first argument?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6536/#comment21966>

    I am a little concerned with this change. For loop counters, to be conservative, I would avoid making changes in the body of the loop. Its hard to detect these changes and harder to maintain.
    
    Is there a better way of implementing this?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
<https://reviews.apache.org/r/6536/#comment21968>

    Can you use parenthesis to make this more readable?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6536/#comment22058>

    Can this be changed to return containsGenericUnion(fs, visitedRecords) to be consistent with the other parts of the code?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/6536/#comment22059>

    Can this be changed to return containsGenericUnion(fs, visitedRecords) to be consistent with the other parts of the code?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22126>

    There should be three rows with the third row being NULL.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22127>

    Can you include a message here. Something like "Negative test to test an exception. Should not be succeeding!" 



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22128>

    Can you include a message here. Something like "Negative test to test an exception. Should not be succeeding!" 



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22129>

    Can you include a message here. Something like "Negative test to test an exception. Should not be succeeding!" 



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/6536/#comment22130>

    Not sure if this holds true for all cases. This is an either or - i.e., if a sequence of jobs has one failure then the assertion will kick in for the first one which is actually a false alarm.


- Santhosh Srinivasan


On Aug. 14, 2012, 10:23 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 14, 2012, 10:23 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, and another record.
> 2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2875 Add recursive record support to AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

(Updated Aug. 14, 2012, 10:23 a.m.)


Review request for pig.


Summary (updated)
-----------------

PIG-2875 Add recursive record support to AvroStorage


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.


This addresses bug PIG-2875.
    https://issues.apache.org/jira/browse/PIG-2875


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, and another record.
2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2869 Add recursive record support to AvroStorage

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/
-----------------------------------------------------------

(Updated Aug. 14, 2012, 10:22 a.m.)


Review request for pig.


Description
-------

Allow recursive records to be loaded/stored by AvroStorage.

The changes include:

1) Remove the recursive record check from AvroSchema2Pig.
2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records to bytearrays.
3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle Avro schema that contains recursive records.
4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can be passed to both LoadFunc and StoreFunc.
5) Add the recursive record check to AvroSchemaManager. This is needed because 'schema_file' and 'data' cannot refer to avro schema that contains recursive records.

AvroStorage works as follows:

1) PigSchema maps recursive records to bytearrays, so there is discrepancy between Avro schema and Pig schema.
2) Recursive records are loaded as tuples even though Pig schema defines them as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
3) To store recursive records, Avro schema must be provided via the 'schema' or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be enabled because otherwise schema check will fail due to discrepancy between Avro schema and Pig schema.
4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. This is because AvroSchemaManager cannot handle recursive records for now. The recursive record check is added to AvroSchemaManager, so if Avro schema that contains recursive records is specified by these parameters, an exception is thrown.


This addresses bug PIG-2875.
    https://issues.apache.org/jira/browse/PIG-2875


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java 6b1d2a1 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java 1939d3e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java c9f7d81 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java e24b495 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 2fab3f7 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 040234f 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro 4e23e73 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro 12a36f8 

Diff: https://reviews.apache.org/r/6536/diff/


Testing
-------

New test cases are added as follows:

1) Load/store Avro files that contain recursive records in array, map, union, and another record.
2) Load Avro files that contains recursive records, generate new relations, apply filters, and store them as non-recursive records.
3) Tests for the StoreFunc parameters: no_schema_check, schema, same, schema_file, and data.


Thanks,

Cheolsoo Park