You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/07/15 04:51:52 UTC

Review Request: PIG-2492 AvroStorage should recognize globs and commas

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------

Review request for pig.


Description
-------

Add glob support to AvroStorage:

https://issues.apache.org/jira/browse/PIG-2492


This addresses bug PIG-2492.
    https://issues.apache.org/jira/browse/PIG-2492


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 

Diff: https://reviews.apache.org/r/5936/diff/


Testing
-------

1. Added new unit tests as follows:

- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.

To run the tests, please do the following:

wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage

2. Both TestAvroStorage and TestAvroStorageUtils pass.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Cheolsoo Park <ch...@cloudera.com>.

> On July 18, 2012, 2:56 a.m., Santhosh Srinivasan wrote:
> > Needs more test cases to increase code coverage. There are no negative tests.

Thank you very much for reviewing my patch! I will add more tests as you suggested.


> On July 18, 2012, 2:56 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 69
> > <https://reviews.apache.org/r/5936/diff/1/?file=122600#file122600line69>
> >
> >     What if you have test_glob{3,2,1}.avro - will the test case fail?

No, it doesn't. The order doesn't matter. Nevertheless, I will add a test case for that.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9233
-----------------------------------------------------------


On July 15, 2012, 2:51 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
> 
> (Updated July 15, 2012, 2:51 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Add glob support to AvroStorage:
> 
> https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> This addresses bug PIG-2492.
>     https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
> 
> Diff: https://reviews.apache.org/r/5936/diff/
> 
> 
> Testing
> -------
> 
> 1. Added new unit tests as follows:
> 
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
> 
> To run the tests, please do the following:
> 
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
> 
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9233
-----------------------------------------------------------


Needs more test cases to increase code coverage. There are no negative tests.


contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19778>

    Need a test case to cover this code path



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19779>

    Can you add a test case to cover this code path?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19777>

    What if you have test_glob{3,2,1}.avro - will the test case fail?


- Santhosh Srinivasan


On July 15, 2012, 2:51 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
> 
> (Updated July 15, 2012, 2:51 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Add glob support to AvroStorage:
> 
> https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> This addresses bug PIG-2492.
>     https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
> 
> Diff: https://reviews.apache.org/r/5936/diff/
> 
> 
> Testing
> -------
> 
> 1. Added new unit tests as follows:
> 
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
> 
> To run the tests, please do the following:
> 
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
> 
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Cheolsoo Park <ch...@cloudera.com>.

> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 200
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line200>
> >
> >     Can you add an assert here that checks for the expected message from the exception?

The exception thrown by AvroStorage is catched by the Pig backend, and JobCreationException is re-thrown as follows. I added an assert that compares the message of the exception against JobCreationException.

Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:756)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:281)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:179)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1260)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1245)
    at org.apache.pig.PigServer.execute(PigServer.java:1235)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
    at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:495)
    at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testGlob6(TestAvroStorage.java:197)
Caused by: java.io.IOException: Input path 'file:///home/cheolsoo/workspace/pig-git/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_dir{1,2}/file_that_does_not_exist*.avro' is not found
    at org.apache.pig.piggybank.storage.avro.AvroStorage.setLocation(AvroStorage.java:142)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:403)


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java, line 161
> > <https://reviews.apache.org/r/5936/diff/2/?file=124635#file124635line161>
> >
> >     Can you add an assert to check for the message from the exception?

The message of the exception looks like this. Since this is a bit too long, I simply confirm that it contains 'Illegal file pattern'.

Illegal file pattern: Unclosed group near index 3
{1,
   ^


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java, line 163
> > <https://reviews.apache.org/r/5936/diff/2/?file=124635#file124635line163>
> >
> >     Why is this check required? What are the guarantees for concretePath being null. There is an assumption here that concretePath is/was null. It could be due to the previous test setting concretePath to null.

This was a mistake. I removed it.


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 173
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line173>
> >
> >     escaped

Fixed


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 158
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line158>
> >
> >     escaped

Fixed


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 143
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line143>
> >
> >     escaped

Fixed


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 128
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line128>
> >
> >     escaped

Fixed


> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 71
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line71>
> >
> >     Looks like this variable is not used.

Removed


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9293
-----------------------------------------------------------


On July 19, 2012, 1:23 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
> 
> (Updated July 19, 2012, 1:23 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Add glob support to AvroStorage:
> 
> https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> This addresses bug PIG-2492.
>     https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 
> 
> Diff: https://reviews.apache.org/r/5936/diff/
> 
> 
> Testing
> -------
> 
> 1. Added new unit tests as follows:
> 
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
> 
> To run the tests, please do the following:
> 
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
> 
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9293
-----------------------------------------------------------


Some more comments.


contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19908>

    Looks like this variable is not used.



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19907>

    escaped



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19906>

    escaped



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19905>

    escaped



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19904>

    escaped



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19901>

    Can you add an assert here that checks for the expected message from the exception?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19903>

    Can you add an assert to check for the message from the exception?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19902>

    Why is this check required? What are the guarantees for concretePath being null. There is an assumption here that concretePath is/was null. It could be due to the previous test setting concretePath to null.


- Santhosh Srinivasan


On July 19, 2012, 1:23 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
> 
> (Updated July 19, 2012, 1:23 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Add glob support to AvroStorage:
> 
> https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> This addresses bug PIG-2492.
>     https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 
> 
> Diff: https://reviews.apache.org/r/5936/diff/
> 
> 
> Testing
> -------
> 
> 1. Added new unit tests as follows:
> 
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
> 
> To run the tests, please do the following:
> 
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
> 
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9320
-----------------------------------------------------------

Ship it!


Ship It!

- Santhosh Srinivasan


On July 20, 2012, 4:36 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
> 
> (Updated July 20, 2012, 4:36 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Add glob support to AvroStorage:
> 
> https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> This addresses bug PIG-2492.
>     https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 
> 
> Diff: https://reviews.apache.org/r/5936/diff/
> 
> 
> Testing
> -------
> 
> 1. Added new unit tests as follows:
> 
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
> 
> To run the tests, please do the following:
> 
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
> 
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9340
-----------------------------------------------------------

Ship it!


+1 Will commit it if all the tests pass.

- Santhosh Srinivasan


On July 21, 2012, 10:27 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
> 
> (Updated July 21, 2012, 10:27 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Add glob support to AvroStorage:
> 
> https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> This addresses bug PIG-2492.
>     https://issues.apache.org/jira/browse/PIG-2492
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 
> 
> Diff: https://reviews.apache.org/r/5936/diff/
> 
> 
> Testing
> -------
> 
> 1. Added new unit tests as follows:
> 
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
> 
> To run the tests, please do the following:
> 
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
> 
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------

(Updated July 21, 2012, 10:27 p.m.)


Review request for pig.


Changes
-------

It turned out that testGlob1 fails when the test .avro files are added by svn. This was because '*' matches every file including .svn. I fixed this problem by applying PATHP_FILTER to fs.globStatus() and confirmed that all tests pass even when test .avro files are added by svn.

Thank you very much for finding this problem Santhosh!


Description
-------

Add glob support to AvroStorage:

https://issues.apache.org/jira/browse/PIG-2492


This addresses bug PIG-2492.
    https://issues.apache.org/jira/browse/PIG-2492


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 

Diff: https://reviews.apache.org/r/5936/diff/


Testing
-------

1. Added new unit tests as follows:

- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.

To run the tests, please do the following:

wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage

2. Both TestAvroStorage and TestAvroStorageUtils pass.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------

(Updated July 20, 2012, 4:36 a.m.)


Review request for pig.


Changes
-------

Incorporated Santhosh's comments.


Description
-------

Add glob support to AvroStorage:

https://issues.apache.org/jira/browse/PIG-2492


This addresses bug PIG-2492.
    https://issues.apache.org/jira/browse/PIG-2492


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 

Diff: https://reviews.apache.org/r/5936/diff/


Testing
-------

1. Added new unit tests as follows:

- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.

To run the tests, please do the following:

wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage

2. Both TestAvroStorage and TestAvroStorageUtils pass.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------

(Updated July 19, 2012, 1:23 a.m.)


Review request for pig.


Changes
-------

1) Added more unit tests including some negative tests.

2) Removed getPathsFromString() because I realized that fs.globStatus() implicitly expands comma-separated string into paths, so it is redundant to explicitly do it.

3) Changed the type of 1st parameter of getAllSubDirs() from URI to hadoop.fs.Path. This is needed because '{' and '}' are not allowed in URI, so URI.create() throws a URISyntaxException on a glob pattern. But these characters are automatically escaped when constructing a Path. Note that this wasn't an issue in my previous patch because getPathsFromString() used to implicitly convert a glob pattern to paths, but now I removed getPathsFromString() and have to do it explicitly.

In fact, this reverts some changes made by PIG-2540 (https://issues.apache.org/jira/browse/PIG-2540). However, this does not break S3 support because inside getAllSubDirs(), file system is still constructed for the given URI, and globStatus() is called on that file system.

FileSystem fs = FileSystem.get(path.toUri(), job.getConfiguration());
FileStatus[] matchedFiles = fs.globStatus(path);

So if path is a s3 URI, S3 file system will be used.


Description
-------

Add glob support to AvroStorage:

https://issues.apache.org/jira/browse/PIG-2492


This addresses bug PIG-2492.
    https://issues.apache.org/jira/browse/PIG-2492


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 

Diff: https://reviews.apache.org/r/5936/diff/


Testing
-------

1. Added new unit tests as follows:

- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.

To run the tests, please do the following:

wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz 
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage

2. Both TestAvroStorage and TestAvroStorageUtils pass.


Thanks,

Cheolsoo Park