You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/07/15 04:51:52 UTC
Review Request: PIG-2492 AvroStorage should recognize globs and commas
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------
Review request for pig.
Description
-------
Add glob support to AvroStorage:
https://issues.apache.org/jira/browse/PIG-2492
This addresses bug PIG-2492.
https://issues.apache.org/jira/browse/PIG-2492
Diffs
-----
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
Diff: https://reviews.apache.org/r/5936/diff/
Testing
-------
1. Added new unit tests as follows:
- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.
To run the tests, please do the following:
wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage
2. Both TestAvroStorage and TestAvroStorageUtils pass.
Thanks,
Cheolsoo Park
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Cheolsoo Park <ch...@cloudera.com>.
> On July 18, 2012, 2:56 a.m., Santhosh Srinivasan wrote:
> > Needs more test cases to increase code coverage. There are no negative tests.
Thank you very much for reviewing my patch! I will add more tests as you suggested.
> On July 18, 2012, 2:56 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 69
> > <https://reviews.apache.org/r/5936/diff/1/?file=122600#file122600line69>
> >
> > What if you have test_glob{3,2,1}.avro - will the test case fail?
No, it doesn't. The order doesn't matter. Nevertheless, I will add a test case for that.
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9233
-----------------------------------------------------------
On July 15, 2012, 2:51 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
>
> (Updated July 15, 2012, 2:51 a.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> Add glob support to AvroStorage:
>
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> This addresses bug PIG-2492.
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> Diffs
> -----
>
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
>
> Diff: https://reviews.apache.org/r/5936/diff/
>
>
> Testing
> -------
>
> 1. Added new unit tests as follows:
>
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
>
> To run the tests, please do the following:
>
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
>
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9233
-----------------------------------------------------------
Needs more test cases to increase code coverage. There are no negative tests.
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19778>
Need a test case to cover this code path
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19779>
Can you add a test case to cover this code path?
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19777>
What if you have test_glob{3,2,1}.avro - will the test case fail?
- Santhosh Srinivasan
On July 15, 2012, 2:51 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
>
> (Updated July 15, 2012, 2:51 a.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> Add glob support to AvroStorage:
>
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> This addresses bug PIG-2492.
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> Diffs
> -----
>
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
>
> Diff: https://reviews.apache.org/r/5936/diff/
>
>
> Testing
> -------
>
> 1. Added new unit tests as follows:
>
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
>
> To run the tests, please do the following:
>
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
>
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Cheolsoo Park <ch...@cloudera.com>.
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 200
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line200>
> >
> > Can you add an assert here that checks for the expected message from the exception?
The exception thrown by AvroStorage is catched by the Pig backend, and JobCreationException is re-thrown as follows. I added an assert that compares the message of the exception against JobCreationException.
Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:756)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:281)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:179)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1260)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1245)
at org.apache.pig.PigServer.execute(PigServer.java:1235)
at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:495)
at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testGlob6(TestAvroStorage.java:197)
Caused by: java.io.IOException: Input path 'file:///home/cheolsoo/workspace/pig-git/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_dir{1,2}/file_that_does_not_exist*.avro' is not found
at org.apache.pig.piggybank.storage.avro.AvroStorage.setLocation(AvroStorage.java:142)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:403)
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java, line 161
> > <https://reviews.apache.org/r/5936/diff/2/?file=124635#file124635line161>
> >
> > Can you add an assert to check for the message from the exception?
The message of the exception looks like this. Since this is a bit too long, I simply confirm that it contains 'Illegal file pattern'.
Illegal file pattern: Unclosed group near index 3
{1,
^
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java, line 163
> > <https://reviews.apache.org/r/5936/diff/2/?file=124635#file124635line163>
> >
> > Why is this check required? What are the guarantees for concretePath being null. There is an assumption here that concretePath is/was null. It could be due to the previous test setting concretePath to null.
This was a mistake. I removed it.
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 173
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line173>
> >
> > escaped
Fixed
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 158
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line158>
> >
> > escaped
Fixed
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 143
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line143>
> >
> > escaped
Fixed
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 128
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line128>
> >
> > escaped
Fixed
> On July 20, 2012, 2:46 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 71
> > <https://reviews.apache.org/r/5936/diff/1-2/?file=122600#file122600line71>
> >
> > Looks like this variable is not used.
Removed
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9293
-----------------------------------------------------------
On July 19, 2012, 1:23 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
>
> (Updated July 19, 2012, 1:23 a.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> Add glob support to AvroStorage:
>
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> This addresses bug PIG-2492.
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> Diffs
> -----
>
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
>
> Diff: https://reviews.apache.org/r/5936/diff/
>
>
> Testing
> -------
>
> 1. Added new unit tests as follows:
>
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
>
> To run the tests, please do the following:
>
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
>
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9293
-----------------------------------------------------------
Some more comments.
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19908>
Looks like this variable is not used.
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19907>
escaped
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19906>
escaped
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19905>
escaped
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19904>
escaped
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
<https://reviews.apache.org/r/5936/#comment19901>
Can you add an assert here that checks for the expected message from the exception?
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19903>
Can you add an assert to check for the message from the exception?
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
<https://reviews.apache.org/r/5936/#comment19902>
Why is this check required? What are the guarantees for concretePath being null. There is an assumption here that concretePath is/was null. It could be due to the previous test setting concretePath to null.
- Santhosh Srinivasan
On July 19, 2012, 1:23 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
>
> (Updated July 19, 2012, 1:23 a.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> Add glob support to AvroStorage:
>
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> This addresses bug PIG-2492.
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> Diffs
> -----
>
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
>
> Diff: https://reviews.apache.org/r/5936/diff/
>
>
> Testing
> -------
>
> 1. Added new unit tests as follows:
>
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
>
> To run the tests, please do the following:
>
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
>
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9320
-----------------------------------------------------------
Ship it!
Ship It!
- Santhosh Srinivasan
On July 20, 2012, 4:36 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
>
> (Updated July 20, 2012, 4:36 a.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> Add glob support to AvroStorage:
>
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> This addresses bug PIG-2492.
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> Diffs
> -----
>
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
>
> Diff: https://reviews.apache.org/r/5936/diff/
>
>
> Testing
> -------
>
> 1. Added new unit tests as follows:
>
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
>
> To run the tests, please do the following:
>
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
>
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Santhosh Srinivasan <sm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/#review9340
-----------------------------------------------------------
Ship it!
+1 Will commit it if all the tests pass.
- Santhosh Srinivasan
On July 21, 2012, 10:27 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5936/
> -----------------------------------------------------------
>
> (Updated July 21, 2012, 10:27 p.m.)
>
>
> Review request for pig.
>
>
> Description
> -------
>
> Add glob support to AvroStorage:
>
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> This addresses bug PIG-2492.
> https://issues.apache.org/jira/browse/PIG-2492
>
>
> Diffs
> -----
>
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
>
> Diff: https://reviews.apache.org/r/5936/diff/
>
>
> Testing
> -------
>
> 1. Added new unit tests as follows:
>
> - testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
> - testGlob1 to 3 verify that glob patterns are expanded properly.
>
> To run the tests, please do the following:
>
> wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
> tar -xf avro_test_files.tar.gz
> ant clean compile-test piggybank -Dhadoopversion=20
> cd contrib/piggybank/java
> ant test -Dtestcase=TestAvroStorage
>
> 2. Both TestAvroStorage and TestAvroStorageUtils pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------
(Updated July 21, 2012, 10:27 p.m.)
Review request for pig.
Changes
-------
It turned out that testGlob1 fails when the test .avro files are added by svn. This was because '*' matches every file including .svn. I fixed this problem by applying PATHP_FILTER to fs.globStatus() and confirmed that all tests pass even when test .avro files are added by svn.
Thank you very much for finding this problem Santhosh!
Description
-------
Add glob support to AvroStorage:
https://issues.apache.org/jira/browse/PIG-2492
This addresses bug PIG-2492.
https://issues.apache.org/jira/browse/PIG-2492
Diffs (updated)
-----
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
Diff: https://reviews.apache.org/r/5936/diff/
Testing
-------
1. Added new unit tests as follows:
- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.
To run the tests, please do the following:
wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage
2. Both TestAvroStorage and TestAvroStorageUtils pass.
Thanks,
Cheolsoo Park
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------
(Updated July 20, 2012, 4:36 a.m.)
Review request for pig.
Changes
-------
Incorporated Santhosh's comments.
Description
-------
Add glob support to AvroStorage:
https://issues.apache.org/jira/browse/PIG-2492
This addresses bug PIG-2492.
https://issues.apache.org/jira/browse/PIG-2492
Diffs (updated)
-----
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
Diff: https://reviews.apache.org/r/5936/diff/
Testing
-------
1. Added new unit tests as follows:
- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.
To run the tests, please do the following:
wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage
2. Both TestAvroStorage and TestAvroStorageUtils pass.
Thanks,
Cheolsoo Park
Re: Review Request: PIG-2492 AvroStorage should recognize globs and commas
Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------
(Updated July 19, 2012, 1:23 a.m.)
Review request for pig.
Changes
-------
1) Added more unit tests including some negative tests.
2) Removed getPathsFromString() because I realized that fs.globStatus() implicitly expands comma-separated string into paths, so it is redundant to explicitly do it.
3) Changed the type of 1st parameter of getAllSubDirs() from URI to hadoop.fs.Path. This is needed because '{' and '}' are not allowed in URI, so URI.create() throws a URISyntaxException on a glob pattern. But these characters are automatically escaped when constructing a Path. Note that this wasn't an issue in my previous patch because getPathsFromString() used to implicitly convert a glob pattern to paths, but now I removed getPathsFromString() and have to do it explicitly.
In fact, this reverts some changes made by PIG-2540 (https://issues.apache.org/jira/browse/PIG-2540). However, this does not break S3 support because inside getAllSubDirs(), file system is still constructed for the given URI, and globStatus() is called on that file system.
FileSystem fs = FileSystem.get(path.toUri(), job.getConfiguration());
FileStatus[] matchedFiles = fs.globStatus(path);
So if path is a s3 URI, S3 file system will be used.
Description
-------
Add glob support to AvroStorage:
https://issues.apache.org/jira/browse/PIG-2492
This addresses bug PIG-2492.
https://issues.apache.org/jira/browse/PIG-2492
Diffs (updated)
-----
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38
Diff: https://reviews.apache.org/r/5936/diff/
Testing
-------
1. Added new unit tests as follows:
- testDir verifies that AvroStorage recursively loads files in a directory and its sub-directories.
- testGlob1 to 3 verify that glob patterns are expanded properly.
To run the tests, please do the following:
wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage
2. Both TestAvroStorage and TestAvroStorageUtils pass.
Thanks,
Cheolsoo Park