Posted to dev@pig.apache.org by Rohini Palaniswamy <ro...@gmail.com> on 2014/01/23 22:22:25 UTC

Review Request 17266: [PIG-3661] Piggybank AvroStorage fails if used in more than one load or store statement

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17266/
-----------------------------------------------------------

Review request for pig.


Bugs: PIG-3661
    https://issues.apache.org/jira/browse/PIG-3661


Repository: pig


Description
-------

Besides fixing the use of AvroStorage in more than one load or store statement, this patch also fixes other AvroStorage issues:
   - Hidden files were not excluded (PIG-3717)
   - mapred.input.dir was being populated with every file instead of the top-level directory, making the conf very large
   - Default value was not set for a Union


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 1560805 

Diff: https://reviews.apache.org/r/17266/diff/


Testing
-------

Unit test added. testGlob was passing in the git checkout but failing in the svn code base because of hidden .svn files (PIG-3717). It now passes as well.


Thanks,

Rohini Palaniswamy


Re: Review Request 17266: [PIG-3661] Piggybank AvroStorage fails if used in more than one load or store statement

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Jan. 24, 2014, 5:09 p.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, lines 204-212
> > <https://reviews.apache.org/r/17266/diff/1/?file=436566#file436566line204>
> >
> >     Isn't (statuses.length == 0 && !status.isDir()) == fs.isFile(path)? If so, can you simplify this to the following? 
> >     
> >     if (fs.isFile(path)) {
> >       return path;
> >     }
> >     
> >     FileStatus[] statuses = fs.listStatus(path, PATH_FILTER);
> >     <the rest goes here>

Found that fs.isFile() returns false for a nonexistent path instead of throwing FileNotFoundException, so I am doing the following instead:

    FileStatus status = fs.getFileStatus(path);
    if (!status.isDir()) {
        return path;
    }
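
The distinction can be reproduced without Hadoop. The sketch below is only an analogy using java.nio.file, not the Hadoop FileSystem API: Files.isRegularFile stands in for a quiet check that returns false on a missing path, and Files.readAttributes stands in for a status lookup that throws. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical analogy in java.nio.file; not Hadoop code.
public class MissingPathSketch {

    // Quiet check: a missing path yields false, so the caller
    // cannot tell "missing" apart from "is a directory".
    public static boolean quietIsFile(Path path) {
        return Files.isRegularFile(path);
    }

    // Strict check: reading the attributes throws NoSuchFileException
    // for a missing path, so the error reaches the caller.
    public static boolean strictIsFile(Path path) throws IOException {
        BasicFileAttributes attrs =
                Files.readAttributes(path, BasicFileAttributes.class);
        return !attrs.isDirectory();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Path missing = dir.resolve("does-not-exist.avro");

        System.out.println("quiet check: " + quietIsFile(missing));
        try {
            strictIsFile(missing);
        } catch (NoSuchFileException e) {
            System.out.println("strict check threw for: " + e.getFile());
        }
        Files.delete(dir);
    }
}
```

The strict form keeps a bad input path from being silently treated as a directory to list.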


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17266/#review32718
-----------------------------------------------------------




Re: Review Request 17266: [PIG-3661] Piggybank AvroStorage fails if used in more than one load or store statement

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17266/#review32718
-----------------------------------------------------------


Looks good. I have a few minor comments below.


http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/17266/#comment61661>

    Can we remove this method? Why not let AvroStorage directly call getPaths(String, Configuration, boolean) and get rid of an extra indirection?



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/17266/#comment61667>

    f.getPath() => path.



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/17266/#comment61668>

    } else {



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/17266/#comment61663>

    Can we remove this method? It does not seem to be used anywhere.



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
<https://reviews.apache.org/r/17266/#comment61670>

    Isn't (statuses.length == 0 && !status.isDir()) == fs.isFile(path)? If so, can you simplify this to the following? 
    
    if (fs.isFile(path)) {
      return path;
    }
    
    FileStatus[] statuses = fs.listStatus(path, PATH_FILTER);
    <the rest goes here>



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
<https://reviews.apache.org/r/17266/#comment61669>

    for( => for (


- Cheolsoo Park




Re: Review Request 17266: [PIG-3661] Piggybank AvroStorage fails if used in more than one load or store statement

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17266/#review32745
-----------------------------------------------------------

Ship it!



- Cheolsoo Park




Re: Review Request 17266: [PIG-3661] Piggybank AvroStorage fails if used in more than one load or store statement

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17266/
-----------------------------------------------------------

(Updated Jan. 24, 2014, 8:27 p.m.)


Review request for pig.


Changes
-------

Addressed review comments


Bugs: PIG-3661
    https://issues.apache.org/jira/browse/PIG-3661


Repository: pig


Description
-------

Besides fixing the use of AvroStorage in more than one load or store statement, this patch also fixes other AvroStorage issues:
   - Hidden files were not excluded (PIG-3717)
   - mapred.input.dir was being populated with every file instead of the top-level directory, making the conf very large
   - Default value was not set for a Union


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1560805 
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java 1560805 

Diff: https://reviews.apache.org/r/17266/diff/


Testing
-------

Unit test added. testGlob was passing in the git checkout but failing in the svn code base because of hidden .svn files (PIG-3717). It now passes as well.


Thanks,

Rohini Palaniswamy