Posted to issues@hawq.apache.org by mgoddard-pivotal <gi...@git.apache.org> on 2018/04/27 02:04:38 UTC

[GitHub] incubator-hawq pull request #1357: Changes to enable reuse of PXF Parquet cl...

GitHub user mgoddard-pivotal opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1357

    Changes to enable reuse of PXF Parquet classes, for data in S3

    I am working on adding support for reading and writing Parquet-formatted data stored in S3, over PXF.  I wanted to reuse the existing Parquet classes since they are already very functional, but I had to add a few methods because the members I needed were private, as was one method I needed to call.
    
    I'd like to submit this PR for just these changes.  The overall S3 Parquet project lives in a different module, and I will submit a PR for it separately.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgoddard-pivotal/incubator-hawq s3-parquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1357.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1357
    
----
commit 95471117c0f866c80890abcb5740595bf0614e25
Author: Michael Goddard <mg...@...>
Date:   2018-04-27T01:53:12Z

    Changes so that these Parquet classes could be reused to support S3 Parquet reads

----


---

[GitHub] incubator-hawq pull request #1357: Changes to enable reuse of PXF Parquet cl...

Posted by mgoddard-pivotal <gi...@git.apache.org>.
GitHub user mgoddard-pivotal closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/1357


---

[GitHub] incubator-hawq issue #1357: Changes to enable reuse of PXF Parquet classes, ...

Posted by shivzone <gi...@git.apache.org>.
GitHub user shivzone commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1357
  
    @mgoddard-pivotal please close this PR


---

[GitHub] incubator-hawq pull request #1357: Changes to enable reuse of PXF Parquet cl...

Posted by shivzone <gi...@git.apache.org>.
GitHub user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1357#discussion_r185127394
  
    --- Diff: pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/ParquetResolver.java ---
    @@ -55,6 +56,16 @@ public ParquetResolver(InputData metaData) {
             super(metaData);
         }
     
    +    // This method facilitates passing in the MessageType instance, which is
    --- End diff --
    
    The javadoc tool won't recognize comments in this style. Please refer to the function above for the standard convention used.


---

[GitHub] incubator-hawq pull request #1357: Changes to enable reuse of PXF Parquet cl...

Posted by mgoddard-pivotal <gi...@git.apache.org>.
GitHub user mgoddard-pivotal commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1357#discussion_r185272600
  
    --- Diff: pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/ParquetResolver.java ---
    @@ -55,6 +56,16 @@ public ParquetResolver(InputData metaData) {
             super(metaData);
         }
     
    +    // This method facilitates passing in the MessageType instance, which is
    --- End diff --
    
    @shivzone, I've switched to a Javadoc-style comment.  Does this work?
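
For readers unfamiliar with the convention under discussion, here is a minimal sketch of the difference between a `//` line comment and a Javadoc comment. The classes `SchemaStub` and `ResolverSketch` and their methods are hypothetical stand-ins invented for illustration; they are not the actual `ParquetResolver` code from this PR, whose added method is truncated in the diff above.

```java
// Hypothetical stand-in for a schema type; NOT the real Parquet MessageType.
class SchemaStub {
    private final String name;

    SchemaStub(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical resolver used only to contrast the two comment styles.
class ResolverSketch {
    private SchemaStub schema;

    // A line comment like this one is skipped by the javadoc tool,
    // so it never appears in the generated API documentation.

    /**
     * Sets the schema that this resolver uses.
     * A block opened with two asterisks is a Javadoc comment and is
     * picked up by the javadoc tool.
     *
     * @param schema the schema to use
     */
    public void setSchema(SchemaStub schema) {
        this.schema = schema;
    }

    /**
     * Returns the schema previously passed to {@link #setSchema}.
     *
     * @return the current schema, or null if none has been set
     */
    public SchemaStub getSchema() {
        return schema;
    }
}
```

Both comment styles compile identically; the choice only affects what the documentation tooling can see.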


---

[GitHub] incubator-hawq issue #1357: Changes to enable reuse of PXF Parquet classes, ...

Posted by shivzone <gi...@git.apache.org>.
GitHub user shivzone commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1357
  
    Merged to Master


---

[GitHub] incubator-hawq pull request #1357: Changes to enable reuse of PXF Parquet cl...

Posted by shivzone <gi...@git.apache.org>.
GitHub user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1357#discussion_r185127176
  
    --- Diff: pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/ParquetFileAccessor.java ---
    @@ -130,21 +130,40 @@ private Group readNextGroup() {
          */
         public ParquetFileAccessor(InputData input) {
             super(input);
    -        ParquetUserData parquetUserData = HdfsUtilities.parseParquetUserData(input);
    -        schema = parquetUserData.getSchema();
    +    }
    +
    +    public MessageType getSchema() {
    +      return schema;
    +    }
    +
    +    public void setSchema(MessageType schema) {
    +      this.schema = schema;
    +      columnIO = new ColumnIOFactory().getColumnIO(schema);
    +    }
    +
    +    // Enable sub-classes of ParquetFileAccessor to set up recordIterator
    +    public void setRecordIterator() {
    +        recordIterator = new RecordIterator(reader);
    +    }
    +
    +    public void setReader (ParquetFileReader reader) {
    +      this.reader = reader;
    +    }
    +
    +    public boolean iteratorHasNext() {
    +      return recordIterator.hasNext();
         }
     
         @Override
         public boolean openForRead() throws Exception {
             Configuration conf = new Configuration();
             Path file = new Path(inputData.getDataSource());
             FileSplit fileSplit = HdfsUtilities.parseFileSplit(inputData);
    +        setSchema(HdfsUtilities.parseParquetUserData(inputData).getSchema());
    --- End diff --
    
    At some point we should look into moving some of the Avro/Parquet utility functions into a generic package (maybe pxf-api) to decouple Parquet-specific functionality from the HDFS package. For now, this is fine.
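
As a rough illustration of what the accessor changes in the diff above make possible, the sketch below shows a subclass overriding `openForRead()` and wiring up its own schema and reader through the new public setters. It is not the author's S3 implementation, which is not part of this thread. To keep it compilable without Parquet or Hadoop, the schema is reduced to a `String` and the reader to a plain `java.util.Iterator`; the class names `BaseAccessorSketch` and `S3AccessorSketch` and the sample values are assumptions made for this example only.

```java
// Simplified stand-in for ParquetFileAccessor after the change in the diff.
// The real class holds a MessageType, a ParquetFileReader and a RecordIterator;
// here they are replaced by String and java.util.Iterator for illustration.
class BaseAccessorSketch {
    private String schema;
    private java.util.Iterator<String> reader;
    private java.util.Iterator<String> recordIterator;

    public String getSchema() {
        return schema;
    }

    public void setSchema(String schema) {
        this.schema = schema;
    }

    public void setReader(java.util.Iterator<String> reader) {
        this.reader = reader;
    }

    // Lets subclasses build the record iterator from whatever reader they set.
    public void setRecordIterator() {
        this.recordIterator = reader;
    }

    public boolean iteratorHasNext() {
        return recordIterator.hasNext();
    }

    // Default path: the base class resolves its own schema and reader.
    public boolean openForRead() {
        setSchema("schema-from-hdfs-metadata");
        setReader(java.util.Arrays.asList("row-1").iterator());
        setRecordIterator();
        return iteratorHasNext();
    }
}

// Hypothetical subclass for another storage backend. Because the schema is no
// longer parsed in the constructor, the subclass can supply its own.
class S3AccessorSketch extends BaseAccessorSketch {
    @Override
    public boolean openForRead() {
        setSchema("schema-from-s3-footer");
        setReader(java.util.Arrays.asList("row-a", "row-b").iterator());
        setRecordIterator();
        return iteratorHasNext();
    }
}
```

The point of the sketch is the shape of the extension: the constructor stays free of schema parsing, and everything storage-specific happens inside the overridden `openForRead()`.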


---