You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/24 20:08:36 UTC

[GitHub] [arrow] scober opened a new pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

scober opened a new pull request #8040:
URL: https://github.com/apache/arrow/pull/8040


   Export the file_offset field (which already exists in RowGroupMetaDataImpl) in RowGroupMetaData.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] scober commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
scober commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681002346


   When you say documentation I assume you mean a comment near the method definition?
   And it looks the default value of this field is 0? (https://github.com/apache/arrow/blob/d02e16646cb3ee2a568dfeeaa68a0f75986592c8/cpp/src/parquet/metadata.cc#L1249)
   And when you say the wording from the parquet specification do you mean the comment in the thrift definition?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] scober commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
scober commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681018008


   I updated the comments for the method declaration to address your documentation concerns. I did not change the default value of `file_offset` because I agree that 0 is wrong (though less obviously wrong than -1) and I did not want to chase down the unintended consequences of changing that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-679345538


   https://issues.apache.org/jira/browse/ARROW-9824


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] scober commented on a change in pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
scober commented on a change in pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#discussion_r477475598



##########
File path: cpp/src/parquet/metadata.h
##########
@@ -210,6 +210,12 @@ class PARQUET_EXPORT RowGroupMetaData {
 
   /// \brief Total byte size of all the uncompressed column data in this row group.
   int64_t total_byte_size() const;
+
+  // The file_offset field that this method exposes is optional. This method

Review comment:
       Sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-679991489


   @emkornfield Do you think this is ok to add?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681004709


   > When you say documentation I assume you mean a comment near the method definition?
   
   Yes the comment for the method definition
   
   > And it looks the default value of this field is 0?
   
   Yes, I'm not sure if it breaks anything to make this -1 or something obviously wrong (0 should be wrong because I think all parquet files need to start with 'PAR1'
   
   > And when you say the wording from the parquet specification do you mean the comment in the thrift definition?
   
   Yes:
   ```
   Byte offset from beginning of file to first page (data or dictionary) in this row group
   ```
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-680108252


   Since this is simply a proxy object around the underlying metadata I think it is OK to expose but should have some documentation around the value of this field if the offset isn't present (it looks like this field is [optional](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L822), but in practice I imagine it must be required to properly parse a file?).  It would be nice to use the wording from the parquet specificaiton 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed pull request #8040: PARQUET-1904: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #8040:
URL: https://github.com/apache/arrow/pull/8040


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#issuecomment-681317956


   +1 thank you @scober 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #8040:
URL: https://github.com/apache/arrow/pull/8040#discussion_r477472781



##########
File path: cpp/src/parquet/metadata.h
##########
@@ -210,6 +210,12 @@ class PARQUET_EXPORT RowGroupMetaData {
 
   /// \brief Total byte size of all the uncompressed column data in this row group.
   int64_t total_byte_size() const;
+
+  // The file_offset field that this method exposes is optional. This method

Review comment:
       Please move this below \brief with a new line and use triple slash '///'




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org