
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/15 20:33:56 UTC

[GitHub] [iceberg] rdblue commented on a change in pull request #1207: ORC: Support row position as a metadata column

rdblue commented on a change in pull request #1207:
URL: https://github.com/apache/iceberg/pull/1207#discussion_r455325073



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcRowReader.java
##########
@@ -29,6 +29,6 @@
   /**
    * Reads a row.
    */
-  T read(VectorizedRowBatch batch, int row);
+  T read(VectorizedRowBatch batch, long batchOffsetInFile, int rowOffsetInBatch);

Review comment:
       This seems to introduce a lot of code churn, given that most implementations don't use `batchOffsetInFile`. What about a less intrusive approach: pass the offset through a context method that is called once for each batch?
   
   Parquet has something similar, where each row group causes new context to be passed to the readers: https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReader.java#L32
   
   This could expose a method like `setBatchContext(long batchOffsetInFile)` with a no-op default. Then only a few implementations would need to change.
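
A minimal sketch of that suggestion, not the actual Iceberg code. `FakeBatch` is a hypothetical stand-in for ORC's `VectorizedRowBatch` so the example compiles on its own, and `RowReader`, `ValueReader`, `RowPositionReader`, and `BatchContextSketch` are illustrative names. Only `setBatchContext(long batchOffsetInFile)` and the two-argument `read` come from the discussion above. The assumption is that the caller invokes `setBatchContext` once per batch, before reading any rows from it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ORC's VectorizedRowBatch (assumption, not the real class).
final class FakeBatch {
  final long[] values;

  FakeBatch(long[] values) {
    this.values = values;
  }

  int size() {
    return values.length;
  }
}

// read() keeps its original two-argument shape.
interface RowReader<T> {
  T read(FakeBatch batch, int row);

  // Called once per batch before any read(); a no-op unless a reader needs the offset.
  default void setBatchContext(long batchOffsetInFile) {
  }
}

// A typical reader: it ignores the batch context and needs no changes.
final class ValueReader implements RowReader<Long> {
  @Override
  public Long read(FakeBatch batch, int row) {
    return batch.values[row];
  }
}

// The one reader that cares: it remembers the batch offset to compute row positions.
final class RowPositionReader implements RowReader<Long> {
  private long batchOffsetInFile = 0L;

  @Override
  public void setBatchContext(long batchOffsetInFile) {
    this.batchOffsetInFile = batchOffsetInFile;
  }

  @Override
  public Long read(FakeBatch batch, int row) {
    return batchOffsetInFile + row;
  }
}

final class BatchContextSketch {
  // Hypothetical driver loop: sets the context once per batch, then reads each row.
  static <T> List<T> readAll(RowReader<T> reader, List<FakeBatch> batches) {
    List<T> out = new ArrayList<>();
    long offset = 0L;
    for (FakeBatch batch : batches) {
      reader.setBatchContext(offset);
      for (int row = 0; row < batch.size(); row++) {
        out.add(reader.read(batch, row));
      }
      offset += batch.size();
    }
    return out;
  }

  public static void main(String[] args) {
    List<FakeBatch> batches = Arrays.asList(
        new FakeBatch(new long[] {10L, 20L}),
        new FakeBatch(new long[] {30L, 40L, 50L}));
    System.out.println(readAll(new ValueReader(), batches));
    System.out.println(readAll(new RowPositionReader(), batches));
  }
}
```

With this shape, existing readers compile unchanged because they inherit the no-op default, and only the row-position reader overrides `setBatchContext`.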



