Posted to dev@parquet.apache.org by "ggershinsky (via GitHub)" <gi...@apache.org> on 2023/05/07 10:56:01 UTC

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

ggershinsky commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186829851


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##########
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
       }
     }
 
-    if (!reader.getRowGroups().isEmpty()) {
+    if (!reader.getRowGroups().isEmpty() &&
+      // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   - With the delta encoding problem: basically impossible to reproduce :), since it was resolved in 1.8.
   - Without this problem: I've had a look at the existing unit tests; unfortunately, none can be used as a basis for adding a test for this particular situation, so this would require building a new unit test from scratch. However, given that a) the patch is small and straightforward, and b) Spark has stopped using this Parquet read path, building a full unit test may be overkill. If you have a different opinion, please let me know.


