Posted to dev@parquet.apache.org by "ggershinsky (via GitHub)" <gi...@apache.org> on 2023/05/04 06:39:13 UTC

[GitHub] [parquet-mr] ggershinsky opened a new pull request, #1089: PARQUET-2297: Skip delta problem check for encrypted files

ggershinsky opened a new pull request, #1089:
URL: https://github.com/apache/parquet-mr/pull/1089

   https://issues.apache.org/jira/browse/PARQUET-2297


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

Posted by "ggershinsky (via GitHub)" <gi...@apache.org>.
ggershinsky commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186829851


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##########
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
       }
     }
 
-    if (!reader.getRowGroups().isEmpty()) {
+    if (!reader.getRowGroups().isEmpty() &&
+      // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   - with the delta encoding problem: basically impossible to reproduce :), since it was resolved in 1.8
   - without this problem: I've had a look at the existing unit tests; unfortunately, none can be used as a basis for adding a test function for this particular situation. This would require building a new unit test from scratch. However, given that a) the patch is small and straightforward and b) Spark stopped using this Parquet read path, building a full unit test may be overkill. But if you have a different opinion, please let me know.
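
   For readers following the diff, the intent of the added condition can be sketched roughly as below. This is only an illustrative, self-contained sketch: `DeltaCheckGuard`, its `EncryptionType` enum, and `shouldCheckDeltaProblem` are hypothetical names introduced here, not the parquet-mr API, and the remainder of the actual condition is truncated in the hunk above.

```java
// Hypothetical sketch; none of these names are taken from parquet-mr.
class DeltaCheckGuard {
  // Assumed stand-in for how a reader might classify a file's encryption mode.
  enum EncryptionType { UNENCRYPTED, PLAINTEXT_FOOTER, ENCRYPTED_FOOTER }

  // The delta encoding problem was resolved in parquet-mr 1.8, while encrypted
  // files can only come from parquet-mr 1.12+, so the check is only worth
  // running for unencrypted files that actually contain row groups.
  static boolean shouldCheckDeltaProblem(boolean hasRowGroups, EncryptionType type) {
    return hasRowGroups && type == EncryptionType.UNENCRYPTED;
  }

  public static void main(String[] args) {
    for (EncryptionType type : EncryptionType.values()) {
      System.out.println(type + " -> " + shouldCheckDeltaProblem(true, type));
    }
  }
}
```

   A guard of this shape could be exercised in isolation, without building an encrypted file from scratch, which is the expensive part of a full unit test.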





[GitHub] [parquet-mr] ggershinsky merged pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

Posted by "ggershinsky (via GitHub)" <gi...@apache.org>.
ggershinsky merged PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089




[GitHub] [parquet-mr] Fokko commented on a diff in pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186655384


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##########
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
       }
     }
 
-    if (!reader.getRowGroups().isEmpty()) {
+    if (!reader.getRowGroups().isEmpty() &&
+      // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   Could we add a test for this?





[GitHub] [parquet-mr] Fokko commented on a diff in pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1188231603


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##########
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
       }
     }
 
-    if (!reader.getRowGroups().isEmpty()) {
+    if (!reader.getRowGroups().isEmpty() &&
+      // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   Thanks for the explanation, I'm fine with leaving out a unit test. Just curious if it would be easy to modify existing tests to make sure that we hit the code.





[GitHub] [parquet-mr] wgtmac commented on pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#issuecomment-1535610371

   Should we include this fix in the upcoming 1.13.1 release: https://lists.apache.org/thread/1mjvdcmwqjcblmfkfgpd9ob2yodx7tom ? @ggershinsky




[GitHub] [parquet-mr] ggershinsky commented on pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

Posted by "ggershinsky (via GitHub)" <gi...@apache.org>.
ggershinsky commented on PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#issuecomment-1535733579

   SGTM, I'll send a PR to the parquet-1.13.x branch too

