Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2022/06/20 11:11:38 UTC

[GitHub] [parquet-mr] ala opened a new pull request, #978: PARQUET-2161: Fix row index generation in combination with range filtering

ala opened a new pull request, #978:
URL: https://github.com/apache/parquet-mr/pull/978

   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
     - https://issues.apache.org/jira/browse/PARQUET-2161
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
     - Extends `TestParquetReader` suite. 
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-mr] ggershinsky commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1195083014

   cc @shangxinli 




[GitHub] [parquet-mr] ggershinsky merged pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ggershinsky merged PR #978:
URL: https://github.com/apache/parquet-mr/pull/978




[GitHub] [parquet-mr] chenjunjiedada commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1169957186

   This looks correct to me. The logic also exists in the iceberg row position reader. See: https://github.com/apache/iceberg/pull/1254#discussion_r461893642. 
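A minimal sketch of the general idea behind row positions under range filtering. This is not the code from this PR or from Iceberg. It assumes that a row index is counted relative to the whole file, so each row group's starting offset has to be accumulated over all row groups, including the ones a filter skips. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RowIndexOffsets {
    /**
     * Returns the file-level index of the first row of each row group.
     * rowCounts[i] is the number of rows in row group i, in file order.
     */
    public static long[] firstRowIndexes(long[] rowCounts) {
        long[] offsets = new long[rowCounts.length];
        long running = 0;
        for (int i = 0; i < rowCounts.length; i++) {
            offsets[i] = running;
            running += rowCounts[i];
        }
        return offsets;
    }

    /** Row indexes produced when only the row groups flagged in selected are read. */
    public static List<Long> rowIndexesForSelected(long[] rowCounts, boolean[] selected) {
        // Offsets are computed over ALL row groups, before any filtering,
        // so skipping a leading row group does not shift later indexes.
        long[] offsets = firstRowIndexes(rowCounts);
        List<Long> result = new ArrayList<>();
        for (int g = 0; g < rowCounts.length; g++) {
            if (!selected[g]) {
                continue;
            }
            for (long r = 0; r < rowCounts[g]; r++) {
                result.add(offsets[g] + r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] rowCounts = {10, 10};          // two row groups of 10 rows each
        boolean[] onlySecond = {false, true}; // the filter keeps only the second
        // prints [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        System.out.println(rowIndexesForSelected(rowCounts, onlySecond));
    }
}
```

Computing the offsets only over the row groups that survive the filter would instead start the second row group at 0, which is the kind of mismatch a row-index check against a stored id would catch.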




[GitHub] [parquet-mr] ala commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ala commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1165708679

   cc @ggershinsky
   




[GitHub] [parquet-mr] shangxinli commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
shangxinli commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1281184662

   @ala Thanks for pinging me! I don't have an ETA yet.




[GitHub] [parquet-mr] ggershinsky commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1169963727

   Thanks @chenjunjiedada . 
   @ala, please address the review comment about the assertion message, and I'll merge this PR.




[GitHub] [parquet-mr] ggershinsky commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1170396062

   Thanks @ala 




[GitHub] [parquet-mr] ggershinsky commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1168681667

   Yep, I remember reviewing that PR. @prakharjain09 , can you also have a look at this fix?




[GitHub] [parquet-mr] ala commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ala commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1190057981

   @ggershinsky Do you know when the next release that will include the fix might happen? We are looking to unblock https://issues.apache.org/jira/browse/SPARK-39634 in Apache Spark.




[GitHub] [parquet-mr] ala commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ala commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1281067651

   @ggershinsky @shangxinli Hi! I wanted to ask whether the 1.12.4 release might be happening soon (in previous years there was usually a release around September-October). We could really use the fix in Spark. Also, do I need to cherry-pick this fix, or will the next release be cut from `master`?




[GitHub] [parquet-mr] chenjunjiedada commented on a diff in pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on code in PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#discussion_r909597173


##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/recordlevel/PhoneBookWriter.java:
##########
@@ -359,7 +359,7 @@ public static List<User> readUsers(ParquetReader.Builder<Group> builder, boolean
       User u = userFromGroup(group);
       users.add(u);
       if (validateRowIndexes) {
-        assertEquals(reader.getCurrentRowIndex(), u.id);
+        assertEquals("validating row index", u.id, reader.getCurrentRowIndex());

Review Comment:
   The message doesn't look like an error message. 
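For context on the changed line: JUnit 4's `assertEquals(String message, long expected, long actual)` takes the expected value before the actual one, and the message is what gets shown when the values differ. The sketch below uses a hypothetical stand-in, `describeMismatch`, that mirrors that parameter order so it runs without JUnit on the classpath. The output format is only an approximation of JUnit's style, and the message text is an example, not the wording used in the PR.

```java
public class AssertOrderSketch {
    /**
     * Hypothetical stand-in mirroring the (message, expected, actual) order.
     * Returns null when the values match, otherwise a failure description.
     */
    public static String describeMismatch(String message, long expected, long actual) {
        if (expected == actual) {
            return null;
        }
        return message + " expected:<" + expected + "> but was:<" + actual + ">";
    }

    public static void main(String[] args) {
        long storedId = 10;       // plays the role of u.id in the diff
        long reportedIndex = 0;   // plays the role of reader.getCurrentRowIndex()

        // With the expected value first, the description points at the right side.
        System.out.println(describeMismatch("row index mismatch", storedId, reportedIndex));
    }
}
```

A message phrased as a description of the failure ("row index mismatch") reads more naturally in that output than one phrased as a description of the activity ("validating row index").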





[GitHub] [parquet-mr] ala commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ala commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1170153775

   @ggershinsky Thanks for the review. I tweaked the assertion error message to better match the rest of the codebase.




[GitHub] [parquet-mr] ala commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

Posted by GitBox <gi...@apache.org>.
ala commented on PR #978:
URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1164263841

   cc @shangxinli This is a small follow-up bug fix for https://github.com/apache/parquet-mr/pull/945

