You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2021/01/11 04:37:09 UTC

[GitHub] [parquet-mr] chenjunjiedada opened a new pull request #852: PARQUET-1851: fix parquet metadata converter NPE

chenjunjiedada opened a new pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852


   This fixes the NPE as below:
   
   ```
   java.lang.NullPointerExceptionjava.lang.NullPointerException at
   org.apache.parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:476) at
   org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:177) at
   org.apache.parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:914) at
   org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:864) at
   org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:206) at
   org.apache.iceberg.data.TestLocalScan.writeFile(TestLocalScan.java:429)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] chenjunjiedada commented on pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#issuecomment-759323935


   @gszadovszky Thanks for the comment and it makes sense to me. How about throwing an exception when detecting the row group's count is zero? Does that sound reasonable to you?
   
   PS. the space changes in the PR is automatically fixed by idea IDE. If you don't like it, I can handle it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky merged pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
gszadovszky merged pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] vdiravka commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r612155776



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       @gszadovszky Done
   https://issues.apache.org/jira/browse/PARQUET-2026




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#issuecomment-759250313


   As I've commented in the jira I think this fix is not complete. I am no against adding a null check here but it will not solve the potential problem of writing an empty row group.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r556380366



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       I would suggest using a `ParquetEncodingException` instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
gszadovszky commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r611595861



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       @vdiravka, you might want to create a separate jira about this topic so we can discuss it in a more open way. Please, also describe what the empty parquet files are used for.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] vdiravka commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r611094854



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       The empty parquet files was used in Drill earlier and now this change breaks the possibility to store empty tables (with header schema only) in parquet files. Any suggestions to bypass this and integrate Parquet-1.12.0 to Drill?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] chenjunjiedada commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r556384676



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       SGTM.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [parquet-mr] vdiravka commented on a change in pull request #852: PARQUET-1851: fix parquet metadata converter NPE

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #852:
URL: https://github.com/apache/parquet-mr/pull/852#discussion_r611094854



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       The empty parquet files was used in Drill earlier and now this change breaks the possibility to store empty tables (with header schema only) in parquet files. Any suggestions to bypass this and integrate Parquet-1.12.0 to Drill?
   @gszadovszky  @cenjunjiedada 

##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -860,6 +860,10 @@ public void endColumn() throws IOException {
    * @throws IOException if there is an error while writing
    */
   public void endBlock() throws IOException {
+    if (currentRecordCount == 0) {
+      throw new IOException("End block with zero record");

Review comment:
       The empty parquet files was used in Drill earlier and now this change breaks the possibility to store empty tables (with header schema only) in parquet files. Any suggestions to bypass this and integrate Parquet-1.12.0 to Drill?
   @gszadovszky  @chenjunjiedada 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org