Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/08/30 21:25:00 UTC

[jira] [Commented] (PARQUET-1408) parquet-tools SimpleRecord does not display null columns

    [ https://issues.apache.org/jira/browse/PARQUET-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597939#comment-16597939 ] 

ASF GitHub Bot commented on PARQUET-1408:
-----------------------------------------

rushton opened a new pull request #518: [PARQUET-1408] Make parquet-tools to display fields with missing values
URL: https://github.com/apache/parquet-mr/pull/518
 
 
   When parquet-tools is run on a Parquet file whose records contain null values, the null columns are omitted from the output.
   
   Example:
   ```
   scala> case class Foo(a: Int, b: String)
   defined class Foo
   
   scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")
   ```
   Pre-patch:
   ```
   ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
   {"a":1}
   {"a":1}
   {"a":1}
   {"a":1}
   {"a":1}
   ```
   Post-patch:
   ```
   ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
   {"a":1,"b":null}
   {"a":1,"b":null}
   {"a":1,"b":null}
   {"a":1,"b":null}
   {"a":1,"b":null}
   ```
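
   For readers who want the idea in isolation, here is a minimal sketch of the behaviour shown above. It is not the code from this pull request and does not use the parquet-tools `SimpleRecord` class; `NullFieldSketch` and `toJson` are hypothetical names, and the record is modelled as a plain map. The assumption is that the writer walks the schema's field names instead of only the values present, so a missing value is printed as an explicit `null`.

   ```java
   import java.util.Arrays;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical helper for illustration; not the parquet-tools SimpleRecord code.
   final class NullFieldSketch {

       // Renders one record as JSON by iterating over the schema's field names
       // rather than over the values that happen to be present.
       static String toJson(List<String> schemaFields, Map<String, Object> values) {
           StringBuilder sb = new StringBuilder("{");
           boolean first = true;
           for (String field : schemaFields) {
               if (!first) {
                   sb.append(',');
               }
               first = false;
               sb.append('"').append(field).append("\":");
               Object value = values.get(field); // null when the column has no value
               if (value == null) {
                   sb.append("null");
               } else if (value instanceof Number || value instanceof Boolean) {
                   sb.append(value);
               } else {
                   // No string escaping here: acceptable for a sketch only.
                   sb.append('"').append(value).append('"');
               }
           }
           return sb.append('}').toString();
       }

       public static void main(String[] args) {
           Map<String, Object> record = new LinkedHashMap<>();
           record.put("a", 1); // "b" is deliberately absent, as in the Foo(1, null) rows
           System.out.println(toJson(Arrays.asList("a", "b"), record)); // prints {"a":1,"b":null}
       }
   }
   ```

   Iterating over only the populated values in the same sketch would yield `{"a":1}`, which matches the pre-patch output.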



> parquet-tools SimpleRecord does not display null columns
> --------------------------------------------------------
>
>                 Key: PARQUET-1408
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1408
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Nicholas Rushton
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.10.1
>
>
> When parquet-tools is run on a Parquet file whose records contain null values, the null columns are omitted from the output.
>  
> Example:
> {code:java}
> scala> case class Foo(a: Int, b: String)
> defined class Foo
> scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/"){code}
> Actual:
> {code:java}
> ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
> {"a":1}
> {"a":1}
> {"a":1}
> {"a":1}
> {"a":1}{code}
> Expected:
> {code:java}
> ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}{code}
>  


