Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/14 16:05:07 UTC

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25454: [MINOR][DOCS] Remove Note on Parquet Nullability Behaviour

URL: https://github.com/apache/spark/pull/25454#discussion_r313957648
 
 

 ##########
 File path: docs/sql-data-sources-parquet.md
 ##########
 @@ -24,8 +24,7 @@ license: |
 
 [Parquet](http://parquet.io) is a columnar format that is supported by many other data processing systems.
 Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema
-of the original data. When reading Parquet files, all columns are automatically converted to be nullable for
-compatibility reasons.
 
 Review comment:
   Hi, @sujithjay .
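   Thank you for making a PR, but the old statement is true. You are confusing reading with writing.
   When writing, the schema is preserved, including each column's nullability. When reading, every column becomes nullable.
   In the example below, the file itself stores `id` as `REQUIRED` (`"nullable":false`), but Spark reports it as nullable after reading.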
   ```
   $ parquet-tools meta /tmp/nullability.parquet/part-00000-f8024888-e746-41ac-ac9c-dc1cab5dfb80-c000.snappy.parquet
   file:        file:/tmp/nullability.parquet/part-00000-f8024888-e746-41ac-ac9c-dc1cab5dfb80-c000.snappy.parquet
   creator:     parquet-mr version 1.10.1 (build a89df8f9932b6ef6633d06069e50c9b7970bebd1)
   extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
   
   file schema: spark_schema
   --------------------------------------------------------------------------------
   id:          REQUIRED INT64 R:0 D:0
   
   scala> spark.read.parquet("/tmp/nullability.parquet").printSchema
   root
    |-- id: long (nullable = true)
   ```
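
   For reference, here is a minimal sketch of how such a file could be produced and checked from Spark itself. It assumes the data comes from `spark.range` (which yields a non-nullable `id` column); the output above may have been generated differently, and the local `SparkSession` settings are only illustrative.
   ```scala
   import org.apache.spark.sql.SparkSession

   // Assumption: a local session; in spark-shell, getOrCreate() returns the existing one.
   val spark = SparkSession.builder().master("local[1]").appName("nullability").getOrCreate()

   // spark.range produces a single `id: long` column that is not nullable.
   val df = spark.range(3).toDF()
   df.printSchema()

   // Writing keeps the nullability: Parquet stores the column as REQUIRED.
   df.write.mode("overwrite").parquet("/tmp/nullability.parquet")

   // Reading converts every column to nullable.
   val readBack = spark.read.parquet("/tmp/nullability.parquet")
   readBack.printSchema()

   // The two schemas differ only in the nullable flag.
   assert(!df.schema("id").nullable)
   assert(readBack.schema("id").nullable)
   ```
   The first `printSchema` call shows the write-side schema and the second shows the read-side schema, so the difference can be seen without `parquet-tools`.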
