Posted to commits@parquet.apache.org by ap...@apache.org on 2021/04/04 09:25:27 UTC

[parquet-format] branch master updated: PARQUET-2013: [Format] Mention that ConvertedType is deprecated (#169)

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new cabeea7  PARQUET-2013: [Format] Mention that ConvertedType is deprecated (#169)
cabeea7 is described below

commit cabeea7ca4afe22f4768555a017594ce343d88df
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Sun Apr 4 11:25:20 2021 +0200

    PARQUET-2013: [Format] Mention that ConvertedType is deprecated (#169)
    
    Also slight wording improvements, and replace a "must" with "should" for writing the ConvertedType field.
---
 LogicalTypes.md                | 25 +++++++++++++++----------
 README.md                      |  5 ++---
 src/main/thrift/parquet.thrift | 19 +++++++++++++------
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index 2904eaa..4b9a8df 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -31,6 +31,7 @@ This file contains the specification for all logical types.
 
 The parquet format's `LogicalType` stores the type annotation. The annotation
 may require additional metadata fields, as well as rules for those fields.
+
 There is an older representation of the logical type annotations called `ConvertedType`.
 To support backward compatibility with old files, readers should interpret `LogicalTypes`
 in the same way as `ConvertedType`, and writers should populate `ConvertedType` in the metadata
@@ -39,16 +40,18 @@ according to well defined conversion rules.
 ### Compatibility
 
 The Thrift definition of the metadata has two fields for logical types: `ConvertedType` and `LogicalType`.
-`ConvertedType` is an enum of all available annotation. Since Thrift enums can't have additional type parameters,
+`ConvertedType` is an enum of all available annotations. Since Thrift enums can't have additional type parameters,
 it is cumbersome to define additional type parameters, like decimal scale and precision
 (which are additional 32 bit integer fields on SchemaElement, and are relevant only for decimals) or time unit
 and UTC adjustment flag for Timestamp types. To overcome this problem, a new logical type representation was introduced into
-the metadata to replace `ConvertedType`: `LogicalType`.  The new representation is a union of struct of logical types,
+the metadata to replace `ConvertedType`: `LogicalType`.  The new representation is a union of structs of logical types,
 this way allowing more flexible API, logical types can have type parameters.
 
-However, to maintain compatibility, Parquet readers should be able to read
-and interpret old logical type representation (in case the new one is not present,
-because the file was written by older writer), and write `ConvertedType` field for old readers.
+`ConvertedType` is deprecated. However, to maintain compatibility with old writers,
+Parquet readers should be able to read and interpret `ConvertedType` annotations
+in case `LogicalType` annotations are not present. Parquet writers must always write
+`LogicalType` annotations where applicable, but should also write the corresponding
+`ConvertedType` annotations (if any) to maintain compatibility with old readers.
 
 Compatibility considerations are mentioned for each annotation in the corresponding section.
 
@@ -242,7 +245,7 @@ comparison.
 To support compatibility with older readers, implementations of parquet-format should
 write `DecimalType` precision and scale into the corresponding SchemaElement field in metadata.
 
-## Date/Time Types
+## Temporal Types
 
 ### DATE
 
@@ -753,7 +756,9 @@ optional group my_map (MAP_KEY_VALUE) {
 }
 ```
 
-## Null
-Sometimes when discovering the schema of existing data values are always null and there's no type information.
-The `NULL` type can be used to annotates a column that is always null.
-(Similar to Null type in Avro)
+## UNKNOWN (always null)
+
+Sometimes, when discovering the schema of existing data, values are always null
+and there's no type information.
+The `UNKNOWN` type can be used to annotate a column that is always null.
+(Similar to Null type in Avro and Arrow)
diff --git a/README.md b/README.md
index 15fc427..ac7c791 100644
--- a/README.md
+++ b/README.md
@@ -139,9 +139,8 @@ by specifying how the primitive types should be interpreted. This keeps the set
 of primitive types to a minimum and reuses parquet's efficient encodings. For
 example, strings are stored as byte arrays (binary) with a UTF8 annotation.
 These annotations define how to further decode and interpret the data.
-Annotations are stored as `ConvertedType` fields in the file metadata and are
-documented in
-[LogicalTypes.md][logical-types].
+Annotations are stored as `LogicalType` fields in the file metadata and are
+documented in [LogicalTypes.md][logical-types].
 
 [logical-types]: LogicalTypes.md
 
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 24088c1..1dc6958 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -41,9 +41,10 @@ enum Type {
 }
 
 /**
- * Common types used by frameworks(e.g. hive, pig) using parquet.  This helps map
- * between types in those frameworks to the base types in parquet.  This is only
- * metadata and not needed to read or write the data.
+ * DEPRECATED: Common types used by frameworks(e.g. hive, pig) using parquet.
+ * ConvertedType is superseded by LogicalType.  This enum should not be extended.
+ *
+ * See LogicalTypes.md for conversion between ConvertedType and LogicalType.
  */
 enum ConvertedType {
   /** a BYTE_ARRAY actually contains UTF8 encoded chars */
@@ -316,7 +317,7 @@ struct BsonType {
  * LogicalType annotations to replace ConvertedType.
  *
  * To maintain compatibility, implementations using LogicalType for a
- * SchemaElement must also set the corresponding ConvertedType from the
+ * SchemaElement should also set the corresponding ConvertedType from the
  * following table.
  */
 union LogicalType {
@@ -374,13 +375,19 @@ struct SchemaElement {
    */
   5: optional i32 num_children;
 
-  /** When the schema is the result of a conversion from another model
+  /**
+   * DEPRECATED: When the schema is the result of a conversion from another model.
    * Used to record the original type to help with cross conversion.
+   *
+   * This is superseded by logicalType.
    */
   6: optional ConvertedType converted_type;
 
-  /** Used when this column contains decimal data.
+  /**
+   * DEPRECATED: Used when this column contains decimal data.
    * See the DECIMAL converted type for more details.
+   *
+   * This is superseded by using the DecimalType annotation in logicalType.
    */
   7: optional i32 scale
   8: optional i32 precision
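
Illustration (not part of the commit): the compatibility rule added to LogicalTypes.md above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not how any Parquet implementation does it. The function names and the plain-string representation of annotations are hypothetical, and the mapping table is deliberately partial: parameterized types such as DECIMAL, TIME, TIMESTAMP and INTEGER are left out because their conversion depends on their parameters.

```python
from typing import Optional, Tuple

# Partial, illustrative table: LogicalType name -> ConvertedType name.
# None means the logical type has no compatible ConvertedType.
LOGICAL_TO_CONVERTED = {
    "STRING": "UTF8",
    "ENUM": "ENUM",
    "DATE": "DATE",
    "JSON": "JSON",
    "BSON": "BSON",
    "UUID": None,
    "UNKNOWN": None,
}

# Reverse lookup used by readers of old files (hypothetical helper table).
CONVERTED_TO_LOGICAL = {
    converted: logical
    for logical, converted in LOGICAL_TO_CONVERTED.items()
    if converted is not None
}


def effective_annotation(
    logical_type: Optional[str], converted_type: Optional[str]
) -> Optional[str]:
    """Reader side: use LogicalType if present, else fall back to ConvertedType."""
    if logical_type is not None:
        return logical_type
    if converted_type is not None:
        # Unknown ConvertedType names yield None in this sketch.
        return CONVERTED_TO_LOGICAL.get(converted_type)
    return None


def annotations_to_write(logical_type: str) -> Tuple[str, Optional[str]]:
    """Writer side: always emit LogicalType, plus the ConvertedType if one exists."""
    return logical_type, LOGICAL_TO_CONVERTED.get(logical_type)
```

With this sketch, a file from an old writer that carries only the UTF8 converted type resolves to STRING on the reader side, while a writer annotating a column as UUID emits no ConvertedType at all.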