You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/10 12:46:41 UTC

[GitHub] [spark] maropu commented on a change in pull request #29837: [SPARK-32463][SQL][DOCS] Add "Type Conversion" section in "Supported Data Types" of SQL docs

maropu commented on a change in pull request #29837:
URL: https://github.com/apache/spark/pull/29837#discussion_r502785507



##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+

Review comment:
       Could you describe what' a type coercion at the beginning?

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types 
+
+Numeric expresions:
+
+|               |ByteType   |ShortType  |IntegerType |LongType  |FloatType |DoubleType |StringType |
+|---------------|-----------|-----------|------------|----------|----------|-----------|-----------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType  |FloatType |DoubleType |DoubleType |
+|**LongType**   |LongType   |LongType   |LongType    |--        |FloatType |DoubleType |DoubleType |
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType |--        |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|--         |DoubleType |
+|**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|DoubleType |--         |

Review comment:
       This matrix means specific type coercion rules in binary arithmetic operations, right? Cold you add a matrix for type coercion in more general cases, too, like `Table 2-10 Implicit Type Conversion Matrix` in the Oracle doc? You could refer to the rules in `TypeCoercion.scala` https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L983-L1064

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 

Review comment:
       `in order to solve the expressions.` -> `in order to resolve type mismatches.`?

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types 
+
+Numeric expresions:
+
+|               |ByteType   |ShortType  |IntegerType |LongType  |FloatType |DoubleType |StringType |
+|---------------|-----------|-----------|------------|----------|----------|-----------|-----------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType  |FloatType |DoubleType |DoubleType |
+|**LongType**   |LongType   |LongType   |LongType    |--        |FloatType |DoubleType |DoubleType |
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType |--        |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|--         |DoubleType |
+|**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|DoubleType |--         |
+
+The case of DecimalType, is treated differently, for example, there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in an expresion, we would cast the decimal into double.
+
+Time expresions:
+
+|                  |DateType     |TimestampType |
+|------------------|-------------|--------------|
+|**DateType**      |--           |TimestampType |
+|**TimestampType** |TimestampType|--            |
+
+#### Type coercion examples
+
+```sql
+DESCRIBE TABLE numericTable;
++-------------+---------+-------+
+|col_name     |data_type|comment|
++-------------+---------+-------+
+|integerColumn|int      |null   |
+|doubleColumn |double   |null   |
++-------------+---------+-------+
+
+DESCRIBE SELECT integerColumn + doubleColumn as result FROM numericTable;
++--------+---------+-------+
+|col_name|data_type|comment|
++--------+---------+-------+
+|  result|   double|   null|
++--------+---------+-------+
+
+```
+
+```sql
+DESCRIBE dateTable;
++---------------+---------+-------+
+|       col_name|data_type|comment|
++---------------+---------+-------+
+|     dateColumn|     date|   null|
+|timestampColumn|timestamp|   null|
++---------------+---------+-------+
+
+SELECT MONTHS_BETWEEN(dateColumn,timestampColumn) FROM dateTable;
+
+```
+
+#### Explicit casting and store assignment casting
+
+When you are using explicit casting by CAST or doing INSERT INTO operations that need to cast types to different store types, the following matrix shows if the conversion is allowed
+
+|             |ByteType  |ShortType |IntegerType |LongType |FloatType |DoubleType |StringType |BinaryType |BooleanType |TimestampType |DateType|
+|-------------|----------|----------|------------|---------|----------|-----------|-----------|-----------|------------|--------------|--------|
+|**ByteType** |--        |X         |X           |X        |X         |X          |X          |X          |X           |X             |        |
+|**ShortType**|*         |--        |X           |X        |X         |X          |X          |X          |X           |X             |        |
+|**IntegerType**|*       |*         |--          |X        |X         |X          |X          |X          |X           |X             |        |
+|**LongType** |*         |*         |*           |--       |X         |X          |X          |X          |X           |X             |        |
+|**FloatType** |*        |*         |*           |*        |--        |X          |X          |           |X           |X             |        |
+|**DoubleType** |*       |*         |*           |*        |*         |--         |X          |           |X           |X             |        |
+|**StringType** |*       |*         |*           |*        |*         |*          |--         |X          |X           |X             |X       |
+|**BinaryType** |        |          |            |         |          |           |           |--         |            |              |        |
+|**BooleanType** |X      |X         |X           |X        |X         |X          |X          |           |--          |              |        |
+|**TimestampType** |*    |*         |*           |X        |X         |X          |X          |           |            |--            |X       |
+|**DateType** |*         |*         |X           |X        |X         |X          |X          |           |            |X             |--      |
+
+X: Conversion allowed (cast ByteType in ShortType)  
+*: An overflow can occur (cast ShortType in ByteType)
+
+If an overflow occurs and ANSI compliance is activated (spark.sql.ansi.enabled is set to true for casting or spark.sql.storeAssignmentPolicy=ANSI for store assignment casting) an exception will be thrown. 

Review comment:
       spark.sql.storeAssignmentPolicy=ANSI -> \`spark.sql.storeAssignmentPolicy=ANSI\`

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 

Review comment:
       `... can contain different data types, type conversion ...` -> `... can contain different data types and type conversion ...`?

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion

Review comment:
       `conversion` -> `Conversion`

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types 
+
+Numeric expresions:
+
+|               |ByteType   |ShortType  |IntegerType |LongType  |FloatType |DoubleType |StringType |
+|---------------|-----------|-----------|------------|----------|----------|-----------|-----------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType  |FloatType |DoubleType |DoubleType |
+|**LongType**   |LongType   |LongType   |LongType    |--        |FloatType |DoubleType |DoubleType |
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType |--        |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|--         |DoubleType |
+|**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|DoubleType |--         |
+
+The case of DecimalType, is treated differently, for example, there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in an expresion, we would cast the decimal into double.
+
+Time expresions:
+
+|                  |DateType     |TimestampType |
+|------------------|-------------|--------------|
+|**DateType**      |--           |TimestampType |
+|**TimestampType** |TimestampType|--            |
+
+#### Type coercion examples
+
+```sql
+DESCRIBE TABLE numericTable;
++-------------+---------+-------+
+|col_name     |data_type|comment|
++-------------+---------+-------+
+|integerColumn|int      |null   |
+|doubleColumn |double   |null   |
++-------------+---------+-------+
+
+DESCRIBE SELECT integerColumn + doubleColumn as result FROM numericTable;
++--------+---------+-------+
+|col_name|data_type|comment|
++--------+---------+-------+
+|  result|   double|   null|
++--------+---------+-------+
+
+```
+
+```sql
+DESCRIBE dateTable;
++---------------+---------+-------+
+|       col_name|data_type|comment|
++---------------+---------+-------+
+|     dateColumn|     date|   null|
+|timestampColumn|timestamp|   null|
++---------------+---------+-------+
+
+SELECT MONTHS_BETWEEN(dateColumn,timestampColumn) FROM dateTable;
+
+```
+
+#### Explicit casting and store assignment casting
+
+When you are using explicit casting by CAST or doing INSERT INTO operations that need to cast types to different store types, the following matrix shows if the conversion is allowed
+
+|             |ByteType  |ShortType |IntegerType |LongType |FloatType |DoubleType |StringType |BinaryType |BooleanType |TimestampType |DateType|
+|-------------|----------|----------|------------|---------|----------|-----------|-----------|-----------|------------|--------------|--------|
+|**ByteType** |--        |X         |X           |X        |X         |X          |X          |X          |X           |X             |        |
+|**ShortType**|*         |--        |X           |X        |X         |X          |X          |X          |X           |X             |        |
+|**IntegerType**|*       |*         |--          |X        |X         |X          |X          |X          |X           |X             |        |
+|**LongType** |*         |*         |*           |--       |X         |X          |X          |X          |X           |X             |        |
+|**FloatType** |*        |*         |*           |*        |--        |X          |X          |           |X           |X             |        |
+|**DoubleType** |*       |*         |*           |*        |*         |--         |X          |           |X           |X             |        |
+|**StringType** |*       |*         |*           |*        |*         |*          |--         |X          |X           |X             |X       |
+|**BinaryType** |        |          |            |         |          |           |           |--         |            |              |        |
+|**BooleanType** |X      |X         |X           |X        |X         |X          |X          |           |--          |              |        |
+|**TimestampType** |*    |*         |*           |X        |X         |X          |X          |           |            |--            |X       |
+|**DateType** |*         |*         |X           |X        |X         |X          |X          |           |            |X             |--      |
+
+X: Conversion allowed (cast ByteType in ShortType)  
+*: An overflow can occur (cast ShortType in ByteType)
+
+If an overflow occurs and ANSI compliance is activated (spark.sql.ansi.enabled is set to true for casting or spark.sql.storeAssignmentPolicy=ANSI for store assignment casting) an exception will be thrown. 

Review comment:
       spark.sql.ansi.enabled -> \`spark.sql.ansi.enabled\`

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types 
+
+Numeric expresions:
+
+|               |ByteType   |ShortType  |IntegerType |LongType  |FloatType |DoubleType |StringType |
+|---------------|-----------|-----------|------------|----------|----------|-----------|-----------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType  |FloatType |DoubleType |DoubleType |
+|**LongType**   |LongType   |LongType   |LongType    |--        |FloatType |DoubleType |DoubleType |
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType |--        |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|--         |DoubleType |
+|**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|DoubleType |--         |
+
+The case of DecimalType, is treated differently, for example, there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in an expresion, we would cast the decimal into double.

Review comment:
       Could you describe more about decimal type coercion by referring to `DecimalPrecision`?  https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L29-L63

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types 
+
+Numeric expresions:
+
+|               |ByteType   |ShortType  |IntegerType |LongType  |FloatType |DoubleType |StringType |
+|---------------|-----------|-----------|------------|----------|----------|-----------|-----------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType  |FloatType |DoubleType |DoubleType |
+|**LongType**   |LongType   |LongType   |LongType    |--        |FloatType |DoubleType |DoubleType |
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType |--        |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|--         |DoubleType |
+|**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|DoubleType |--         |
+
+The case of DecimalType, is treated differently, for example, there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in an expresion, we would cast the decimal into double.
+
+Time expresions:
+
+|                  |DateType     |TimestampType |
+|------------------|-------------|--------------|
+|**DateType**      |--           |TimestampType |
+|**TimestampType** |TimestampType|--            |
+
+#### Type coercion examples
+
+```sql
+DESCRIBE TABLE numericTable;
++-------------+---------+-------+
+|col_name     |data_type|comment|
++-------------+---------+-------+
+|integerColumn|int      |null   |
+|doubleColumn |double   |null   |
++-------------+---------+-------+
+
+DESCRIBE SELECT integerColumn + doubleColumn as result FROM numericTable;
++--------+---------+-------+
+|col_name|data_type|comment|
++--------+---------+-------+
+|  result|   double|   null|
++--------+---------+-------+
+
+```
+
+```sql
+DESCRIBE dateTable;
++---------------+---------+-------+
+|       col_name|data_type|comment|
++---------------+---------+-------+
+|     dateColumn|     date|   null|
+|timestampColumn|timestamp|   null|
++---------------+---------+-------+
+
+SELECT MONTHS_BETWEEN(dateColumn,timestampColumn) FROM dateTable;
+

Review comment:
       Could you put the output for following the other doc examples?

##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,128 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type conversion
+
+In general, an expression can contain different data types, type conversion is the transformation of some data types into others in order to solve the expressions. 
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+#### Type coercion in operations between different types 
+
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types 
+
+Numeric expresions:
+
+|               |ByteType   |ShortType  |IntegerType |LongType  |FloatType |DoubleType |StringType |
+|---------------|-----------|-----------|------------|----------|----------|-----------|-----------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType  |FloatType |DoubleType |DoubleType |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType  |FloatType |DoubleType |DoubleType |
+|**LongType**   |LongType   |LongType   |LongType    |--        |FloatType |DoubleType |DoubleType |
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType |--        |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|--         |DoubleType |
+|**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType|DoubleType|DoubleType |--         |
+
+The case of DecimalType, is treated differently, for example, there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in an expresion, we would cast the decimal into double.
+
+Time expresions:
+
+|                  |DateType     |TimestampType |
+|------------------|-------------|--------------|
+|**DateType**      |--           |TimestampType |
+|**TimestampType** |TimestampType|--            |
+
+#### Type coercion examples
+
+```sql
+DESCRIBE TABLE numericTable;
++-------------+---------+-------+
+|col_name     |data_type|comment|
++-------------+---------+-------+
+|integerColumn|int      |null   |
+|doubleColumn |double   |null   |
++-------------+---------+-------+
+
+DESCRIBE SELECT integerColumn + doubleColumn as result FROM numericTable;
++--------+---------+-------+
+|col_name|data_type|comment|
++--------+---------+-------+
+|  result|   double|   null|
++--------+---------+-------+
+
+```
+
+```sql
+DESCRIBE dateTable;
++---------------+---------+-------+
+|       col_name|data_type|comment|
++---------------+---------+-------+
+|     dateColumn|     date|   null|
+|timestampColumn|timestamp|   null|
++---------------+---------+-------+
+
+SELECT MONTHS_BETWEEN(dateColumn,timestampColumn) FROM dateTable;
+
+```
+
+#### Explicit casting and store assignment casting

Review comment:
       cc: @gengliangwang 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org