Posted to reviews@spark.apache.org by "PhilDakin (via GitHub)" <gi...@apache.org> on 2023/10/13 21:26:35 UTC

[PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

PhilDakin opened a new pull request, #43369:
URL: https://github.com/apache/spark/pull/43369

   @allisonwang-db
   
   ### What changes were proposed in this pull request?
   Add a documentation page showing Python-to-Spark type mappings for PySpark.
   
   
   ### Why are the changes needed?
   To surface this information to users navigating the PySpark docs, per https://issues.apache.org/jira/browse/SPARK-44733.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, this adds a new page to the PySpark docs.
   
   
   ### How was this patch tested?
   Built the HTML docs using Sphinx and inspected the output visually.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1359990662


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,106 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+=======================
+Python to Spark Type Conversions
+=======================
+
+.. currentmodule:: pyspark.sql.types
+
+All data types of Spark SQL are located in the package of `pyspark.sql.types`.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - | int or long

Review Comment:
   should be `int`
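   
   For context, the mapping is easy to confirm interactively - a minimal sketch, assuming an active `SparkSession` bound to `spark`:
   
   ```python
   from pyspark.sql.types import ByteType, StructField, StructType
   
   # assumes an active SparkSession bound to `spark`
   schema = StructType([StructField("b", ByteType())])
   
   # A Python int within -128..127 converts cleanly to a 1-byte signed integer.
   spark.createDataFrame([(127,)], schema).show()
   
   # An int outside that range (e.g. 128) would fail type verification at runtime.
   ```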





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507631


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to `TIMESTAMP_NTZ`, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is TimestampType.
+      - ""
+
+All Conversions
+---------------
+.. list-table::

Review Comment:
   Let's at least add a comment here to update `docs/sql-ref-datatypes.md` together if anyone makes a change. I still don't like that we're duplicating the docs, but it's probably fine since we're going to put all the Python-specific information here.
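   
   The suggested comment could be as simple as an RST comment above the table (hypothetical wording):
   
   ```
   .. NOTE: This table mirrors the one in docs/sql-ref-datatypes.md.
      Please keep the two in sync when changing either.
   ```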





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390506968


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to `TIMESTAMP_NTZ`, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is TimestampType.
+      - ""
+
+All Conversions
+---------------
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - int
+          .. note:: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.
+      - ByteType()
+    * - **ShortType**
+      - int
+          .. note:: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767.
+      - ShortType()
+    * - **IntegerType**
+      - int
+      - IntegerType()
+    * - **LongType**
+      - int
+          .. note:: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType.
+      - LongType()
+    * - **FloatType**
+      - float
+          .. note:: Numbers will be converted to 4-byte single-precision floating point numbers at runtime.
+      - FloatType()
+    * - **DoubleType**
+      - float
+      - DoubleType()
+    * - **DecimalType**
+      - decimal.Decimal
+      - DecimalType()|
+    * - **StringType**
+      - string
+      - StringType()
+    * - **BinaryType**
+      - bytearray
+      - BinaryType()
+    * - **BooleanType**
+      - bool
+      - BooleanType()
+    * - **TimestampType**
+      - datetime.datetime
+      - TimestampType()
+    * - **TimestampNTZType**
+      - datetime.datetime
+      - TimestampNTZType()
+    * - **DateType**
+      - datetime.date
+      - DateType()
+    * - **DayTimeIntervalType**
+      - datetime.timedelta
+      - DayTimeIntervalType()
+    * - **ArrayType**
+      - list, tuple, or array
+      - ArrayType(*elementType*, [*containsNull*])
+          .. note:: The default value of *containsNull* is True.
+    * - **MapType**
+      - dict
+      - MapType(*keyType*, *valueType*, [*valueContainsNull]*)
+          .. note:: The default value of *valueContainsNull* is True.
+    * - **StructType**
+      - list or tuple
+      - StructType(*fields*)
+          .. note:: *fields* is a Seq of StructFields. Also, two fields with the same name are not allowed.
+    * - **StructField**
+      - The value type in Python of the data type of this field. For example, Int for a StructField with the data type IntegerType.
+      - StructField(*name*, *dataType*, [*nullable*])
+          .. note:: The default value of *nullable* is True.
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the return type of
+the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will implicitly cast the value to null.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+

Review Comment:
   should be two blank lines, per PEP 8.
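   
   For illustration, the requested spacing - two blank lines before the top-level `@udf` definition - assuming an active `SparkSession` bound to `spark`:
   
   ```python
   from pyspark.sql.functions import udf
   from pyspark.sql.types import IntegerType, StringType, StructField, StructType
   
   # assumes an active SparkSession bound to `spark`
   df = spark.createDataFrame(
       [[1]], schema=StructType([StructField("int", IntegerType())])
   )
   
   
   @udf(returnType=StringType())
   def to_string(value):
       return str(value)
   ```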



##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to `TIMESTAMP_NTZ`, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is TimestampType.
+      - ""
+
+All Conversions
+---------------
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - int
+          .. note:: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.
+      - ByteType()
+    * - **ShortType**
+      - int
+          .. note:: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767.
+      - ShortType()
+    * - **IntegerType**
+      - int
+      - IntegerType()
+    * - **LongType**
+      - int
+          .. note:: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType.
+      - LongType()
+    * - **FloatType**
+      - float
+          .. note:: Numbers will be converted to 4-byte single-precision floating point numbers at runtime.
+      - FloatType()
+    * - **DoubleType**
+      - float
+      - DoubleType()
+    * - **DecimalType**
+      - decimal.Decimal
+      - DecimalType()|
+    * - **StringType**
+      - string
+      - StringType()
+    * - **BinaryType**
+      - bytearray
+      - BinaryType()
+    * - **BooleanType**
+      - bool
+      - BooleanType()
+    * - **TimestampType**
+      - datetime.datetime
+      - TimestampType()
+    * - **TimestampNTZType**
+      - datetime.datetime
+      - TimestampNTZType()
+    * - **DateType**
+      - datetime.date
+      - DateType()
+    * - **DayTimeIntervalType**
+      - datetime.timedelta
+      - DayTimeIntervalType()
+    * - **ArrayType**
+      - list, tuple, or array
+      - ArrayType(*elementType*, [*containsNull*])
+          .. note:: The default value of *containsNull* is True.
+    * - **MapType**
+      - dict
+      - MapType(*keyType*, *valueType*, [*valueContainsNull]*)
+          .. note:: The default value of *valueContainsNull* is True.
+    * - **StructType**
+      - list or tuple
+      - StructType(*fields*)
+          .. note:: *fields* is a Seq of StructFields. Also, two fields with the same name are not allowed.
+    * - **StructField**
+      - The value type in Python of the data type of this field. For example, Int for a StructField with the data type IntegerType.
+      - StructField(*name*, *dataType*, [*nullable*])
+          .. note:: The default value of *nullable* is True.
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the return type of
+the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will implicitly cast the value to null.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+
+  @udf(returnType=StringType())
+  def to_string(value):
+      return str(value)
+

Review Comment:
   ditto





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1809445378

   @HyukjinKwon ah, I was still going to address your other comments before merge. Not a big deal.
   
   My JIRA ID is `phildakin`, but I'd prefer not to be assigned that other ticket at this time - I'm spending my time on other things.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507243


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to `TIMESTAMP_NTZ`, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is TimestampType.
+      - ""
+
+All Conversions
+---------------
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - int
+          .. note:: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.
+      - ByteType()
+    * - **ShortType**
+      - int
+          .. note:: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767.
+      - ShortType()
+    * - **IntegerType**
+      - int
+      - IntegerType()
+    * - **LongType**
+      - int
+          .. note:: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType.
+      - LongType()
+    * - **FloatType**
+      - float
+          .. note:: Numbers will be converted to 4-byte single-precision floating point numbers at runtime.
+      - FloatType()
+    * - **DoubleType**
+      - float
+      - DoubleType()
+    * - **DecimalType**
+      - decimal.Decimal
+      - DecimalType()|
+    * - **StringType**
+      - string
+      - StringType()
+    * - **BinaryType**
+      - bytearray
+      - BinaryType()
+    * - **BooleanType**
+      - bool
+      - BooleanType()
+    * - **TimestampType**
+      - datetime.datetime
+      - TimestampType()
+    * - **TimestampNTZType**
+      - datetime.datetime
+      - TimestampNTZType()
+    * - **DateType**
+      - datetime.date
+      - DateType()
+    * - **DayTimeIntervalType**
+      - datetime.timedelta
+      - DayTimeIntervalType()
+    * - **ArrayType**
+      - list, tuple, or array
+      - ArrayType(*elementType*, [*containsNull*])
+          .. note:: The default value of *containsNull* is True.
+    * - **MapType**
+      - dict
+      - MapType(*keyType*, *valueType*, [*valueContainsNull]*)
+          .. note:: The default value of *valueContainsNull* is True.
+    * - **StructType**
+      - list or tuple
+      - StructType(*fields*)
+          .. note:: *fields* is a Seq of StructFields. Also, two fields with the same name are not allowed.
+    * - **StructField**
+      - The value type in Python of the data type of this field. For example, Int for a StructField with the data type IntegerType.
+      - StructField(*name*, *dataType*, [*nullable*])
+          .. note:: The default value of *nullable* is True.
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the return type of
+the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will implicitly cast the value to null.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+
+  @udf(returnType=StringType())
+  def to_string(value):
+      return str(value)
+
+  @udf(returnType=FloatType())
+  def to_float(value):
+      return float(value)
+
+  df.withColumn("cast_int", to_float(col("int"))).withColumn(
+      "cast_str", to_string(col("int"))
+  ).printSchema()
+  # root
+  # |-- int: integer (nullable = true)
+  # |-- cast_int: float (nullable = true)
+  # |-- cast_str: string (nullable = true)
+
+Conversions in Practice - Creating DataFrames
+---------------------------------------------
+Another common conversion case is when creating a DataFrame from values in Python. In this case,
+you can supply a schema, or allow Spark to infer the schema from the provided data.
+
+.. code-block:: python
+
+  data = [
+      ["Wei", "Math", 93.0, 1],
+      ["Jerry", "Physics", 85.0, 4],
+      ["Katrina", "Geology", 90.0, 2],
+  ]
+  cols = ["Name", "Subject", "Score", "Period"]
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import pandas as pd
+
+  df = pd.DataFrame(data, columns=cols)
+  spark.createDataFrame(df).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import numpy as np
+
+  spark.createDataFrame(np.zeros([3, 2], "int8")).printSchema()
+  # root
+  # |-- _1: byte (nullable = true)
+  # |-- _2: byte (nullable = true)
+
+Conversions in Practice - Nested Data Types
+-------------------------------------------
+Nested data types will convert to ``StructType``, ``MapType``, and ``ArrayType``, depending on the passed data.
+
+.. code-block:: python
+
+  data = [

Review Comment:
   should be either 3 spaces (per the Sphinx specification) or 4 spaces to be consistent across the PySpark documentation (yes, we're using non-standard spacing in most of the .rst files).
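   
   For example, with the 4-space convention used by most of the PySpark .rst files:
   
   ```
   .. code-block:: python
   
       data = [          # content indented 4 spaces under the directive
           ["Wei", "Math", 93.0, 1],
       ]
   ```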





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43369: [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs.
URL: https://github.com/apache/spark/pull/43369




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1809442490

   @PhilDakin do you have a JIRA ID, so I can assign this ticket (SPARK-44733) to you? Feel free to comment directly in the JIRA.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1809441470

   Merged to master.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1766593152

   ![spark_python_docs_build_html_user_guide_sql_type_conversions html](https://github.com/apache/spark/assets/15946757/3d8c0d82-6c73-4c8f-b6e1-21d781ce125c)
   




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1768747894

   ![Screenshot 2023-10-18 at 11 33 37 AM](https://github.com/apache/spark/assets/15946757/bf6bc251-39fe-48af-8f13-921998043961)
   




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507342


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled

Review Comment:
   Should we make it a code block, like
   ```suggestion
       * - `spark.sql.execution.pythonUDF.arrow.enabled`
   ```
   
   ?





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390508025


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.

Review Comment:
   We should probably file a JIRA for this TODO.





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507631


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to `TIMESTAMP_NTZ`, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is TimestampType.
+      - ""
+
+All Conversions
+---------------
+.. list-table::

Review Comment:
   Let's add a comment here to update `docs/sql-ref-datatypes.md` together if anyone makes a change. I don't like that we're duplicating the docs.



##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects to their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the inputted data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+interactively check the conversion behavior of Spark. To do so, you can test small examples of
+user-defined functions, and use the ``spark.createDataFrame`` interface.
+
+All data types of Spark SQL are located in the package of ``pyspark.sql.types``.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to `TIMESTAMP_NTZ`, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is TimestampType.
+      - ""
+
+All Conversions
+---------------
+.. list-table::

Review Comment:
   Let's at least add a comment here to update `docs/sql-ref-datatypes.md` together if anyone makes a change. I don't like that we're duplicating the docs.





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1766601051

   Test failure looks unrelated: `pyspark-mllib` failed with `Error: The operation was canceled.`




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1363179016


##########
docs/sql-ref-datatypes.md:
##########
@@ -119,10 +119,10 @@ from pyspark.sql.types import *
 
 |Data type|Value type in Python|API to access or create a data type|

Review Comment:
   Actually, on second thought, can we just add a link back from the Python docs to here? Then we won't need to duplicate the table.
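   
   A link of roughly this shape would do it (hypothetical wording; the target is the rendered version of `docs/sql-ref-datatypes.md`):
   
   ```
   See the `Spark SQL data types reference
   <https://spark.apache.org/docs/latest/sql-ref-datatypes.html>`_ for the full table.
   ```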





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1359991057


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,106 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+=======================
+Python to Spark Type Conversions
+=======================
+
+.. currentmodule:: pyspark.sql.types
+
+All data types of Spark SQL are located in the package of `pyspark.sql.types`.
+You can access them by doing:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - | int or long
+        |
+        | **Note:** Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.

Review Comment:
   You can use the `.. note::` directive.
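   
   For reference, the directive form that the later revision of this page adopts:
   
   ```
   .. note:: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.
   ```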





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1359990922


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,106 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+=======================
+Python to Spark Type Conversions
+=======================

Review Comment:
   ```suggestion
   ================================
   Python to Spark Type Conversions
   ================================
   ```
   
   Otherwise Sphinx throws errors/warnings, since the underline must be at least as long as the title.





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1766233419

   Mind attaching an image of the output HTML? Otherwise this looks fine from a cursory look. cc @itholic, @xinrong-meng and @zhengruifeng too if you find some time to review.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1785785637

   @allisonwang-db added a full-page screenshot to the description and rebased onto master.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1792603360

   @allisonwang-db any further updates needed here?




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1769423186

   I agree that duplicating the table is not ideal. It would be nice to have a cross-format inclusion mechanism for tables shared between the main documentation and the PySpark docs, but that seems a bit out of scope for this PR. One possible approach is sketched below.
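   
   One conceivable mechanism (purely a sketch, nothing this PR implements) is to keep the table data in a shared CSV file and render it on the RST side with the standard docutils ``csv-table`` directive; the file path here is hypothetical:
   
   ```rst
   .. csv-table:: Python to Spark type conversions
      :file: /shared/type_conversions.csv
      :header-rows: 1
   ```
   
   The markdown side would still need its own include step, which is part of why this feels out of scope here.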




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507455


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects and their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the input data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+check Spark's conversion behavior interactively. To do so, you can test small examples of
+user-defined functions and use the ``spark.createDataFrame`` interface.
+
+All Spark SQL data types are defined in the ``pyspark.sql.types`` module.
+You can import them with:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *

Review Comment:
   Let's avoid wildcard imports; they're discouraged by PEP 8.
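   
   A possible replacement (just a sketch, importing only the names this page's examples use):
   
   ```python
   from pyspark.sql.types import (
       FloatType,
       IntegerType,
       StringType,
       StructField,
       StructType,
   )
   ```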





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1391907938


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.

Review Comment:
   This is covered by the ticket in the TODO below; modifying the comment to make this clear.





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1769309725

   ^ We don't have to add everything in this PR, but I do think we should have a separate table for type conversion in PySpark docs, and then we can improve it.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1775882582

   @allisonwang-db what do you think here?




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507005


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects and their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the input data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+check Spark's conversion behavior interactively. To do so, you can test small examples of
+user-defined functions and use the ``spark.createDataFrame`` interface.
+
+All Spark SQL data types are defined in the ``pyspark.sql.types`` module.
+You can import them with:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to ``TIMESTAMP_NTZ``, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is ``TimestampType``.
+      - ""
+
+All Conversions
+---------------
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - int
+          .. note:: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.
+      - ByteType()
+    * - **ShortType**
+      - int
+          .. note:: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767.
+      - ShortType()
+    * - **IntegerType**
+      - int
+      - IntegerType()
+    * - **LongType**
+      - int
+          .. note:: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType.
+      - LongType()
+    * - **FloatType**
+      - float
+          .. note:: Numbers will be converted to 4-byte single-precision floating point numbers at runtime.
+      - FloatType()
+    * - **DoubleType**
+      - float
+      - DoubleType()
+    * - **DecimalType**
+      - decimal.Decimal
+      - DecimalType()
+    * - **StringType**
+      - str
+      - StringType()
+    * - **BinaryType**
+      - bytearray
+      - BinaryType()
+    * - **BooleanType**
+      - bool
+      - BooleanType()
+    * - **TimestampType**
+      - datetime.datetime
+      - TimestampType()
+    * - **TimestampNTZType**
+      - datetime.datetime
+      - TimestampNTZType()
+    * - **DateType**
+      - datetime.date
+      - DateType()
+    * - **DayTimeIntervalType**
+      - datetime.timedelta
+      - DayTimeIntervalType()
+    * - **ArrayType**
+      - list, tuple, or array
+      - ArrayType(*elementType*, [*containsNull*])
+          .. note:: The default value of *containsNull* is True.
+    * - **MapType**
+      - dict
+      - MapType(*keyType*, *valueType*, [*valueContainsNull*])
+          .. note:: The default value of *valueContainsNull* is True.
+    * - **StructType**
+      - list or tuple
+      - StructType(*fields*)
+          .. note:: *fields* is a list of StructFields. Also, two fields with the same name are not allowed.
+    * - **StructField**
+      - The value type in Python of the data type of this field. For example, int for a StructField with the data type IntegerType.
+      - StructField(*name*, *dataType*, [*nullable*])
+          .. note:: The default value of *nullable* is True.
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the type of the
+value returned by the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will silently replace the value with null rather than raising an error.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+
+  @udf(returnType=StringType())
+  def to_string(value):
+      return str(value)
+
+  @udf(returnType=FloatType())
+  def to_float(value):
+      return float(value)
+

Review Comment:
   ditto





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1376549912


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,249 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled..
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects and their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the input data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+check Spark's conversion behavior interactively. To do so, you can test small examples of
+user-defined functions and use the ``spark.createDataFrame`` interface.
+
+All Spark SQL data types are defined in the ``pyspark.sql.types`` module.
+You can import them with:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to ``TIMESTAMP_NTZ``, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is ``TimestampType``.
+      - ""
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the type of the
+value returned by the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will silently replace the value with null rather than raising an error.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+
+  @udf(returnType=StringType())
+  def to_string(value):
+      return str(value)
+
+  @udf(returnType=FloatType())
+  def to_float(value):
+      return float(value)
+
+  df.withColumn("cast_int", to_float(col("int"))).withColumn(
+      "cast_str", to_string(col("int"))
+  ).printSchema()
+  # root
+  # |-- int: integer (nullable = true)
+  # |-- cast_int: float (nullable = true)
+  # |-- cast_str: string (nullable = true)
+
+Conversions in Practice - Creating DataFrames
+---------------------------------------------
+Another common conversion case is when creating a DataFrame from values in Python. In this case,
+you can supply a schema, or allow Spark to infer the schema from the provided data.
+
+.. code-block:: python
+
+  data = [
+      ["Wei", "Math", 93.0, 1],
+      ["Jerry", "Physics", 85.0, 4],
+      ["Katrina", "Geology", 90.0, 2],
+  ]
+  cols = ["Name", "Subject", "Score", "Period"]
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import pandas as pd
+
+  df = pd.DataFrame(data, columns=cols)
+  spark.createDataFrame(df).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import numpy as np
+
+  spark.createDataFrame(np.zeros([3, 2], "int8")).printSchema()
+  # root
+  # |-- _1: byte (nullable = true)
+  # |-- _2: byte (nullable = true)
+
+Conversions in Practice - Nested Data Types
+-------------------------------------------
+Nested data types will convert to ``StructType``, ``MapType``, and ``ArrayType``, depending on the passed data.
+
+.. code-block:: python
+
+  data = [
+      ["Wei", [[1, 2]], {"RecordType": "Scores", "Math": { "H1": 93.0, "H2": 85.0}}],
+  ]
+  cols = ["Name", "ActiveHalfs", "Record"]
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  #  |-- Name: string (nullable = true)
+  #  |-- ActiveHalfs: array (nullable = true)
+  #  |    |-- element: array (containsNull = true)
+  #  |    |    |-- element: long (containsNull = true)
+  #  |-- Record: map (nullable = true)
+  #  |    |-- key: string
+  #  |    |-- value: string (valueContainsNull = true)
+
+  spark.conf.set('spark.sql.pyspark.inferNestedDictAsStruct.enabled', True)
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  #  |-- Name: string (nullable = true)
+  #  |-- ActiveHalfs: array (nullable = true)
+  #  |    |-- element: array (containsNull = true)
+  #  |    |    |-- element: long (containsNull = true)
+  #  |-- Record: struct (nullable = true)
+  #  |    |-- RecordType: string (nullable = true)
+  #  |    |-- Math: struct (nullable = true)
+  #  |    |    |-- H1: double (nullable = true)
+  #  |    |    |-- H2: double (nullable = true)
+
+All Conversions
+---------------

Review Comment:
   This is great. Let's move this to the top of the user guide (above the section Conversions in Practice - UDFs).



##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,249 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled..

Review Comment:
   ```suggestion
   .. TODO: Add additional information on conversions when Arrow is enabled.
   ```





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1767634300

   Looks nice. Could you rebase the PR to master?




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1769005409

   Hi @PhilDakin thanks for doing this! I personally think it's better to have the table here instead of a link to another page.
   
   Also, I think we should **explain why this conversion table matters**. For example, it is useful when users want to map a Python return type to a Spark return type in a Python UDF.
   
   Another thing we need to mention is type casting. What if I want to cast an int type in Python to a FloatType in Spark? Currently, a regular Python UDF will return NULL, I believe, but an arrow-optimized Python UDF can cast the value properly. It will be valuable to have a table like this:
   https://github.com/apache/spark/blob/b41ea9162f4c8fbc4d04d28d6ab5cc0342b88cb0/python/pyspark/sql/udf.py#L94-L119
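   
   A minimal sketch of the difference described above (assuming a live `spark` session and a Spark version where `udf` accepts the `useArrow` flag; exact coercion behavior can vary by version):
   
   ```python
   from pyspark.sql.functions import col, udf
   from pyspark.sql.types import FloatType
   
   df = spark.range(3)
   
   # Declared FloatType, but the function returns a Python int.
   plain_udf = udf(lambda x: int(x), returnType=FloatType())
   arrow_udf = udf(lambda x: int(x), returnType=FloatType(), useArrow=True)
   
   df.select(
       plain_udf(col("id")).alias("plain"),  # type mismatch -> NULLs
       arrow_udf(col("id")).alias("arrow"),  # Arrow coerces int -> float
   ).show()
   ```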




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1776506123

   Do we need to mention related configs like `spark.sql.pyspark.inferNestedDictAsStruct.enabled` and `spark.sql.timestampType`?
   
   In createDataFrame, `spark.sql.pyspark.inferNestedDictAsStruct.enabled` controls whether a dict is treated as a map or a struct (a quick sketch is at the end of this comment).
   
   
   BTW, I think we may need to mention nested rows and numpy arrays:
   ```
   In [25]: spark.createDataFrame(np.zeros([3,3], "int8"))
   Out[25]: DataFrame[_1: tinyint, _2: tinyint, _3: tinyint]
   
   In [26]: spark.createDataFrame(np.zeros([3,3], "int64"))
   Out[26]: DataFrame[_1: bigint, _2: bigint, _3: bigint]
   
   In [27]: spark.createDataFrame([Row(a=1, b=Row(c=2))])
   Out[27]: DataFrame[a: bigint, b: struct<c:bigint>]
   ```
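   
   And a quick sketch of the `inferNestedDictAsStruct` behavior mentioned above (assuming a live `spark` session):
   
   ```python
   data = [{"a": {"b": 1}}]
   
   spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled", False)
   spark.createDataFrame(data).printSchema()  # "a" is inferred as a map
   
   spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled", True)
   spark.createDataFrame(data).printSchema()  # "a" is inferred as a struct
   ```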




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on PR #43369:
URL: https://github.com/apache/spark/pull/43369#issuecomment-1769370243

   @allisonwang-db brought back the table and added a section indicating when these conversions are relevant during UDF definitions.
   
   Will follow up with examples going into more depth on type conversion as a separate PR for https://issues.apache.org/jira/browse/SPARK-44734.




Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "PhilDakin (via GitHub)" <gi...@apache.org>.
PhilDakin commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1376630344


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,249 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled..

Review Comment:
   I removed this extraneous period.



##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,249 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled..
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects and their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the input data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+check Spark's conversion behavior interactively. To do so, you can test small examples of
+user-defined functions and use the ``spark.createDataFrame`` interface.
+
+All Spark SQL data types are defined in the ``pyspark.sql.types`` module.
+You can import them with:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to ``TIMESTAMP_NTZ``, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is ``TimestampType``.
+      - ""
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the type of the
+value returned by the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will silently replace the value with null rather than raising an error.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+
+  @udf(returnType=StringType())
+  def to_string(value):
+      return str(value)
+
+  @udf(returnType=FloatType())
+  def to_float(value):
+      return float(value)
+
+  df.withColumn("cast_int", to_float(col("int"))).withColumn(
+      "cast_str", to_string(col("int"))
+  ).printSchema()
+  # root
+  # |-- int: integer (nullable = true)
+  # |-- cast_int: float (nullable = true)
+  # |-- cast_str: string (nullable = true)
+
+Conversions in Practice - Creating DataFrames
+---------------------------------------------
+Another common conversion case is when creating a DataFrame from values in Python. In this case,
+you can supply a schema, or allow Spark to infer the schema from the provided data.
+
+.. code-block:: python
+
+  data = [
+      ["Wei", "Math", 93.0, 1],
+      ["Jerry", "Physics", 85.0, 4],
+      ["Katrina", "Geology", 90.0, 2],
+  ]
+  cols = ["Name", "Subject", "Score", "Period"]
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import pandas as pd
+
+  df = pd.DataFrame(data, columns=cols)
+  spark.createDataFrame(df).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import numpy as np
+
+  spark.createDataFrame(np.zeros([3, 2], "int8")).printSchema()
+  # root
+  # |-- _1: byte (nullable = true)
+  # |-- _2: byte (nullable = true)
+
+Conversions in Practice - Nested Data Types
+-------------------------------------------
+Nested data types will convert to ``StructType``, ``MapType``, and ``ArrayType``, depending on the passed data.
+
+.. code-block:: python
+
+  data = [
+      ["Wei", [[1, 2]], {"RecordType": "Scores", "Math": { "H1": 93.0, "H2": 85.0}}],
+  ]
+  cols = ["Name", "ActiveHalfs", "Record"]
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  #  |-- Name: string (nullable = true)
+  #  |-- ActiveHalfs: array (nullable = true)
+  #  |    |-- element: array (containsNull = true)
+  #  |    |    |-- element: long (containsNull = true)
+  #  |-- Record: map (nullable = true)
+  #  |    |-- key: string
+  #  |    |-- value: string (valueContainsNull = true)
+
+  spark.conf.set('spark.sql.pyspark.inferNestedDictAsStruct.enabled', True)
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  #  |-- Name: string (nullable = true)
+  #  |-- ActiveHalfs: array (nullable = true)
+  #  |    |-- element: array (containsNull = true)
+  #  |    |    |-- element: long (containsNull = true)
+  #  |-- Record: struct (nullable = true)
+  #  |    |-- RecordType: string (nullable = true)
+  #  |    |-- Math: struct (nullable = true)
+  #  |    |    |-- H1: double (nullable = true)
+  #  |    |    |-- H2: double (nullable = true)
+
+All Conversions
+---------------

Review Comment:
   Moved table.





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507101


##########
python/docs/source/user_guide/sql/type_conversions.rst:
##########
@@ -0,0 +1,248 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+================================
+Python to Spark Type Conversions
+================================
+
+.. TODO: Add additional information on conversions when Arrow is enabled.
+.. TODO: Add in-depth explanation and table for type conversions (SPARK-44734).
+
+.. currentmodule:: pyspark.sql.types
+
+When working with PySpark, you will often need to consider the conversions between Python-native
+objects and their Spark equivalents. For instance, when working with user-defined functions, the
+function return type will be cast by Spark to an appropriate Spark SQL type. Or, when creating a
+``DataFrame``, you may supply ``numpy`` or ``pandas`` objects as the input data. This guide will cover
+the various conversions between Python and Spark SQL types.
+
+Browsing Type Conversions
+-------------------------
+
+Though this document provides a comprehensive list of type conversions, you may find it easier to
+check Spark's conversion behavior interactively. To do so, you can test small examples of
+user-defined functions and use the ``spark.createDataFrame`` interface.
+
+All Spark SQL data types are defined in the ``pyspark.sql.types`` module.
+You can import them with:
+
+.. code-block:: python
+
+    from pyspark.sql.types import *
+
+Configuration
+-------------
+There are several configurations that affect the behavior of type conversions. These configurations
+are listed below:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Configuration
+      - Description
+      - Default
+    * - spark.sql.execution.pythonUDF.arrow.enabled
+      - Enable PyArrow in PySpark. See more `here <arrow_pandas.rst>`_.
+      - False
+    * - spark.sql.pyspark.inferNestedDictAsStruct.enabled
+      - When enabled, nested dictionaries are inferred as StructType. Otherwise, they are inferred as MapType.
+      - False
+    * - spark.sql.timestampType
+      - If set to ``TIMESTAMP_NTZ``, the default timestamp type is ``TimestampNTZType``. Otherwise, the default timestamp type is ``TimestampType``.
+      - ""
+
+All Conversions
+---------------
+.. list-table::
+    :header-rows: 1
+
+    * - Data type
+      - Value type in Python
+      - API to access or create a data type
+    * - **ByteType**
+      - int
+          .. note:: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127.
+      - ByteType()
+    * - **ShortType**
+      - int
+          .. note:: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767.
+      - ShortType()
+    * - **IntegerType**
+      - int
+      - IntegerType()
+    * - **LongType**
+      - int
+          .. note:: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType.
+      - LongType()
+    * - **FloatType**
+      - float
+          .. note:: Numbers will be converted to 4-byte single-precision floating point numbers at runtime.
+      - FloatType()
+    * - **DoubleType**
+      - float
+      - DoubleType()
+    * - **DecimalType**
+      - decimal.Decimal
+      - DecimalType()
+    * - **StringType**
+      - str
+      - StringType()
+    * - **BinaryType**
+      - bytearray
+      - BinaryType()
+    * - **BooleanType**
+      - bool
+      - BooleanType()
+    * - **TimestampType**
+      - datetime.datetime
+      - TimestampType()
+    * - **TimestampNTZType**
+      - datetime.datetime
+      - TimestampNTZType()
+    * - **DateType**
+      - datetime.date
+      - DateType()
+    * - **DayTimeIntervalType**
+      - datetime.timedelta
+      - DayTimeIntervalType()
+    * - **ArrayType**
+      - list, tuple, or array
+      - ArrayType(*elementType*, [*containsNull*])
+          .. note:: The default value of *containsNull* is True.
+    * - **MapType**
+      - dict
+      - MapType(*keyType*, *valueType*, [*valueContainsNull*])
+          .. note:: The default value of *valueContainsNull* is True.
+    * - **StructType**
+      - list or tuple
+      - StructType(*fields*)
+          .. note:: *fields* is a list of StructFields. Also, two fields with the same name are not allowed.
+    * - **StructField**
+      - The value type in Python of the data type of this field. For example, int for a StructField with the data type IntegerType.
+      - StructField(*name*, *dataType*, [*nullable*])
+          .. note:: The default value of *nullable* is True.
+
+Conversions in Practice - UDFs
+------------------------------
+A common conversion case is returning a Python value from a UDF. In this case, the type of the
+value returned by the UDF must match the provided return type.
+
+.. note:: If the actual return type of your function does not match the provided return type, Spark will silently replace the value with null rather than raising an error.
+
+.. code-block:: python
+
+  from pyspark.sql.types import (
+      StructType,
+      StructField,
+      IntegerType,
+      StringType,
+      FloatType,
+  )
+  from pyspark.sql.functions import udf, col
+
+  df = spark.createDataFrame(
+      [[1]], schema=StructType([StructField("int", IntegerType())])
+  )
+
+  @udf(returnType=StringType())
+  def to_string(value):
+      return str(value)
+
+  @udf(returnType=FloatType())
+  def to_float(value):
+      return float(value)
+
+  df.withColumn("cast_int", to_float(col("int"))).withColumn(
+      "cast_str", to_string(col("int"))
+  ).printSchema()
+  # root
+  # |-- int: integer (nullable = true)
+  # |-- cast_int: float (nullable = true)
+  # |-- cast_str: string (nullable = true)
+
+Conversions in Practice - Creating DataFrames
+---------------------------------------------
+Another common conversion case is when creating a DataFrame from values in Python. In this case,
+you can supply a schema, or allow Spark to infer the schema from the provided data.
+
+.. code-block:: python
+
+  data = [
+      ["Wei", "Math", 93.0, 1],
+      ["Jerry", "Physics", 85.0, 4],
+      ["Katrina", "Geology", 90.0, 2],
+  ]
+  cols = ["Name", "Subject", "Score", "Period"]
+
+  spark.createDataFrame(data, cols).printSchema()
+  # root
+  # |-- Name: string (nullable = true)
+  # |-- Subject: string (nullable = true)
+  # |-- Score: double (nullable = true)
+  # |-- Period: long (nullable = true)
+
+  import pandas as pd

Review Comment:
   I would move the imports to the top, numpy too. Something like the sketch below.
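   
   For instance (a sketch of the consolidated imports):
   
   ```python
   import numpy as np
   import pandas as pd
   
   from pyspark.sql.functions import col, udf
   from pyspark.sql.types import (
       FloatType,
       IntegerType,
       StringType,
       StructField,
       StructType,
   )
   ```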





Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43369:
URL: https://github.com/apache/spark/pull/43369#discussion_r1390507825


##########
docs/sql-ref-datatypes.md:
##########
@@ -119,10 +119,10 @@ from pyspark.sql.types import *
 
 |Data type|Value type in Python|API to access or create a data type|

Review Comment:
   Let's also add a comment noting that `python/docs/source/user_guide/sql/type_conversions.rst` should be kept in sync with this table. You could use `<!-- comment -->`.
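   
   For example (hypothetical wording):
   
   ```markdown
   <!-- When updating this table, also update the matching table in
        python/docs/source/user_guide/sql/type_conversions.rst. -->
   ```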


