Posted to issues@spark.apache.org by "Anthony Wainer Cachay Guivin (Jira)" <ji...@apache.org> on 2022/10/17 13:55:00 UTC
[jira] [Updated] (SPARK-40820) creating StructType from Json
[ https://issues.apache.org/jira/browse/SPARK-40820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anthony Wainer Cachay Guivin updated SPARK-40820:
-------------------------------------------------
Description:
When creating a StructType from a Python dictionary, you use the [StructType.fromJson|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L569-L571] method.
A field can be created with the code below, but StructField.fromJson requires the JSON to contain both nullable and metadata. This is inconsistent, because within the DataType class both of these already have default values.
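{code:python}
from pyspark.sql.types import StructField

# Only the required keys are given; "nullable" and "metadata" are omitted.
json = {
    "name": "name",
    "type": "string"
}
StructField.fromJson(json)
{code}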
Error:
{code:python}
from pyspark.sql.types import StructField
json = {
    "name": "name",
    "type": "string"
}
StructField.fromJson(json)
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Users/awcachayg/PycharmProjects/DataFabric/cfortega.Datalogo/venv/lib/python3.8/site-packages/pyspark/sql/types.py", line 583, in fromJson
    json["nullable"],
KeyError: 'nullable'
{code}
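Until fromJson tolerates missing keys, one possible workaround is to fill in the two optional keys before calling it. The following is a minimal sketch, not a PySpark API: the helper name with_field_defaults is hypothetical, and the chosen defaults (nullable=True, empty metadata) are an assumption meant to mirror the StructField constructor.

```python
from typing import Any, Dict


def with_field_defaults(field_json: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of field_json with the optional keys filled in.

    Hypothetical helper, not part of PySpark. The defaults are assumed
    to match the StructField constructor: nullable=True, empty metadata.
    """
    completed = dict(field_json)  # shallow copy, the caller's dict is left untouched
    completed.setdefault("nullable", True)
    completed.setdefault("metadata", {})
    return completed


field_json = {"name": "name", "type": "string"}
completed = with_field_defaults(field_json)
print(completed)
# With PySpark installed, the completed dict can then be passed on:
#   StructField.fromJson(completed)
```

Keys that are already present are kept as they are, so an explicit "nullable": False is not overwritten.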
Proposed coding solution:
Instead of indexing into the dictionary, it would be better to use .get for the optional keys:
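{code:python}
@classmethod
def fromJson(cls, json: Dict[str, Any]) -> "StructField":
    return StructField(
        json["name"],  # required
        _parse_datatype_json_value(json["type"]),  # required
        json.get("nullable", True),  # optional; True matches the constructor default
        json.get("metadata"),  # optional; None when absent
    )
{code}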
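The effect of reading the optional keys with .get can be tried without a Spark installation. The sketch below uses a simplified stand-in class, FieldSketch, which is hypothetical and only imitates the shape of StructField; it does not parse the type string. Note that dict.get without a second argument returns None for a missing key, so the sketch passes True explicitly as the fallback for nullable, on the assumption that this should mirror the constructor default.

```python
from typing import Any, Dict, Optional


class FieldSketch:
    """Simplified stand-in for StructField (hypothetical, illustration only)."""

    def __init__(
        self,
        name: str,
        data_type: str,
        nullable: bool = True,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.name = name
        self.data_type = data_type  # kept as a plain string in this sketch
        self.nullable = nullable
        self.metadata = metadata or {}

    @classmethod
    def from_json(cls, json: Dict[str, Any]) -> "FieldSketch":
        return cls(
            json["name"],                # required: still raises KeyError if missing
            json["type"],                # required: still raises KeyError if missing
            json.get("nullable", True),  # optional: falls back to True
            json.get("metadata"),        # optional: None becomes {} in __init__
        )


minimal = FieldSketch.from_json({"name": "name", "type": "string"})
explicit = FieldSketch.from_json(
    {"name": "id", "type": "long", "nullable": False, "metadata": {"comment": "key"}}
)
print(minimal.nullable, minimal.metadata)
print(explicit.nullable, explicit.metadata)
```

The required keys are still read by index on purpose, so a dictionary without "name" or "type" fails early with a KeyError instead of producing a half-defined field.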
> creating StructType from Json
> -----------------------------
>
> Key: SPARK-40820
> URL: https://issues.apache.org/jira/browse/SPARK-40820
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Anthony Wainer Cachay Guivin
> Priority: Minor
> Attachments: bug .png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)