Posted to issues@spark.apache.org by "morvenhuang (Jira)" <ji...@apache.org> on 2022/04/08 02:42:00 UTC
[jira] [Created] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
morvenhuang created SPARK-38826:
-----------------------------------
Summary: dropFieldIfAllNull option does not work for empty JSON struct
Key: SPARK-38826
URL: https://issues.apache.org/jira/browse/SPARK-38826
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 3.2.1
Reporter: morvenhuang
As stated in the documentation:
{quote}dropFieldIfAllNull
Whether to ignore column of all null values or empty array/struct during schema inference.
{quote}
But when I run the following:
{code:java}
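// Assumes `spark` is an existing SparkSession.
// field2 is an empty JSON object (an empty struct).
String json = "{\"field1\":\"value1\", \"field2\":{}}";
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
// Ask schema inference to keep all-null/empty fields (false is also the default).
Dataset<Row> df = spark.read().option("dropFieldIfAllNull", "false").json(jrdd);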
df.printSchema();
{code}
I get this:
{code}
root
 |-- field1: string (nullable = true)
{code}
Note that field2 is missing even though dropFieldIfAllNull is set to false, so this option apparently has no effect on empty structs.
This is due to [SPARK-8093|https://issues.apache.org/jira/browse/SPARK-8093].
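To make the mismatch concrete, here is a toy model in plain Java. It is hypothetical and is NOT Spark's schema-inference code: the DropFieldModel class, the Kind enum and both methods are made up for illustration, and Spark's type inference is simplified to four value kinds. documented() applies the rule as the documentation describes it; reported() additionally drops any field that is an empty struct in every record, which is the behavior observed above.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical): NOT Spark's schema-inference code.
// It contrasts the documented dropFieldIfAllNull rule with the behavior
// reported in SPARK-38826.
public class DropFieldModel {

    // Hypothetical, simplified classification of a field's value in one record.
    enum Kind { NULL, EMPTY_ARRAY, EMPTY_STRUCT, VALUE }

    // Documented rule: when dropFieldIfAllNull is true, drop a field whose
    // value is null, an empty array or an empty struct in every record;
    // otherwise keep it.
    static List<String> documented(Map<String, List<Kind>> fields, boolean dropFieldIfAllNull) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<Kind>> e : fields.entrySet()) {
            boolean allEmpty = e.getValue().stream().allMatch(k -> k != Kind.VALUE);
            if (!(dropFieldIfAllNull && allEmpty)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    // Reported behavior: on top of the documented rule, a field that is an
    // empty struct in every record is dropped regardless of dropFieldIfAllNull.
    static List<String> reported(Map<String, List<Kind>> fields, boolean dropFieldIfAllNull) {
        List<String> kept = new ArrayList<>();
        for (String name : documented(fields, dropFieldIfAllNull)) {
            boolean allEmptyStruct = fields.get(name).stream().allMatch(k -> k == Kind.EMPTY_STRUCT);
            if (!allEmptyStruct) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The single record from the report: {"field1":"value1", "field2":{}}
        Map<String, List<Kind>> fields = new LinkedHashMap<>();
        fields.put("field1", List.of(Kind.VALUE));
        fields.put("field2", List.of(Kind.EMPTY_STRUCT));

        System.out.println(documented(fields, false)); // [field1, field2]
        System.out.println(reported(fields, false));   // [field1]
    }
}
{code}

With dropFieldIfAllNull set to false, the documented rule keeps both fields, while the reported behavior keeps only field1, matching the printSchema output above.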
I think we should update the documentation to reflect this; otherwise users may find the option confusing.
I can make a patch for this.