Posted to issues@spark.apache.org by "morvenhuang (Jira)" <ji...@apache.org> on 2022/04/08 02:42:00 UTC

[jira] [Created] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

morvenhuang created SPARK-38826:
-----------------------------------

             Summary: dropFieldIfAllNull option does not work for empty JSON struct
                 Key: SPARK-38826
                 URL: https://issues.apache.org/jira/browse/SPARK-38826
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 3.2.1
            Reporter: morvenhuang


As stated in the doc:
{quote}dropFieldIfAllNull

Whether to ignore column of all null values or empty array/struct during schema inference.
{quote}
But when I try this:

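{code:java}
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
String json = "{\"field1\":\"value1\", \"field2\":{}}"; // field2 is an empty struct
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext()); // spark: an existing SparkSession
JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
Dataset<Row> df = spark.read().option("dropFieldIfAllNull", "false").json(jrdd);
df.printSchema();
{code}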

I get this:
{code:java}
root
 |-- field1: string (nullable = true)
{code}
Notice that field2 is missing from the schema even though dropFieldIfAllNull is set to false, so the option apparently has no effect on empty structs.

This is due to [SPARK-8093|https://issues.apache.org/jira/browse/SPARK-8093].
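
To make the asymmetry concrete, here is a toy model of the behavior in plain Java, with no Spark dependency. It is a sketch under assumptions, not Spark's actual inference code: the DropFieldSketch class, its Type and Struct types, and the extra always-null field3 are all hypothetical, and the rule that an all-null field falls back to string when the option is off is my understanding of the default rather than something shown in the reproduction above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model only: these types and rules are illustrative, not Spark's implementation.
final class DropFieldSketch {
    interface Type {}
    enum Leaf implements Type { NULL, STRING }

    static final class Struct implements Type {
        final Map<String, Type> fields = new LinkedHashMap<>();
        Struct with(String name, Type type) { fields.put(name, type); return this; }
    }

    // Returns the type to keep in the schema, or empty if the field is dropped.
    static Optional<Type> canonicalize(Type type, boolean dropFieldIfAllNull) {
        if (type == Leaf.NULL) {
            // Assumption: all-null fields consult the option, else fall back to string.
            if (dropFieldIfAllNull) return Optional.empty();
            return Optional.of(Leaf.STRING);
        }
        if (type instanceof Struct) {
            Struct kept = new Struct();
            for (Map.Entry<String, Type> e : ((Struct) type).fields.entrySet()) {
                Optional<Type> child = canonicalize(e.getValue(), dropFieldIfAllNull);
                if (child.isPresent()) kept.with(e.getKey(), child.get());
            }
            // As observed in the report: a struct with no surviving fields is
            // dropped without consulting the option.
            if (kept.fields.isEmpty()) return Optional.empty();
            return Optional.of(kept);
        }
        return Optional.of(type);
    }

    // Names of the top-level fields that survive canonicalization.
    static List<String> inferredFields(Struct record, boolean dropFieldIfAllNull) {
        Optional<Type> kept = canonicalize(record, dropFieldIfAllNull);
        if (!kept.isPresent()) return new ArrayList<>();
        return new ArrayList<>(((Struct) kept.get()).fields.keySet());
    }

    // field1 and field2 mirror the JSON in the report; field3 (always null) is a
    // hypothetical extra field added for contrast.
    static Struct sampleRecord() {
        return new Struct()
            .with("field1", Leaf.STRING)
            .with("field2", new Struct())
            .with("field3", Leaf.NULL);
    }

    public static void main(String[] args) {
        System.out.println(inferredFields(sampleRecord(), false)); // [field1, field3]
        System.out.println(inferredFields(sampleRecord(), true));  // [field1]
    }
}
```

In this model the flag changes the outcome for the all-null field3 but never for the empty struct field2, which is the gap between the documented wording and the observed schema.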

I think we should update the doc, since the current wording is confusing.

I can make a patch for this.

 


