Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2020/03/11 10:31:00 UTC

[jira] [Assigned] (SPARK-31071) Spark Encoders.bean() should allow marking non-null fields in its Spark schema

     [ https://issues.apache.org/jira/browse/SPARK-31071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-31071:
-----------------------------------

    Assignee: L. C. Hsieh

> Spark Encoders.bean() should allow marking non-null fields in its Spark schema
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-31071
>                 URL: https://issues.apache.org/jira/browse/SPARK-31071
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Kyrill Alyoshin
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Spark's _Encoders.bean()_ method should allow the generated StructType schema fields to be *non-nullable*.
> Currently, any non-primitive type is automatically _nullable_. This is hard-coded in the _org.apache.spark.sql.catalyst.JavaTypeInference_ class, which can lead to awkward situations. For example, suppose I want to save a dataframe in the Avro format using my own Avro schema, one not generated by Spark, and that schema has a non-null field (i.e., not a union type). It appears *impossible* to store a dataframe using such an Avro schema: Spark assumes the field is nullable (as it is in Spark's own schema), which conflicts with the Avro schema semantics and results in an exception.
> I propose changing the _JavaTypeInference_ class to observe the JSR-305 _Nonnull_ annotation (and its children) on the provided bean class during StructType schema generation. This would give bean creators much better control over the resulting Spark schema.
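
To make the proposal concrete, here is a minimal, Spark-free sketch of the inference rule being requested: a bean property is treated as non-nullable if its getter returns a primitive or carries a Nonnull annotation, and nullable otherwise. It is not Spark's implementation. The class NullabilitySketch, the Person bean, and the inferNullability helper are hypothetical names made up for illustration, and the nested Nonnull annotation is a local stand-in for the JSR-305 javax.annotation.Nonnull so that the example compiles with the JDK alone. It also assumes annotations are placed on getters and ignores meta-annotated "children" of Nonnull.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class NullabilitySketch {

    // Hypothetical stand-in for javax.annotation.Nonnull (JSR-305),
    // declared locally so the sketch needs no external dependency.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Nonnull {}

    // Hypothetical bean used only for illustration.
    public static class Person {
        private String name;
        private String nickname;
        private int age;

        @Nonnull
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public String getNickname() { return nickname; }
        public void setNickname(String nickname) { this.nickname = nickname; }

        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Maps each readable bean property to a "nullable" flag.
    // Primitives and @Nonnull getters yield false; everything else yields true.
    public static Map<String, Boolean> inferNullability(Class<?> beanClass) {
        Map<String, Boolean> nullable = new TreeMap<>();
        try {
            // Stop at Object so the synthetic "class" property is excluded.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter == null) {
                    continue; // write-only property: nothing to infer from
                }
                boolean primitive = getter.getReturnType().isPrimitive();
                boolean markedNonnull = getter.isAnnotationPresent(Nonnull.class);
                nullable.put(pd.getName(), !(primitive || markedNonnull));
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
        return nullable;
    }

    public static void main(String[] args) {
        inferNullability(Person.class)
            .forEach((field, isNullable) ->
                System.out.println(field + " nullable=" + isNullable));
    }
}
```

With this rule, name and age come out as non-nullable while nickname stays nullable. Under the current behavior described above, name would be nullable as well, since it is a non-primitive type.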



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
