Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:03:06 UTC

[jira] [Updated] (SPARK-12264) Add a typeTag or scalaTypeTag method to DataType

     [ https://issues.apache.org/jira/browse/SPARK-12264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-12264:
---------------------------------
    Labels: bulk-closed  (was: )

> Add a typeTag or scalaTypeTag method to DataType
> ------------------------------------------------
>
>                 Key: SPARK-12264
>                 URL: https://issues.apache.org/jira/browse/SPARK-12264
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Andras Nemeth
>            Priority: Minor
>              Labels: bulk-closed
>
> We are writing code that takes generic DataFrames as inputs and further processes their contents with normal RDD operations (not SQL). We need some mechanism that tells us exactly which Scala types we will find inside a Row of a given DataFrame.
> The schema of the DataFrame contains this information in an abstract sense, but we need to map it to TypeTags, since TypeTags are what the rest of our system uses to identify which type of data an RDD contains - quite the natural choice in Scala.
> As far as I can tell, there is no good way to do this today. For now we have a hand-coded mapping, but that feels very fragile as Spark evolves. Is there a better way I'm missing? And if not, could we create one? Adding a typeTag or scalaTypeTag method to DataType, or at least to AtomicType, seems easy enough.
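
For context, a minimal sketch of the kind of hand-coded mapping the description refers to. The object and method names (TypeTagMapping, scalaTypeTag) are illustrative, not an existing Spark API; the DataType case objects are the real ones from org.apache.spark.sql.types, and the match covers only a few atomic types:

    import scala.reflect.runtime.universe.{TypeTag, typeTag}
    import org.apache.spark.sql.types._

    // Illustrative hand-coded mapping from an atomic Spark SQL DataType to
    // the TypeTag of the Scala type that appears in a Row for that column.
    // This is exactly the fragile part: it must be kept in sync with
    // Spark's internal type mapping by hand as Spark evolves.
    object TypeTagMapping {
      def scalaTypeTag(dt: DataType): TypeTag[_] = dt match {
        case BooleanType   => typeTag[Boolean]
        case IntegerType   => typeTag[Int]
        case LongType      => typeTag[Long]
        case DoubleType    => typeTag[Double]
        case StringType    => typeTag[String]
        case BinaryType    => typeTag[Array[Byte]]
        case DateType      => typeTag[java.sql.Date]
        case TimestampType => typeTag[java.sql.Timestamp]
        case other         => sys.error(s"No TypeTag mapping for $other")
      }
    }

A typeTag or scalaTypeTag method on DataType itself would make this table Spark's responsibility rather than each downstream project's.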



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org