Posted to issues@spark.apache.org by "Alexander (JIRA)" <ji...@apache.org> on 2018/08/21 19:50:00 UTC

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

    [ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587928#comment-16587928 ] 

Alexander commented on SPARK-7768:
----------------------------------

It's been a while since this issue had any activity. How difficult would it be to get UDTRegistration to a state where it can become public? I noticed that you have to use Spark-internal types from org.apache.spark.unsafe, such as UTF8String instead of String, but that is a minor inconvenience compared to the annoyance of not being able to use even things like custom enum types (think [Enumeratum|https://github.com/lloydmeta/enumeratum], for instance) inside records.

This is probably the most misunderstood part of Spark in general. See, for instance, vaquarkhan's [gigantic post|https://github.com/vaquarkhan/vk-wiki-notes/wiki/Apache-Spark-custom-Encoder-example] about wrestling with this issue. Other people have written custom encoders to solve particular use cases, e.g. [here|https://typelevel.org/frameless/Injection.html] and [here|https://github.com/gennady-lebedev/spark-enum-encoder]. There is a general cacophony of frustration and cluelessness, because you have to dig into ExpressionEncoder to understand why encoding fails.
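
To make the enum problem concrete, here is a minimal sketch of the kind of workaround the linked posts revolve around: keep the enum in the domain record, and convert to a second record that holds only a String before handing data to Spark. It is plain Scala with no Spark dependency. All names (Color, Shirt, ShirtRow) are hypothetical and chosen only for illustration, and the hand-rolled enum stands in for a library such as Enumeratum.

```scala
// Hypothetical enum, written by hand so the sketch needs no libraries.
sealed trait Color { def name: String }
object Color {
  case object Red extends Color { val name = "Red" }
  case object Green extends Color { val name = "Green" }

  val values: Seq[Color] = Seq(Red, Green)

  // Look a value up by name; None for names that match no member.
  def withNameOption(n: String): Option[Color] = values.find(_.name == n)
}

// Domain record: contains the enum, which is the case the comment
// above says cannot be used inside records.
final case class Shirt(id: Long, color: Color)

// Storage record: the enum is replaced by its String name.
final case class ShirtRow(id: Long, color: String)

object ShirtRow {
  // Enum -> String is total: every Color has a name.
  def fromDomain(s: Shirt): ShirtRow = ShirtRow(s.id, s.color.name)

  // String -> enum can fail, so the error is surfaced as a Left.
  def toDomain(r: ShirtRow): Either[String, Shirt] =
    Color
      .withNameOption(r.color)
      .map(c => Shirt(r.id, c))
      .toRight(s"unknown color: ${r.color}")
}
```

Under these assumptions a Dataset would hold ShirtRow values, with the conversions applied at the edges of the job. The cost is the duplicated record type and the manual mapping, which is the boilerplate a public UDT API could remove.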

Come on guys! Spark is probably the most awesome analytics engine in the world. Why can't we solve this problem together???

> Make user-defined type (UDT) API public
> ---------------------------------------
>
>                 Key: SPARK-7768
>                 URL: https://issues.apache.org/jira/browse/SPARK-7768
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it would be nice to make the UDT API public in 1.5.



