Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2019/07/21 22:42:00 UTC

[jira] [Updated] (SPARK-28456) Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections

     [ https://issues.apache.org/jira/browse/SPARK-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-28456:
---------------------------------
    Description: 
Because `Encoder` is not thread-safe, a single `Encoder` instance cannot be reused across multiple `Dataset`s. However, creating an `Encoder` for a complex class is slow because it relies on Scala reflection. To reduce the cost of `Encoder` creation, I currently use the private API `ExpressionEncoder.copy` as follows:
{code:java}
object FooEncoder {
 private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
 implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}
{code}
This PR proposes a new method `makeCopy` in `Encoder` so that the code above can be rewritten using only public APIs.
{code:java}
object FooEncoder {
 private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]
 implicit def encoder: Encoder[Foo] = _encoder.makeCopy
}
{code}

  was:
Because `Encoder` is not thread-safe, a single `Encoder` instance cannot be reused across multiple `Dataset`s. However, creating an `Encoder` for a complex class is slow because it relies on Scala reflection. To reduce the cost of `Encoder` creation, I currently use the private API `ExpressionEncoder.copy` as follows:

{code}
object FooEncoder {
 private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
 implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}
{code}

This PR proposes a new method `copyEncoder` in `Encoder` so that the code above can be rewritten using only public APIs.

{code}
object FooEncoder {
 private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]
 implicit def encoder: Encoder[Foo] = _encoder.copyEncoder()
}
{code}

Regarding the method name, 
- Why not use `copy`? It conflicts with `case class`'s copy.
- Why not use `clone`? It conflicts with `Object.clone`.

        Summary: Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections  (was: Add a public API `Encoder.copyEncoder` to allow creating Encoder without touching Scala reflections)

> Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28456
>                 URL: https://issues.apache.org/jira/browse/SPARK-28456
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Major
>
> Because `Encoder` is not thread-safe, a single `Encoder` instance cannot be reused across multiple `Dataset`s. However, creating an `Encoder` for a complex class is slow because it relies on Scala reflection. To reduce the cost of `Encoder` creation, I currently use the private API `ExpressionEncoder.copy` as follows:
> {code:java}
> object FooEncoder {
>  private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
>  implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
> }
> {code}
> This PR proposes a new method `makeCopy` in `Encoder` so that the code above can be rewritten using only public APIs.
> {code:java}
> object FooEncoder {
>  private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]
>  implicit def encoder: Encoder[Foo] = _encoder.makeCopy
> }
> {code}
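
For illustration, here is a minimal sketch of how the cached-encoder workaround described above might be used with more than one `Dataset`. It assumes Spark 2.4.x, where `ExpressionEncoder` is a case class and therefore has a `copy()` method. The case class `Foo`, its fields, and the object `EncoderCopyExample` are hypothetical names chosen for this example. The proposed `makeCopy` is not used, because it is only a proposal in this issue.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical top-level case class standing in for a "complicated" type.
case class Foo(id: Long, name: String)

object FooEncoder {
  // Built once, lazily: this is the step that goes through Scala reflection.
  private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()

  // Each implicit lookup returns a fresh copy of the cached encoder,
  // so no two Datasets share the same encoder instance.
  implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}

object EncoderCopyExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("encoder-copy-sketch")
      .getOrCreate()

    // Bring only the cached-copy encoder into implicit scope.
    // spark.implicits._ is deliberately not imported, to avoid a second
    // implicit Encoder[Foo] competing with this one.
    import FooEncoder.encoder

    // Each createDataset call resolves the implicit and receives its own copy.
    val ds1: Dataset[Foo] = spark.createDataset(Seq(Foo(1L, "a")))
    val ds2: Dataset[Foo] = spark.createDataset(Seq(Foo(2L, "b")))

    println(ds1.union(ds2).count())

    spark.stop()
  }
}
```

The `lazy val` keeps the reflective construction to a single occurrence per JVM, while the `implicit def` (rather than an `implicit val`) is what hands out a separate instance on every use.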


