Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2019/07/21 22:42:00 UTC
[jira] [Updated] (SPARK-28456) Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections
[ https://issues.apache.org/jira/browse/SPARK-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu updated SPARK-28456:
---------------------------------
Description:
Because `Encoder` is not thread-safe, the user cannot reuse an `Encoder` across multiple `Dataset`s. However, creating an `Encoder` for a complex class is slow because it relies on Scala reflection. To reduce the cost of creating encoders, I currently use the private API `ExpressionEncoder.copy` as follows:
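{code:scala}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

object FooEncoder {
  // Derive the encoder via Scala reflection only once, on first use.
  private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
  // Return a fresh copy on every call, since an encoder must not be shared.
  implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}
{code}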
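The workaround above relies on a cheap copy of an expensive-to-derive object. The following is a minimal, Spark-free sketch of that mechanism. `SketchEncoder`, `derive` and `reflectionCalls` are hypothetical names, not Spark APIs. The sketch assumes that the reflection-derived description is held in immutable constructor parameters while the non-thread-safe state lives in per-instance fields of the case-class body, so a shallow `copy()` yields fresh mutable state.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for ExpressionEncoder. The constructor parameter models the
// immutable, reflection-derived description; the body holds per-instance mutable state.
final case class SketchEncoder(fieldNames: Seq[String]) {
  // Scratch buffer reused by encode(): not thread safe, so it must never be shared.
  // It is a body field, not a constructor parameter, so copy() creates a fresh one.
  private lazy val scratch = new StringBuilder

  def encode(values: Seq[Any]): String = {
    scratch.clear()
    fieldNames.zip(values).foreach { case (name, value) =>
      scratch.append(s"$name=$value;")
    }
    scratch.toString
  }

  def sharesScratchWith(other: SketchEncoder): Boolean = scratch eq other.scratch
}

object SketchEncoder {
  // Counts how often the "expensive" derivation step runs.
  val reflectionCalls = new AtomicInteger(0)

  // Hypothetical stand-in for the slow, reflection-based ExpressionEncoder[Foo]().
  def derive(): SketchEncoder = {
    reflectionCalls.incrementAndGet()
    SketchEncoder(Seq("id", "name"))
  }
}

object FooEncoder {
  // Pay for the derivation once, on first use.
  private lazy val _encoder: SketchEncoder = SketchEncoder.derive()
  // Hand out a new copy on every call so callers never share the scratch buffer.
  implicit def encoder: SketchEncoder = _encoder.copy()
}
```

Calling `FooEncoder.encoder` twice runs `derive()` only once, yet returns two distinct instances that compare equal and do not share a scratch buffer.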
This PR proposes a new method, `makeCopy`, on `Encoder` so that the code above can be rewritten using only public APIs.
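{code:scala}
import org.apache.spark.sql.{Encoder, Encoders}

object FooEncoder {
  // Derive the encoder via Scala reflection only once; Foo must be a case class.
  private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]
  // makeCopy is the public method proposed in this issue.
  implicit def encoder: Encoder[Foo] = _encoder.makeCopy
}
{code}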
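The shape of the proposed API can be sketched without Spark as follows. `SketchPublicEncoder`, `SketchInternalEncoder` and the `tag` field are hypothetical names chosen for illustration, not Spark types. The sketch assumes that the public trait exposes only `makeCopy`, while an internal case class (as `ExpressionEncoder` is) implements it with its own `copy`.

```scala
// Hypothetical public trait standing in for Spark's Encoder[T]. Its makeCopy member
// mirrors the method proposed in this issue; it is not an existing Spark API.
trait SketchPublicEncoder[T] {
  def fieldNames: Seq[String]
  def makeCopy: SketchPublicEncoder[T]
}

// Hypothetical internal implementation, a case class in the way ExpressionEncoder is.
// Because the trait member is named makeCopy rather than copy, the case class keeps
// its compiler-generated copy(...) with named, defaulted parameters.
final case class SketchInternalEncoder[T](fieldNames: Seq[String], tag: String)
    extends SketchPublicEncoder[T] {
  override def makeCopy: SketchPublicEncoder[T] = copy[T]()
}

final case class Foo(id: Int, name: String)

object FooEncoder {
  // Hypothetical stand-in for Encoders.product[Foo]: built once, exposed only as the trait.
  private lazy val _encoder: SketchPublicEncoder[Foo] =
    SketchInternalEncoder[Foo](Seq("id", "name"), tag = "product")
  // Callers depend only on the public trait, yet still receive a fresh instance per call.
  implicit def encoder: SketchPublicEncoder[Foo] = _encoder.makeCopy
}
```

Callers never see the concrete case class, and the internal class can still use named-argument `copy(tag = ...)` calls of its own.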
was:
Because `Encoder` is not thread-safe, the user cannot reuse an `Encoder` across multiple `Dataset`s. However, creating an `Encoder` for a complex class is slow because it relies on Scala reflection. To reduce the cost of creating encoders, I currently use the private API `ExpressionEncoder.copy` as follows:
{code}
object FooEncoder {
  private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
  implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}
{code}
This PR proposes a new method, `copyEncoder`, on `Encoder` so that the code above can be rewritten using only public APIs.
{code}
object FooEncoder {
  private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]
  implicit def encoder: Encoder[Foo] = _encoder.copyEncoder()
}
{code}
Regarding the method name:
- Why not `copy`? It would conflict with the `copy` method generated for case classes.
- Why not `clone`? It would conflict with `Object.clone`.
Summary: Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections (was: Add a public API `Encoder.copyEncoder` to allow creating Encoder without touching Scala reflections)
> Add a public API `Encoder.makeCopy` to allow creating Encoder without touching Scala reflections
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-28456
> URL: https://issues.apache.org/jira/browse/SPARK-28456
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: Shixiong Zhu
> Assignee: Shixiong Zhu
> Priority: Major