You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by amitsela <gi...@git.apache.org> on 2016/05/31 22:48:04 UTC

[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

GitHub user amitsela opened a pull request:

    https://github.com/apache/spark/pull/13424

    [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load custom user settings

    ## What changes were proposed in this pull request?
    
    Serializer instantiation will consider existing SparkConf
    
    
    ## How was this patch tested?
    manual test with `ImmutableList` (Guava) and `kryo-serializers`'s `Immutable*Serializer` implementations.
    This is marked as WIP because I'm wondering about a unit test, but I haven't found any example of a test that instantiates serializers with codegen.
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amitsela/spark SPARK-15489

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13424
    
----
commit ea4eb20922490e0a93b8e880349cee415ccb83c6
Author: Sela <an...@paypal.com>
Date:   2016-05-30T21:45:55Z

    serializer instance should take existing configuration into account

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13424#discussion_r65367568
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -547,11 +547,18 @@ case class EncodeUsingSerializer(child: Expression, kryo: Boolean)
             (classOf[JavaSerializer].getName, classOf[JavaSerializerInstance].getName)
           }
         }
    +    // try conf from env, otherwise create a new one
    +    val env = s"${classOf[SparkEnv].getName}.get()"
         val sparkConf = s"new ${classOf[SparkConf].getName}()"
    -    ctx.addMutableState(
    -      serializerInstanceClass,
    -      serializer,
    -      s"$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();")
    +    val serializerInit = s"""
    +      $serializer = null;
    --- End diff --
    
    This line seems not needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    I'm guessing here that the PR didn't start a Jenkins build because of the [WIP] so if a committer could "kick" Jenkins here... Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    I will check if the mentioned line above can be removed. If you're switching serializer to boolean as mentioned, I don't see any other way, correct ?
    Regarding tests, is there any existing test suite that I could use that provides access to a serializer instance "in the executor" ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    I've added a test suite since I couldn't find an existing one that would fit (I needed to override `beforeAll()`) so I will remove WIP from the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13424
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13424#discussion_r65369454
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -547,11 +547,18 @@ case class EncodeUsingSerializer(child: Expression, kryo: Boolean)
             (classOf[JavaSerializer].getName, classOf[JavaSerializerInstance].getName)
           }
         }
    +    // try conf from env, otherwise create a new one
    +    val env = s"${classOf[SparkEnv].getName}.get()"
         val sparkConf = s"new ${classOf[SparkConf].getName}()"
    -    ctx.addMutableState(
    -      serializerInstanceClass,
    -      serializer,
    -      s"$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();")
    +    val serializerInit = s"""
    +      $serializer = null;
    +      if ($env == null) {
    +        $serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();
    +       } else {
    +         $serializer = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance();
    --- End diff --
    
    Looks like you're right, the existing code created a new `Serializer` each time, so I tried to fix that..
    Adding to that, I originally wrote this for branch-1.6 where there is no `Serializer` in `SpaekEnv` AFAIK.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    **[Test build #60208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60208/consoleFull)** for PR 13424 at commit [`0a53f8e`](https://github.com/apache/spark/commit/0a53f8ef9962ce49f78cfd9bea34a0cbb497d1f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    **[Test build #60208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60208/consoleFull)** for PR 13424 at commit [`0a53f8e`](https://github.com/apache/spark/commit/0a53f8ef9962ce49f78cfd9bea34a0cbb497d1f7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    LGTM, can you update the description (it still says WIP).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on the pull request:

    https://github.com/apache/spark/pull/13424
  
    @marmbrus does this make sense to you ? I went for `SparkEnv` because I couldn't find any entry/access point in the code to get the user-defined configuration otherwise.
    Any pointers on how it would be best to test this ? I could write a test pipeline, but I thought you could maybe point me to an existing suite that has some boiler plate for this.
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by amitsela <gi...@git.apache.org>.
Github user amitsela commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13424#discussion_r65369539
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -547,11 +547,18 @@ case class EncodeUsingSerializer(child: Expression, kryo: Boolean)
             (classOf[JavaSerializer].getName, classOf[JavaSerializerInstance].getName)
           }
         }
    +    // try conf from env, otherwise create a new one
    +    val env = s"${classOf[SparkEnv].getName}.get()"
         val sparkConf = s"new ${classOf[SparkConf].getName}()"
    -    ctx.addMutableState(
    -      serializerInstanceClass,
    -      serializer,
    -      s"$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();")
    +    val serializerInit = s"""
    +      $serializer = null;
    --- End diff --
    
    I'll check that, this generated code is Java, right ? so the if could be confusing..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13424#discussion_r65367637
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -547,11 +547,18 @@ case class EncodeUsingSerializer(child: Expression, kryo: Boolean)
             (classOf[JavaSerializer].getName, classOf[JavaSerializerInstance].getName)
           }
         }
    +    // try conf from env, otherwise create a new one
    +    val env = s"${classOf[SparkEnv].getName}.get()"
         val sparkConf = s"new ${classOf[SparkConf].getName}()"
    -    ctx.addMutableState(
    -      serializerInstanceClass,
    -      serializer,
    -      s"$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();")
    +    val serializerInit = s"""
    +      $serializer = null;
    +      if ($env == null) {
    +        $serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();
    +       } else {
    +         $serializer = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance();
    --- End diff --
    
    Can't we use `SparkEnv.serializer` instead of creating new serializer if the env exists?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60208/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13424: [SPARK-15489][SQL] Dataset kryo encoder won't loa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13424


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13424#discussion_r65371678
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -547,11 +547,18 @@ case class EncodeUsingSerializer(child: Expression, kryo: Boolean)
             (classOf[JavaSerializer].getName, classOf[JavaSerializerInstance].getName)
           }
         }
    +    // try conf from env, otherwise create a new one
    +    val env = s"${classOf[SparkEnv].getName}.get()"
         val sparkConf = s"new ${classOf[SparkConf].getName}()"
    -    ctx.addMutableState(
    -      serializerInstanceClass,
    -      serializer,
    -      s"$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();")
    +    val serializerInit = s"""
    +      $serializer = null;
    +      if ($env == null) {
    +        $serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance();
    +       } else {
    +         $serializer = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance();
    --- End diff --
    
    Ah, I'm sorry but we might want to switch serializer by the `kryo: Boolean` parameter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL][WIP] Dataset kryo encoder won't load ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    As for my second comment about switching serializer, yes, please leave it as it is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13424: [SPARK-15489][SQL] Dataset kryo encoder won't load custo...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/13424
  
    Thanks!  Merged into master and 2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org