You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2017/05/10 07:04:52 UTC

[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/17933

    [SPARK-20588][SQL] Cache TimeZone instances per thread.

    ## What changes were proposed in this pull request?
    
    Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent call of this method will become a bottleneck.
    This especially happens when casting from string value containing timezone info to timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets TimeZone instance on the site.
    
    This pr makes a cache of the generated TimeZone instances per thread to avoid the synchronization.
    
    ## How was this patch tested?
    
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-20588

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17933.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17933
    
----
commit de79e50779c0f2e17ea26301ac7d1216b37331c9
Author: Takuya UESHIN <ue...@databricks.com>
Date:   2017-05-10T05:55:53Z

    Cache TimeZone instances per thread.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76923/testReport)** for PR 17933 at commit [`3cdbb3a`](https://github.com/apache/spark/commit/3cdbb3acf12b2082056e8b4e2eb3f1645fa1bde7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76732/testReport)** for PR 17933 at commit [`de79e50`](https://github.com/apache/spark/commit/de79e50779c0f2e17ea26301ac7d1216b37331c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76923/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116396203
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +99,14 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
    --- End diff --
    
    That's a good point.
    How about using `ConcurrentHashMap` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r115702672
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +99,21 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
    +    override def initialValue(): mutable.Map[String, TimeZone] = mutable.Map.empty
    +  }
    +
    +  def getTimeZone(timeZoneId: String): TimeZone = {
    +    val timeZones = threadLocalTimeZones.get()
    --- End diff --
    
    I'll update it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116377714
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +99,14 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
    --- End diff --
    
    As we won't go to update this map. We only need synchronization when putting the values. The content of the map can be shared between threads when reading. I am wondering if we need a local map for each thread.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76732/testReport)** for PR 17933 at commit [`de79e50`](https://github.com/apache/spark/commit/de79e50779c0f2e17ea26301ac7d1216b37331c9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76921/testReport)** for PR 17933 at commit [`7935a1a`](https://github.com/apache/spark/commit/7935a1a8d8336924e361559d7a708d73b8568e68).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r115690941
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -954,8 +955,9 @@ case class FromUTCTimestamp(left: Expression, right: Expression)
             val tzTerm = ctx.freshName("tz")
             val utcTerm = ctx.freshName("utc")
             val tzClass = classOf[TimeZone].getName
    -        ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
    -        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
    +        val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
    --- End diff --
    
    Is it more efficient to save this value in a static (object) member somewhere or will it not matter much? I see it's used several times.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r115702646
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -954,8 +955,9 @@ case class FromUTCTimestamp(left: Expression, right: Expression)
             val tzTerm = ctx.freshName("tz")
             val utcTerm = ctx.freshName("utc")
             val tzClass = classOf[TimeZone].getName
    -        ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
    -        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("UTC");""")
    +        val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
    --- End diff --
    
    I don't think it will matter much for now because this will be processed only once per generating code.
    What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17933


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Thanks! Merging to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76747/testReport)** for PR 17933 at commit [`97d5bba`](https://github.com/apache/spark/commit/97d5bba544d26444dd945e848ec5da0d37a9a381).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76732/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116398915
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +101,15 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]
    +  private val computeTimeZone = new JFunction[String, TimeZone] {
    +    override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId)
    +  }
    +
    +  def getTimeZone(timeZoneId: String): TimeZone = {
    +    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
    --- End diff --
    
    I believe Java 7 support was removed as of Spark 2.2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76923/testReport)** for PR 17933 at commit [`3cdbb3a`](https://github.com/apache/spark/commit/3cdbb3acf12b2082056e8b4e2eb3f1645fa1bde7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116398184
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +99,14 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
    --- End diff --
    
    Sounds good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76747/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116398563
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -20,9 +20,12 @@ package org.apache.spark.sql.catalyst.util
     import java.sql.{Date, Timestamp}
     import java.text.{DateFormat, SimpleDateFormat}
     import java.util.{Calendar, Locale, TimeZone}
    +import java.util.concurrent.ConcurrentHashMap
    +import java.util.function.{Function => JFunction}
     import javax.xml.bind.DatatypeConverter
     
     import scala.annotation.tailrec
    +import scala.collection.mutable
    --- End diff --
    
    We can remove this now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances per thread.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76747/testReport)** for PR 17933 at commit [`97d5bba`](https://github.com/apache/spark/commit/97d5bba544d26444dd945e848ec5da0d37a9a381).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r115691373
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +99,21 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
    +    override def initialValue(): mutable.Map[String, TimeZone] = mutable.Map.empty
    +  }
    +
    +  def getTimeZone(timeZoneId: String): TimeZone = {
    +    val timeZones = threadLocalTimeZones.get()
    --- End diff --
    
    How about just `threadLocalTimeZones.get().getOrElseUpdate(timeZoneId, TimeZone.getTimeZone(timeZoneId))`? It avoids the double lookup.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116398454
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -98,6 +101,15 @@ object DateTimeUtils {
         sdf
       }
     
    +  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]
    +  private val computeTimeZone = new JFunction[String, TimeZone] {
    +    override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId)
    +  }
    +
    +  def getTimeZone(timeZoneId: String): TimeZone = {
    +    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
    --- End diff --
    
    Is Java 7 support completely removed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17933#discussion_r116398935
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -20,9 +20,12 @@ package org.apache.spark.sql.catalyst.util
     import java.sql.{Date, Timestamp}
     import java.text.{DateFormat, SimpleDateFormat}
     import java.util.{Calendar, Locale, TimeZone}
    +import java.util.concurrent.ConcurrentHashMap
    +import java.util.function.{Function => JFunction}
     import javax.xml.bind.DatatypeConverter
     
     import scala.annotation.tailrec
    +import scala.collection.mutable
    --- End diff --
    
    Thanks, I'll remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76921/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17933
  
    **[Test build #76921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76921/testReport)** for PR 17933 at commit [`7935a1a`](https://github.com/apache/spark/commit/7935a1a8d8336924e361559d7a708d73b8568e68).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org