You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/08/24 10:21:33 UTC

[GitHub] spark pull request #22216: [SPARK-25223][SQL] Use a map to store values for ...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/22216

    [SPARK-25223][SQL] Use a map to store values for NamedLambdaVariable.

    ## What changes were proposed in this pull request?
    
    Currently we use `functionsForEval`, `NamedLambdaVarible`s in which are replace with of arguments from the original functions, to make sure the lambda variables refer the same instances as of arguments, but it's pretty hacky.
    Instead, we can use a global map and set/get the lambda variable values in the map.
    
    ## How was this patch tested?
    
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-25223/use_map

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22216.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22216
    
----
commit 6dd5349ce8fca053432ebcea0d4699923fd1f15d
Author: Takuya UESHIN <ue...@...>
Date:   2018-08-24T06:48:03Z

    Move nullable to `SimpleHigherOrderFunction`.

commit c27121b8743fad106cd5cd2ce33d22814856ec9c
Author: Takuya UESHIN <ue...@...>
Date:   2018-08-24T07:55:03Z

    Use a map to store values for `NamedLambdaVariable`.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    cc @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2523/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    **[Test build #95203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95203/testReport)** for PR 22216 at commit [`c27121b`](https://github.com/apache/spark/commit/c27121b8743fad106cd5cd2ce33d22814856ec9c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22216: [SPARK-25223][SQL] Use a map to store values for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22216#discussion_r212663909
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -35,28 +35,38 @@ case class NamedLambdaVariable(
         name: String,
         dataType: DataType,
         nullable: Boolean,
    -    exprId: ExprId = NamedExpression.newExprId,
    -    value: AtomicReference[Any] = new AtomicReference())
    +    exprId: ExprId = NamedExpression.newExprId)
       extends LeafExpression
       with NamedExpression
       with CodegenFallback {
     
       override def qualifier: Seq[String] = Seq.empty
     
    -  override def newInstance(): NamedExpression =
    -    copy(exprId = NamedExpression.newExprId, value = new AtomicReference())
    +  override def newInstance(): NamedExpression = copy(exprId = NamedExpression.newExprId)
     
       override def toAttribute: Attribute = {
         AttributeReference(name, dataType, nullable, Metadata.empty)(exprId, Seq.empty)
       }
     
    -  override def eval(input: InternalRow): Any = value.get
    +  override def eval(input: InternalRow): Any = NamedLambdaVariable.values(this)
     
       override def toString: String = s"lambda $name#${exprId.id}$typeSuffix"
     
       override def simpleString: String = s"lambda $name#${exprId.id}: ${dataType.simpleString}"
     }
     
    +object NamedLambdaVariable {
    +
    +  private[this] val _values =
    +    new ThreadLocal[mutable.Map[NamedLambdaVariable, Any]] {
    --- End diff --
    
    shall we use `ExprId` as the key?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22216: [SPARK-25223][SQL] Use a map to store values for ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin closed the pull request at:

    https://github.com/apache/spark/pull/22216


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95203/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    I think the use of global state and a thread local is far more hacky and probably is slower.
    
    The only clean solution I see here is to pass the lambda values around using the input row. I am not saying that this easy.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    **[Test build #95203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95203/testReport)** for PR 22216 at commit [`c27121b`](https://github.com/apache/spark/commit/c27121b8743fad106cd5cd2ce33d22814856ec9c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22216
  
    I see. Let me think again. I'd close this for now. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org