Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/10/14 00:11:08 UTC
[jira] [Updated] (SPARK-11080) Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisons

     [ https://issues.apache.org/jira/browse/SPARK-11080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-11080:
-------------------------------
    Description: 
In the current implementation of named expressions' ExprIds, we rely on a per-JVM AtomicLong to ensure that expression ids are unique within a JVM. However, these ids are not unique across JVMs, so collisions become possible if new expression ids are created inside tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids. These happen to be safe because those expressions are never compared to expressions created on the driver. To guard against the introduction of invalid comparisons between driver-created and executor-created expression ids, this patch extends ExprId to incorporate a UUID identifying the JVM that created the id, which prevents collisions.
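
The idea can be illustrated with a minimal, self-contained Scala sketch. It is not the patch itself: the ExprIdAllocator class and the ExprIdDemo object are hypothetical names introduced only for illustration, and each allocator instance stands in for one JVM (driver or executor) so that the cross-JVM case can be shown in a single process. The sketch assumes only that an id pairs the per-JVM counter value with a random UUID chosen once per JVM.

```scala
import java.util.UUID
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for ExprId: the counter value plus the UUID of the
// JVM that allocated it. Case-class equality compares both fields.
final case class ExprId(id: Long, jvmId: UUID)

// Hypothetical allocator; one instance models one JVM. In a real JVM the
// counter and the UUID would be process-wide singletons.
final class ExprIdAllocator(val jvmId: UUID = UUID.randomUUID()) {
  private val curId = new AtomicLong()

  // Thread-safe: every call returns a distinct counter value for this "JVM".
  def newExprId(): ExprId = ExprId(curId.getAndIncrement(), jvmId)
}

object ExprIdDemo {
  def main(args: Array[String]): Unit = {
    val driver = new ExprIdAllocator()
    val executor = new ExprIdAllocator()

    val d = driver.newExprId()
    val e = executor.newExprId()

    // Both counters start at 0, so the numeric parts collide.
    println(s"same counter value: ${d.id == e.id}")
    // The ids still compare unequal because the JVM UUIDs differ.
    println(s"ids equal: ${d == e}")
  }
}
```

With a bare Long id, the two values above would be indistinguishable. Including the UUID in equality makes an accidental comparison between a driver-created and an executor-created id fail instead of silently matching.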

  was:
My understanding of {{NamedExpression.newExprId}} is that it is only intended to be called on the driver. If it is called on executors, then this may lead to scenarios where the same expression id is re-used in two different NamedExpressions.

More generally, I think that calling {{NamedExpression.newExprId}} within tasks may be an indicator of potential attribute binding bugs. Therefore, I think that we should prevent {{NamedExpression.newExprId}} from being called inside of tasks by throwing an exception when such calls occur. 


> Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisons
> --------------------------------------------------------------------------
>
>                 Key: SPARK-11080
>                 URL: https://issues.apache.org/jira/browse/SPARK-11080
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 1.6.0
>
>


