Posted to issues@spark.apache.org by "zengxl (JIRA)" <ji...@apache.org> on 2018/11/08 05:53:00 UTC
[jira] [Updated] (SPARK-25961) Using random numbers to handle data skew is not supported

     [ https://issues.apache.org/jira/browse/SPARK-25961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zengxl updated SPARK-25961:
---------------------------
    Description: 
My SQL query joins two tables. The join key in one table contains null values, so I substitute a random value for the nulls, but the query fails with the following error:

Error in query: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found

Looking at the Spark source, the query is validated in org.apache.spark.sql.catalyst.analysis.CheckAnalysis, which rejects the expression because a random value is nondeterministic:

case o if o.expressions.exists(!_.deterministic) &&
  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
  // The rule above is used to check Aggregate operator.
  failAnalysis(
    s"""nondeterministic expressions are only allowed in
       |Project, Filter, Aggregate or Window, found:
       | ${o.expressions.map(_.sql).mkString(",")}
       |in operator ${operator.simpleString}
     """.stripMargin)
 
Would it be possible to add Join to this check? I have not tested this yet, and I do not know whether it would have other effects.

case o if o.expressions.exists(!_.deterministic) &&
  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
  !o.isInstanceOf[Join] => // proposed addition: also exempt Join
  // The rule above is used to check Aggregate operator.
  failAnalysis(
    s"""nondeterministic expressions are only allowed in
       |Project, Filter, Aggregate, Window or Join, found:
       | ${o.expressions.map(_.sql).mkString(",")}
       |in operator ${operator.simpleString}
     """.stripMargin)

 

This is my Spark SQL query:

SELECT
  T1.CUST_NO AS CUST_NO,
  T3.CON_LAST_NAME AS CUST_NAME,
  T3.CON_SEX_MF AS SEX_CODE,
  T3.X_POSITION AS POST_LV_CODE
FROM tmp.ICT_CUST_RANGE_INFO T1
LEFT JOIN tmp.F_CUST_BASE_INFO_ALL T3
  ON CASE WHEN coalesce(T1.CUST_NO, '') = '' THEN concat('cust_no', RAND()) ELSE T1.CUST_NO END = T3.BECIF
  AND T3.DATE = '20181105'
WHERE T1.DATE = '20181105'
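
A possible workaround, not tested here: since the check allows nondeterministic expressions in a Project, the random key could be computed in a subquery's select list and the join made on the resulting column. This is only a sketch under that assumption. The column name JOIN_KEY and the alias S are hypothetical and do not come from the original tables.

```sql
SELECT
  T1.CUST_NO       AS CUST_NO,
  T3.CON_LAST_NAME AS CUST_NAME,
  T3.CON_SEX_MF    AS SEX_CODE,
  T3.X_POSITION    AS POST_LV_CODE
FROM (
  -- RAND() is evaluated in this projection rather than in the join condition.
  SELECT
    S.CUST_NO,
    CASE WHEN coalesce(S.CUST_NO, '') = ''
         THEN concat('cust_no', RAND())  -- spread null/empty keys over random values
         ELSE S.CUST_NO
    END AS JOIN_KEY                      -- hypothetical column name
  FROM tmp.ICT_CUST_RANGE_INFO S
  WHERE S.DATE = '20181105'
) T1
LEFT JOIN tmp.F_CUST_BASE_INFO_ALL T3
  ON T1.JOIN_KEY = T3.BECIF
  AND T3.DATE = '20181105'
```

The filter on T1.DATE moves into the subquery. Because T1 is the preserved side of the LEFT JOIN, this should return the same rows as filtering after the join.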

  was:
Two tables are joined, and one of them has null values; adding a random number to the join key is rejected as not allowed.

Error in query: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found

Looking at the source code, the SQL is validated in org.apache.spark.sql.catalyst.analysis.CheckAnalysis, and the random number is prohibited because it is a nondeterministic value.

case o if o.expressions.exists(!_.deterministic) &&
 !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
 !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] =>
 // The rule above is used to check Aggregate operator.
 failAnalysis(
 s"""nondeterministic expressions are only allowed in
 |Project, Filter, Aggregate or Window, found:
 | ${o.expressions.map(_.sql).mkString(",")}
 |in operator ${operator.simpleString}
 """.stripMargin)

Would adding the Join case to this code be enough? This has not been tested yet.

case o if o.expressions.exists(!_.deterministic) &&
 !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
 !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] && !o.isInstanceOf[Join] =>
 // The rule above is used to check Aggregate operator.
 failAnalysis(
 s"""nondeterministic expressions are only allowed in
 |Project, Filter, Aggregate or Window or Join, found:
 | ${o.expressions.map(_.sql).mkString(",")}
 |in operator ${operator.simpleString}
 """.stripMargin)

 

My SQL:

SELECT
T1.CUST_NO AS CUST_NO ,
T3.CON_LAST_NAME AS CUST_NAME ,
T3.CON_SEX_MF AS SEX_CODE ,
T3.X_POSITION AS POST_LV_CODE 
FROM tmp.ICT_CUST_RANGE_INFO T1
LEFT join tmp.F_CUST_BASE_INFO_ALL T3 ON CASE WHEN coalesce(T1.CUST_NO,'') ='' THEN concat('cust_no',RAND()) ELSE T1.CUST_NO END = T3.BECIF and T3.DATE='20181105'
WHERE T1.DATE='20181105'


> Using random numbers to handle data skew is not supported
> ---------------
>
>                 Key: SPARK-25961
>                 URL: https://issues.apache.org/jira/browse/SPARK-25961
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>         Environment: spark on yarn 2.3.1
>            Reporter: zengxl
>            Priority: Major


