Posted to issues@spark.apache.org by "Xin Wu (JIRA)" <ji...@apache.org> on 2015/10/14 03:24:05 UTC

[jira] [Commented] (SPARK-10747) add support for window specification to include how NULLS are ordered

    [ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956094#comment-14956094 ] 

Xin Wu commented on SPARK-10747:
--------------------------------

I ran this query on the released Hive 1.2.1, and it is not supported there yet:
{code}
hive> select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap;
FAILED: ParseException line 1:76 missing ) at 'nulls' near 'nulls'
line 1:82 missing EOF at 'last' near 'nulls'
{code}

Spark SQL uses the Hive QL parser to parse the query, so it fails the same way:

{code}
scala> sqlContext.sql("select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap")
org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' near 'nulls'
line 1:82 missing EOF at 'last' near 'nulls';
        at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:298)
        at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
        at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
        at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:276)
        at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:62)
        at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:173)
        at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:173)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)

{code}

In HiveQl.scala, getAst(sql) throws an org.apache.hadoop.hive.ql.parse.ParseException, which createPlan catches and rewraps as an AnalysisException:

{code}
def createPlan(sql: String): LogicalPlan = {
    try {
      val tree = getAst(sql)
      if (nativeCommands contains tree.getText) {
        HiveNativeCommand(sql)
      } else {
        nodeToPlan(tree) match {
          case NativePlaceholder => HiveNativeCommand(sql)
          case other => other
        }
      }
    } catch {
      case pe: org.apache.hadoop.hive.ql.parse.ParseException =>
        pe.getMessage match {
          case errorRegEx(line, start, message) =>
            throw new AnalysisException(message, Some(line.toInt), Some(start.toInt))
          case otherMessage =>
            throw new AnalysisException(otherMessage)
        }
    }
}

{code}
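createPlan matches the exception text against errorRegEx, which is defined elsewhere in HiveQl.scala and not shown above. A minimal sketch of such a pattern (the exact regex in Spark may differ) showing how the line and position can be pulled out of a Hive parse error message:

{code}
// Sketch only: errorRegEx itself lives elsewhere in HiveQl.scala; this
// pattern approximates it for illustration.
object ErrorRegexSketch {
  // Matches messages like "line 1:76 missing ) at 'nulls' near 'nulls'".
  private val errorRegEx = """line (\d+):(\d+) (.*)""".r

  // Returns (line, position, message) when the text carries position info.
  def extract(msg: String): Option[(Int, Int, String)] = msg match {
    case errorRegEx(line, start, message) => Some((line.toInt, start.toInt, message))
    case _                                => None
  }

  def main(args: Array[String]): Unit =
    println(extract("line 1:76 missing ) at 'nulls' near 'nulls'"))
}
{code}

When the message does not match (the otherMessage branch above), no position is available and the whole string is passed to AnalysisException as-is.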

The ParseException itself is thrown from org.apache.hadoop.hive.ql.parse.ParseDriver:

{code}
public ASTNode parse(String command) throws ParseException {
    return this.parse(command, (Context) null);
}
{code}

So I think this needs to wait for HIVE-9535 to be resolved.
I am new to the Spark code and still learning it, so I hope my understanding is correct.
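Until then, the compensating-expression workaround from the issue description can emulate the ordering. A minimal plain-Scala sketch of the same two-key idea (illustrative only — plain collections stand in for the tolap table, and c3 matches the column name in the reported query):

{code}
// Sketch: emulating "ORDER BY c3 DESC NULLS LAST" with a compensating
// sort key, mirroring the CASE WHEN workaround in the issue description.
object NullsLastSketch {
  // Primary key: 1 if NULL else 0, so NULLs sort last;
  // secondary key: the negated value, so non-NULL values sort descending.
  def emulateNullsLastDesc(c3: Seq[Option[Int]]): Seq[Option[Int]] =
    c3.sortBy(v => (if (v.isEmpty) 1 else 0, -v.getOrElse(0)))

  def main(args: Array[String]): Unit =
    // prints List(Some(10), Some(7), Some(3), None, None)
    println(emulateNullsLastDesc(Seq(Some(10), None, Some(3), Some(7), None)))
}
{code}

The CASE WHEN expression in the workaround query plays exactly the role of the primary key here: it maps NULL to 1 and everything else to 0 before the ordinary sort runs.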


> add support for window specification to include how NULLS are ordered
> ---------------------------------------------------------------------
>
>                 Key: SPARK-10747
>                 URL: https://issues.apache.org/jira/browse/SPARK-10747
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: N Campbell
>
> You cannot express how NULLs are to be sorted in the window order specification and have to use a compensating expression to simulate it.
> Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' near 'nulls'
> line 1:82 missing EOF at 'last' near 'nulls';
> SQLState:  null
> Same limitation as Hive, reported in Apache JIRA HIVE-9535.
> This fails:
> select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap
> whereas the compensating-expression form is accepted:
> select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when c3 is null then 1 else 0 end) from tolap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org