Posted to issues@spark.apache.org by "Xin Wu (JIRA)" <ji...@apache.org> on 2015/10/14 03:24:05 UTC
[jira] [Commented] (SPARK-10747) add support for window specification to include how NULLS are ordered
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956094#comment-14956094 ]
Xin Wu commented on SPARK-10747:
--------------------------------
I ran this query on the released Hive 1.2.1, and this syntax is not supported yet:
{code}
hive> select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap;
FAILED: ParseException line 1:76 missing ) at 'nulls' near 'nulls'
line 1:82 missing EOF at 'last' near 'nulls'
{code}
Spark SQL uses the Hive QL parser to parse this query, so it fails there as well:
{code}
scala> sqlContext.sql("select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap")
org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' near 'nulls'
line 1:82 missing EOF at 'last' near 'nulls';
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:298)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:276)
at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:62)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:173)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:173)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
{code}
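Note that the failure happens while parsing the SQL text, before any analysis or table resolution, so it reproduces even if the tolap table does not exist. A minimal sketch (the query is the one from the description; catching the exception is only there to print the message):
{code}
// Sketch: the query fails at parse time, so no tolap table needs to be registered.
import org.apache.spark.sql.AnalysisException

try {
  sqlContext.sql(
    "select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap")
} catch {
  // The Hive ParseException is rethrown by HiveQl as an AnalysisException (see below).
  case e: AnalysisException => println(e.getMessage)
}
{code}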
In HiveQl.scala, you can see the following, where getAst(sql) throws the org.apache.hadoop.hive.ql.parse.ParseException:
{code}
def createPlan(sql: String): LogicalPlan = {
  try {
    val tree = getAst(sql)
    if (nativeCommands contains tree.getText) {
      HiveNativeCommand(sql)
    } else {
      nodeToPlan(tree) match {
        case NativePlaceholder => HiveNativeCommand(sql)
        case other => other
      }
    }
  } catch {
    case pe: org.apache.hadoop.hive.ql.parse.ParseException =>
      pe.getMessage match {
        case errorRegEx(line, start, message) =>
          throw new AnalysisException(message, Some(line.toInt), Some(start.toInt))
        case otherMessage =>
          throw new AnalysisException(otherMessage)
      }
{code}
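For context, errorRegEx is just a regular expression over Hive's error message that extracts the line and position for the AnalysisException. A rough sketch of what that match does (the pattern here is an approximation of the one defined in HiveQl.scala):
{code}
// Approximation of the errorRegEx handling in HiveQl.createPlan above.
val errorRegEx = "line (\\d+):(\\d+) (.*)".r

"line 1:76 missing ) at 'nulls' near 'nulls'" match {
  case errorRegEx(line, start, message) =>
    // Would become AnalysisException(message, Some(1), Some(76))
    println(s"line=$line start=$start message=$message")
  case otherMessage =>
    // Messages that don't fit the pattern (e.g. multi-line ones) are passed through as-is.
    println(s"passed through: $otherMessage")
}
{code}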
The ParseException itself is thrown from org.apache.hadoop.hive.ql.parse.ParseDriver.java:
{code}
public ASTNode parse(String command) throws ParseException {
  return this.parse(command, (Context) null);
}
{code}
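Calling Hive's parser directly shows the same failure, which confirms the limitation is in the Hive grammar itself rather than in Spark's wrapper around it. A sketch, using the Hive classes already on Spark's classpath:
{code}
// Sketch: hitting Hive's ParseDriver directly, bypassing Spark entirely.
import org.apache.hadoop.hive.ql.parse.ParseDriver

val pd = new ParseDriver()
// Throws org.apache.hadoop.hive.ql.parse.ParseException:
//   line 1:76 missing ) at 'nulls' near 'nulls'
pd.parse("select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap")
{code}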
So I think this needs to wait for HIVE-9535 to be resolved.
I am new and still learning the Spark code, so I hope my understanding is correct here.
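In the meantime, the compensating expression from the description below does go through the current HiveQL parser, since it avoids the NULLS FIRST/LAST syntax entirely. A sketch against the same tolap table:
{code}
// Sketch: the CASE-based workaround from the description parses and runs today.
sqlContext.sql(
  """select rnum, c1, c2, c3,
    |       dense_rank() over(partition by c1 order by case when c3 is null then 1 else 0 end)
    |from tolap""".stripMargin).show()
{code}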
> add support for window specification to include how NULLS are ordered
> ---------------------------------------------------------------------
>
> Key: SPARK-10747
> URL: https://issues.apache.org/jira/browse/SPARK-10747
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: N Campbell
>
> You cannot express how NULLS are to be sorted in the window order specification and have to use a compensating expression to simulate it.
> Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' near 'nulls'
> line 1:82 missing EOF at 'last' near 'nulls';
> SQLState: null
> Same limitation as Hive, reported in Apache JIRA HIVE-9535.
> This fails:
> select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap
> The compensating expression that has to be used to simulate it:
> select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when c3 is null then 1 else 0 end) from tolap