Posted to user@spark.apache.org by Kishor Bachhav <kb...@pivotal.io> on 2015/10/28 13:52:50 UTC
nested select is not working in spark sql
Hi,
I am trying to execute the query below in Spark SQL, but it throws an exception:
select n_name from NATION where n_regionkey = (select r_regionkey from
REGION where r_name='ASIA')
Exception:
Exception in thread "main" java.lang.RuntimeException: [1.55] failure: ``)'' expected but identifier r_regionkey found
select n_name from NATION where n_regionkey = (select r_regionkey from REGION where r_name='ASIA')
                                                      ^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
at org.apache.spark.sql.SnappyParserDialect.parse(snappyParsers.scala:65)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:169)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:169)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
The same query works in MySQL as well as MemSQL. The expected result is:
memsql> select n_name from NATION where n_regionkey = (select r_regionkey from REGION where r_name='ASIA');
+-----------+
| n_name    |
+-----------+
| INDIA     |
| INDONESIA |
| JAPAN     |
| CHINA     |
| VIETNAM   |
+-----------+
5 rows in set (0.71 sec)
How can I make this work in spark sql?
The query above is actually a simplified version of the Minimum Cost Supplier query (Q2) of TPC-H, which has this nested-select nature. I am working on these TPC-H queries. If anybody has a set of TPC-H queries modified for Spark SQL, kindly let me know; it would be very useful for me.
select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = [SIZE]
    and p_type like '%[TYPE]'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp, supplier,
            nation, region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = '[REGION]'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;
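[Editorial note: for the simple NATION/REGION query, the scalar subquery selects a single value, so it can be rewritten as an equi-join, a form that parsers without subquery support do accept. The sketch below demonstrates the equivalence with SQLite via Python's sqlite3 module; the tiny tables are illustrative stand-ins, not the real TPC-H data, and the same rewritten SQL could be fed to sqlContext.sql in Spark.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Tiny stand-in tables with only the columns the query touches.
cur.execute("CREATE TABLE NATION (n_name TEXT, n_regionkey INTEGER)")
cur.execute("CREATE TABLE REGION (r_name TEXT, r_regionkey INTEGER)")
cur.executemany("INSERT INTO NATION VALUES (?, ?)",
                [("INDIA", 2), ("JAPAN", 2), ("FRANCE", 3)])
cur.executemany("INSERT INTO REGION VALUES (?, ?)",
                [("ASIA", 2), ("EUROPE", 3)])

# Original form: scalar subquery in the WHERE clause.
subquery = cur.execute(
    "SELECT n_name FROM NATION WHERE n_regionkey = "
    "(SELECT r_regionkey FROM REGION WHERE r_name='ASIA')").fetchall()
print(subquery)  # -> [('INDIA',), ('JAPAN',)]

# Join rewrite: same rows, no subquery needed.
join = cur.execute(
    "SELECT n_name FROM NATION, REGION "
    "WHERE n_regionkey = r_regionkey AND r_name='ASIA'").fetchall()
print(sorted(join) == sorted(subquery))  # -> True
```

The rewrite is only safe because r_name is effectively unique in REGION; if the subquery's table could match several rows per outer row, the join would duplicate output rows where the scalar form would instead error or behave differently.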
Regards
Kishor
Re: nested select is not working in spark sql
Posted by Deenar Toraskar <de...@gmail.com>.
You can try the following syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
SELECT *
FROM A
WHERE A.a IN (SELECT foo FROM B);
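[Editorial note: the IN form above behaves like a semi-join: a row of A is kept when at least one matching row exists in B, with no duplication even if B matches several times. A quick illustration with SQLite via Python's sqlite3 module; the table names A/B and columns a/foo follow the snippet above and are placeholders.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE A (a INTEGER)")
cur.execute("CREATE TABLE B (foo INTEGER)")
cur.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
# B contains 2 twice: IN still returns the matching A row only once.
cur.executemany("INSERT INTO B VALUES (?)", [(2,), (2,), (3,)])

rows = cur.execute(
    "SELECT * FROM A WHERE A.a IN (SELECT foo FROM B)").fetchall()
print(rows)  # -> [(2,), (3,)]
```

In HiveQL (and, as I understand it, the Spark SQL of that era) the same semantics could also be written explicitly as a LEFT SEMI JOIN.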
Regards
Deenar
*Think Reactive Ltd*
deenar.toraskar@thinkreactive.co.uk
07714140812
On 28 October 2015 at 14:37, Richard Hillegas <rh...@us.ibm.com> wrote:
>
> Hi Kishor,
>
> Spark doesn't currently support subqueries in the WHERE clause. However,
> it looks as though someone is working on this right now:
> https://issues.apache.org/jira/browse/SPARK-4226
>
> Hope this helps,
> Rick Hillegas
Re: nested select is not working in spark sql
Posted by Richard Hillegas <rh...@us.ibm.com>.
Hi Kishor,
Spark doesn't currently support subqueries in the WHERE clause. However,
it looks as though someone is working on this right now:
https://issues.apache.org/jira/browse/SPARK-4226
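[Editorial note: until that work lands, a common workaround for an uncorrelated scalar subquery is to run the inner query separately, collect its single value on the driver, and pass it to the outer query. The sketch below shows the pattern with SQLite via Python's sqlite3 module; in Spark each statement would go through sqlContext.sql, with collect() fetching the scalar, and the tiny tables here are illustrative.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE NATION (n_name TEXT, n_regionkey INTEGER)")
cur.execute("CREATE TABLE REGION (r_name TEXT, r_regionkey INTEGER)")
cur.executemany("INSERT INTO NATION VALUES (?, ?)",
                [("INDIA", 2), ("CHINA", 2), ("FRANCE", 3)])
cur.executemany("INSERT INTO REGION VALUES (?, ?)",
                [("ASIA", 2), ("EUROPE", 3)])

# Step 1: evaluate the inner query on its own and pull out the scalar.
(region_key,) = cur.execute(
    "SELECT r_regionkey FROM REGION WHERE r_name='ASIA'").fetchone()

# Step 2: feed the scalar to the outer query as a bind parameter.
names = cur.execute(
    "SELECT n_name FROM NATION WHERE n_regionkey = ?",
    (region_key,)).fetchall()
print(names)  # -> [('INDIA',), ('CHINA',)]
```

This two-step form only works when the subquery is uncorrelated (it references nothing from the outer query); the correlated subquery in TPC-H Q2 needs a join or group-by rewrite instead.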
Hope this helps,
Rick Hillegas