You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by shahab <sh...@gmail.com> on 2015/03/03 08:51:53 UTC
Supporting Hive features in Spark SQL Thrift JDBC server
Hi,
According to Spark SQL documentation, "....Spark SQL supports the vast
majority of Hive features, such as User Defined Functions( UDF) ", and one
of these UFDs is "current_date()" function, which should be supported.
However, i get error when I am using this UDF in my SQL query. There are
couple of other UDFs which cause similar error.
Am I missing something in my JDBC server ?
/Shahab
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by shahab <sh...@gmail.com>.
@Yin: sorry for my mistake, you are right it was added in 1.2, not 0.12.0 ,
my bad!
On Tue, Mar 3, 2015 at 6:47 PM, shahab <sh...@gmail.com> wrote:
> Thanks Rohit, yes my mistake, it does work with 1.1 ( I am actually
> running it on spark 1.1)
>
> But do you mean that even HiveConext of spark (nit Calliope
> CassandraAwareHiveContext) is not supporting Hive 0.12 ??
>
> best,
> /Shahab
>
> On Tue, Mar 3, 2015 at 5:55 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>
>> The Hive dependency comes from spark-hive.
>>
>> It does work with Spark 1.1 we will have the 1.2 release later this month.
>> On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>>
>>>
>>> Thanks Rohit,
>>>
>>> I am already using Calliope and quite happy with it, well done ! except
>>> the fact that :
>>> 1- It seems that it does not support Hive 0.12 or higher, Am i right?
>>> for example you can not use : current_time() UDF, or those new UDFs added
>>> in hive 0.12 . Are they supported? Any plan for supporting them?
>>> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>>>
>>> best,
>>> /Shahab
>>>
>>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>>
>>>> Hello Shahab,
>>>>
>>>> I think CassandraAwareHiveContext
>>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>>>> Calliopee is what you are looking for. Create CAHC instance and you should
>>>> be able to run hive functions against the SchemaRDD you create from there.
>>>>
>>>> Cheers,
>>>> Rohit
>>>>
>>>> *Founder & CEO, **Tuplejump, Inc.*
>>>> ____________________________
>>>> www.tuplejump.com
>>>> *The Data Engineering Platform*
>>>>
>>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>>> The temp table in metastore can not be shared cross SQLContext
>>>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>>>> its functionality), why not using a single HiveContext globally? Is there
>>>>> any specific requirement in your case that you need multiple
>>>>> SQLContext/HiveContext?
>>>>>
>>>>>
>>>>>
>>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>>
>>>>> *To:* Cheng, Hao
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC
>>>>> server
>>>>>
>>>>>
>>>>>
>>>>> You are right , CassandraAwareSQLContext is subclass of SQL context.
>>>>>
>>>>>
>>>>>
>>>>> But I did another experiment, I queried Cassandra
>>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>>>> , next I tried to query it using HiveContext, but it seems that hive
>>>>> context can not see the registered table suing SQL context. Is this a
>>>>> normal case?
>>>>>
>>>>>
>>>>>
>>>>> best,
>>>>>
>>>>> /Shahab
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com>
>>>>> wrote:
>>>>>
>>>>> Hive UDF are only applicable for HiveContext and its subclass
>>>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>>>> HiveContext or SQLContext?
>>>>>
>>>>>
>>>>>
>>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>>> *To:* Cheng, Hao
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC
>>>>> server
>>>>>
>>>>>
>>>>>
>>>>> val sc: SparkContext = new SparkContext(conf)
>>>>>
>>>>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used
>>>>> some Calliope Cassandra Spark connector
>>>>>
>>>>> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>>>>>
>>>>> rdd.cache
>>>>>
>>>>> rdd.registerTempTable("profile")
>>>>>
>>>>> rdd.first //enforce caching
>>>>>
>>>>> val q = "select from_unixtime(floor(createdAt/1000)) from
>>>>> profile where sampling_bucket=0 "
>>>>>
>>>>> val rdd2 = rdd.sqlContext.sql(q )
>>>>>
>>>>> println ("Result: " + rdd2.first)
>>>>>
>>>>>
>>>>>
>>>>> And I get the following errors:
>>>>>
>>>>> xception in thread "main"
>>>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>>
>>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>>
>>>>> Filter (sampling_bucket#10 = 0)
>>>>>
>>>>> Subquery profile
>>>>>
>>>>> Project
>>>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>>
>>>>> CassandraRelation localhost, 9042, 9160, normaldb_sampling,
>>>>> profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d,
>>>>> None, None, false, Some(Configuration: core-default.xml, core-site.xml,
>>>>> mapred-default.xml, mapred-site.xml)
>>>>>
>>>>>
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>>
>>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>
>>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>
>>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>>
>>>>> at
>>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>>
>>>>> at
>>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>>
>>>>> at scala.collection.TraversableOnce$class.to
>>>>> (TraversableOnce.scala:273)
>>>>>
>>>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>>
>>>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>>
>>>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>>
>>>>> at
>>>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>>
>>>>> at
>>>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>>
>>>>> at
>>>>> scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>>
>>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>>
>>>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>>
>>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>>
>>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>>
>>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>>
>>>>> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>>>>>
>>>>> at boot.SQLDemo.main(SQLDemo.scala) //my code
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com>
>>>>> wrote:
>>>>>
>>>>> Can you provide the detailed failure call stack?
>>>>>
>>>>>
>>>>>
>>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>>> *To:* user@spark.apache.org
>>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> According to Spark SQL documentation, "....Spark SQL supports the
>>>>> vast majority of Hive features, such as User Defined Functions( UDF) ",
>>>>> and one of these UFDs is "current_date()" function, which should be
>>>>> supported.
>>>>>
>>>>>
>>>>>
>>>>> However, i get error when I am using this UDF in my SQL query. There
>>>>> are couple of other UDFs which cause similar error.
>>>>>
>>>>>
>>>>>
>>>>> Am I missing something in my JDBC server ?
>>>>>
>>>>>
>>>>>
>>>>> /Shahab
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by shahab <sh...@gmail.com>.
Thanks Rohit, yes my mistake, it does work with 1.1 ( I am actually running
it on spark 1.1)
But do you mean that even HiveConext of spark (nit Calliope
CassandraAwareHiveContext) is not supporting Hive 0.12 ??
best,
/Shahab
On Tue, Mar 3, 2015 at 5:55 PM, Rohit Rai <ro...@tuplejump.com> wrote:
> The Hive dependency comes from spark-hive.
>
> It does work with Spark 1.1 we will have the 1.2 release later this month.
> On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>
>>
>> Thanks Rohit,
>>
>> I am already using Calliope and quite happy with it, well done ! except
>> the fact that :
>> 1- It seems that it does not support Hive 0.12 or higher, Am i right?
>> for example you can not use : current_time() UDF, or those new UDFs added
>> in hive 0.12 . Are they supported? Any plan for supporting them?
>> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>>
>> best,
>> /Shahab
>>
>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>
>>> Hello Shahab,
>>>
>>> I think CassandraAwareHiveContext
>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>>> Calliopee is what you are looking for. Create CAHC instance and you should
>>> be able to run hive functions against the SchemaRDD you create from there.
>>>
>>> Cheers,
>>> Rohit
>>>
>>> *Founder & CEO, **Tuplejump, Inc.*
>>> ____________________________
>>> www.tuplejump.com
>>> *The Data Engineering Platform*
>>>
>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>>> The temp table in metastore can not be shared cross SQLContext
>>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>>> its functionality), why not using a single HiveContext globally? Is there
>>>> any specific requirement in your case that you need multiple
>>>> SQLContext/HiveContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> You are right , CassandraAwareSQLContext is subclass of SQL context.
>>>>
>>>>
>>>>
>>>> But I did another experiment, I queried Cassandra
>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>>> , next I tried to query it using HiveContext, but it seems that hive
>>>> context can not see the registered table suing SQL context. Is this a
>>>> normal case?
>>>>
>>>>
>>>>
>>>> best,
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>> Hive UDF are only applicable for HiveContext and its subclass
>>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>>> HiveContext or SQLContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> val sc: SparkContext = new SparkContext(conf)
>>>>
>>>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
>>>> Calliope Cassandra Spark connector
>>>>
>>>> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>>>>
>>>> rdd.cache
>>>>
>>>> rdd.registerTempTable("profile")
>>>>
>>>> rdd.first //enforce caching
>>>>
>>>> val q = "select from_unixtime(floor(createdAt/1000)) from profile
>>>> where sampling_bucket=0 "
>>>>
>>>> val rdd2 = rdd.sqlContext.sql(q )
>>>>
>>>> println ("Result: " + rdd2.first)
>>>>
>>>>
>>>>
>>>> And I get the following errors:
>>>>
>>>> xception in thread "main"
>>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>
>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>
>>>> Filter (sampling_bucket#10 = 0)
>>>>
>>>> Subquery profile
>>>>
>>>> Project
>>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>
>>>> CassandraRelation localhost, 9042, 9160, normaldb_sampling,
>>>> profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None,
>>>> None, false, Some(Configuration: core-default.xml, core-site.xml,
>>>> mapred-default.xml, mapred-site.xml)
>>>>
>>>>
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>
>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>
>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>
>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>
>>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>
>>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>
>>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>
>>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>
>>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>
>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>
>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>
>>>> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>>>>
>>>> at boot.SQLDemo.main(SQLDemo.scala) //my code
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>> Can you provide the detailed failure call stack?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>> *To:* user@spark.apache.org
>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> According to Spark SQL documentation, "....Spark SQL supports the vast
>>>> majority of Hive features, such as User Defined Functions( UDF) ", and one
>>>> of these UFDs is "current_date()" function, which should be supported.
>>>>
>>>>
>>>>
>>>> However, i get error when I am using this UDF in my SQL query. There
>>>> are couple of other UDFs which cause similar error.
>>>>
>>>>
>>>>
>>>> Am I missing something in my JDBC server ?
>>>>
>>>>
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by Yin Huai <yh...@databricks.com>.
@Shahab, based on https://issues.apache.org/jira/browse/HIVE-5472,
current_date was added in Hive *1.2.0 (not 0.12.0)*. For my previous email,
I meant current_date is not in neither Hive 0.12.0 nor Hive 0.13.1 (Spark
SQL currently supports these two Hive versions).
On Tue, Mar 3, 2015 at 8:55 AM, Rohit Rai <ro...@tuplejump.com> wrote:
> The Hive dependency comes from spark-hive.
>
> It does work with Spark 1.1 we will have the 1.2 release later this month.
> On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>
>>
>> Thanks Rohit,
>>
>> I am already using Calliope and quite happy with it, well done ! except
>> the fact that :
>> 1- It seems that it does not support Hive 0.12 or higher, Am i right?
>> for example you can not use : current_time() UDF, or those new UDFs added
>> in hive 0.12 . Are they supported? Any plan for supporting them?
>> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>>
>> best,
>> /Shahab
>>
>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>
>>> Hello Shahab,
>>>
>>> I think CassandraAwareHiveContext
>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>>> Calliopee is what you are looking for. Create CAHC instance and you should
>>> be able to run hive functions against the SchemaRDD you create from there.
>>>
>>> Cheers,
>>> Rohit
>>>
>>> *Founder & CEO, **Tuplejump, Inc.*
>>> ____________________________
>>> www.tuplejump.com
>>> *The Data Engineering Platform*
>>>
>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>>> The temp table in metastore can not be shared cross SQLContext
>>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>>> its functionality), why not using a single HiveContext globally? Is there
>>>> any specific requirement in your case that you need multiple
>>>> SQLContext/HiveContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> You are right , CassandraAwareSQLContext is subclass of SQL context.
>>>>
>>>>
>>>>
>>>> But I did another experiment, I queried Cassandra
>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>>> , next I tried to query it using HiveContext, but it seems that hive
>>>> context can not see the registered table suing SQL context. Is this a
>>>> normal case?
>>>>
>>>>
>>>>
>>>> best,
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>> Hive UDF are only applicable for HiveContext and its subclass
>>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>>> HiveContext or SQLContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> val sc: SparkContext = new SparkContext(conf)
>>>>
>>>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
>>>> Calliope Cassandra Spark connector
>>>>
>>>> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>>>>
>>>> rdd.cache
>>>>
>>>> rdd.registerTempTable("profile")
>>>>
>>>> rdd.first //enforce caching
>>>>
>>>> val q = "select from_unixtime(floor(createdAt/1000)) from profile
>>>> where sampling_bucket=0 "
>>>>
>>>> val rdd2 = rdd.sqlContext.sql(q )
>>>>
>>>> println ("Result: " + rdd2.first)
>>>>
>>>>
>>>>
>>>> And I get the following errors:
>>>>
>>>> xception in thread "main"
>>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>
>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>
>>>> Filter (sampling_bucket#10 = 0)
>>>>
>>>> Subquery profile
>>>>
>>>> Project
>>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>
>>>> CassandraRelation localhost, 9042, 9160, normaldb_sampling,
>>>> profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None,
>>>> None, false, Some(Configuration: core-default.xml, core-site.xml,
>>>> mapred-default.xml, mapred-site.xml)
>>>>
>>>>
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>
>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>
>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>
>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>
>>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>
>>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>
>>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>
>>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>
>>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>
>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>
>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>
>>>> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>>>>
>>>> at boot.SQLDemo.main(SQLDemo.scala) //my code
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>> Can you provide the detailed failure call stack?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>> *To:* user@spark.apache.org
>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> According to Spark SQL documentation, "....Spark SQL supports the vast
>>>> majority of Hive features, such as User Defined Functions( UDF) ", and one
>>>> of these UFDs is "current_date()" function, which should be supported.
>>>>
>>>>
>>>>
>>>> However, i get error when I am using this UDF in my SQL query. There
>>>> are couple of other UDFs which cause similar error.
>>>>
>>>>
>>>>
>>>> Am I missing something in my JDBC server ?
>>>>
>>>>
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by Rohit Rai <ro...@tuplejump.com>.
The Hive dependency comes from spark-hive.
It does work with Spark 1.1 we will have the 1.2 release later this month.
On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>
> Thanks Rohit,
>
> I am already using Calliope and quite happy with it, well done ! except
> the fact that :
> 1- It seems that it does not support Hive 0.12 or higher, Am i right? for
> example you can not use : current_time() UDF, or those new UDFs added in
> hive 0.12 . Are they supported? Any plan for supporting them?
> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>
> best,
> /Shahab
>
> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>
>> Hello Shahab,
>>
>> I think CassandraAwareHiveContext
>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>> Calliopee is what you are looking for. Create CAHC instance and you should
>> be able to run hive functions against the SchemaRDD you create from there.
>>
>> Cheers,
>> Rohit
>>
>> *Founder & CEO, **Tuplejump, Inc.*
>> ____________________________
>> www.tuplejump.com
>> *The Data Engineering Platform*
>>
>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>
>>> The temp table in metastore can not be shared cross SQLContext
>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>> its functionality), why not using a single HiveContext globally? Is there
>>> any specific requirement in your case that you need multiple
>>> SQLContext/HiveContext?
>>>
>>>
>>>
>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>
>>> *To:* Cheng, Hao
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>
>>>
>>>
>>> You are right , CassandraAwareSQLContext is subclass of SQL context.
>>>
>>>
>>>
>>> But I did another experiment, I queried Cassandra
>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>> , next I tried to query it using HiveContext, but it seems that hive
>>> context can not see the registered table suing SQL context. Is this a
>>> normal case?
>>>
>>>
>>>
>>> best,
>>>
>>> /Shahab
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>> Hive UDF are only applicable for HiveContext and its subclass
>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>> HiveContext or SQLContext?
>>>
>>>
>>>
>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>> *To:* Cheng, Hao
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>
>>>
>>>
>>> val sc: SparkContext = new SparkContext(conf)
>>>
>>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
>>> Calliope Cassandra Spark connector
>>>
>>> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>>>
>>> rdd.cache
>>>
>>> rdd.registerTempTable("profile")
>>>
>>> rdd.first //enforce caching
>>>
>>> val q = "select from_unixtime(floor(createdAt/1000)) from profile
>>> where sampling_bucket=0 "
>>>
>>> val rdd2 = rdd.sqlContext.sql(q )
>>>
>>> println ("Result: " + rdd2.first)
>>>
>>>
>>>
>>> And I get the following errors:
>>>
>>> xception in thread "main"
>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>
>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>
>>> Filter (sampling_bucket#10 = 0)
>>>
>>> Subquery profile
>>>
>>> Project
>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>
>>> CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
>>> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
>>> false, Some(Configuration: core-default.xml, core-site.xml,
>>> mapred-default.xml, mapred-site.xml)
>>>
>>>
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>
>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>
>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>
>>> at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>
>>> at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>
>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>
>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>
>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>
>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>
>>> at
>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>
>>> at
>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>
>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>
>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>
>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>
>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>
>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>
>>> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>>>
>>> at boot.SQLDemo.main(SQLDemo.scala) //my code
>>>
>>>
>>>
>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>> Can you provide the detailed failure call stack?
>>>
>>>
>>>
>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>> *To:* user@spark.apache.org
>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> According to Spark SQL documentation, "....Spark SQL supports the vast
>>> majority of Hive features, such as User Defined Functions( UDF) ", and one
>>> of these UFDs is "current_date()" function, which should be supported.
>>>
>>>
>>>
>>> However, i get error when I am using this UDF in my SQL query. There are
>>> couple of other UDFs which cause similar error.
>>>
>>>
>>>
>>> Am I missing something in my JDBC server ?
>>>
>>>
>>>
>>> /Shahab
>>>
>>>
>>>
>>>
>>>
>>
>>
>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by shahab <sh...@gmail.com>.
Thanks Rohit,
I am already using Calliope and quite happy with it, well done ! except the
fact that :
1- It seems that it does not support Hive 0.12 or higher, Am i right? for
example you can not use : current_time() UDF, or those new UDFs added in
hive 0.12 . Are they supported? Any plan for supporting them?
2-It does not support Spark 1.1 and 1.2. Any plan for new release?
best,
/Shahab
On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
> Hello Shahab,
>
> I think CassandraAwareHiveContext
> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
> Calliopee is what you are looking for. Create CAHC instance and you should
> be able to run hive functions against the SchemaRDD you create from there.
>
> Cheers,
> Rohit
>
> *Founder & CEO, **Tuplejump, Inc.*
> ____________________________
> www.tuplejump.com
> *The Data Engineering Platform*
>
> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>> The temp table in metastore can not be shared cross SQLContext
>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>> its functionality), why not using a single HiveContext globally? Is there
>> any specific requirement in your case that you need multiple
>> SQLContext/HiveContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> You are right , CassandraAwareSQLContext is subclass of SQL context.
>>
>>
>>
>> But I did another experiment, I queried Cassandra
>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>> , next I tried to query it using HiveContext, but it seems that hive
>> context can not see the registered table suing SQL context. Is this a
>> normal case?
>>
>>
>>
>> best,
>>
>> /Shahab
>>
>>
>>
>>
>>
>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>
>> Hive UDF are only applicable for HiveContext and its subclass instance,
>> is the CassandraAwareSQLContext a direct sub class of HiveContext or
>> SQLContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> val sc: SparkContext = new SparkContext(conf)
>>
>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
>> Calliope Cassandra Spark connector
>>
>> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>>
>> rdd.cache
>>
>> rdd.registerTempTable("profile")
>>
>> rdd.first //enforce caching
>>
>> val q = "select from_unixtime(floor(createdAt/1000)) from profile
>> where sampling_bucket=0 "
>>
>> val rdd2 = rdd.sqlContext.sql(q )
>>
>> println ("Result: " + rdd2.first)
>>
>>
>>
>> And I get the following errors:
>>
>> xception in thread "main"
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>
>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>
>> Filter (sampling_bucket#10 = 0)
>>
>> Subquery profile
>>
>> Project
>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>
>> CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
>> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
>> false, Some(Configuration: core-default.xml, core-site.xml,
>> mapred-default.xml, mapred-site.xml)
>>
>>
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>
>> at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>
>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>
>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>
>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>
>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>
>> at scala.collection.immutable.List.foreach(List.scala:318)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>
>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>
>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>
>> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>>
>> at boot.SQLDemo.main(SQLDemo.scala) //my code
>>
>>
>>
>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>
>> Can you provide the detailed failure call stack?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>> *To:* user@spark.apache.org
>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> Hi,
>>
>>
>>
>> According to Spark SQL documentation, "....Spark SQL supports the vast
>> majority of Hive features, such as User Defined Functions( UDF) ", and one
>> of these UFDs is "current_date()" function, which should be supported.
>>
>>
>>
>> However, i get error when I am using this UDF in my SQL query. There are
>> couple of other UDFs which cause similar error.
>>
>>
>>
>> Am I missing something in my JDBC server ?
>>
>>
>>
>> /Shahab
>>
>>
>>
>>
>>
>
>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by Rohit Rai <ro...@tuplejump.com>.
Hello Shahab,
I think CassandraAwareHiveContext
<https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala>
in
Calliopee is what you are looking for. Create CAHC instance and you should
be able to run hive functions against the SchemaRDD you create from there.
Cheers,
Rohit
*Founder & CEO, **Tuplejump, Inc.*
____________________________
www.tuplejump.com
*The Data Engineering Platform*
On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
> The temp table in metastore can not be shared cross SQLContext
> instances, since HiveContext is a sub class of SQLContext (inherits all of
> its functionality), why not using a single HiveContext globally? Is there
> any specific requirement in your case that you need multiple
> SQLContext/HiveContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 9:46 PM
>
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> You are right , CassandraAwareSQLContext is subclass of SQL context.
>
>
>
> But I did another experiment, I queried Cassandra
> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
> , next I tried to query it using HiveContext, but it seems that hive
> context can not see the registered table suing SQL context. Is this a
> normal case?
>
>
>
> best,
>
> /Shahab
>
>
>
>
>
> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>
> Hive UDF are only applicable for HiveContext and its subclass instance,
> is the CassandraAwareSQLContext a direct sub class of HiveContext or
> SQLContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 5:10 PM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> val sc: SparkContext = new SparkContext(conf)
>
> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
> Calliope Cassandra Spark connector
>
> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>
> rdd.cache
>
> rdd.registerTempTable("profile")
>
> rdd.first //enforce caching
>
> val q = "select from_unixtime(floor(createdAt/1000)) from profile
> where sampling_bucket=0 "
>
> val rdd2 = rdd.sqlContext.sql(q )
>
> println ("Result: " + rdd2.first)
>
>
>
> And I get the following errors:
>
> xception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>
> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>
> Filter (sampling_bucket#10 = 0)
>
> Subquery profile
>
> Project
> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>
> CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
> false, Some(Configuration: core-default.xml, core-site.xml,
> mapred-default.xml, mapred-site.xml)
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>
> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>
> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>
> at boot.SQLDemo.main(SQLDemo.scala) //my code
>
>
>
> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>
> Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
>
>
>
>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by shahab <sh...@gmail.com>.
@Cheng :My problem is that the connector I use to query Spark does not
support latest Hive (0.12, 0.13), But I need to perform Hive Queries on
data retrieved from Cassandra. I assumed that if I get data out of
cassandra in some way and register it as Temp table I would be able to
query it using HiveContext, but it seems I can not do this!
@Yes, it is added in Hive 0.12, but do you mean It is not supported by
HiveContext in Spark????
Thanks,
/Shahab
On Tue, Mar 3, 2015 at 5:23 PM, Yin Huai <yh...@databricks.com> wrote:
> Regarding current_date, I think it is not in either Hive 0.12.0 or 0.13.1
> (versions that we support). Seems
> https://issues.apache.org/jira/browse/HIVE-5472 added it Hive recently.
>
> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>> The temp table in metastore can not be shared cross SQLContext
>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>> its functionality), why not using a single HiveContext globally? Is there
>> any specific requirement in your case that you need multiple
>> SQLContext/HiveContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> You are right , CassandraAwareSQLContext is subclass of SQL context.
>>
>>
>>
>> But I did another experiment, I queried Cassandra
>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>> , next I tried to query it using HiveContext, but it seems that hive
>> context can not see the registered table suing SQL context. Is this a
>> normal case?
>>
>>
>>
>> best,
>>
>> /Shahab
>>
>>
>>
>>
>>
>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>
>> Hive UDF are only applicable for HiveContext and its subclass instance,
>> is the CassandraAwareSQLContext a direct sub class of HiveContext or
>> SQLContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> val sc: SparkContext = new SparkContext(conf)
>>
>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
>> Calliope Cassandra Spark connector
>>
>> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>>
>> rdd.cache
>>
>> rdd.registerTempTable("profile")
>>
>> rdd.first //enforce caching
>>
>> val q = "select from_unixtime(floor(createdAt/1000)) from profile
>> where sampling_bucket=0 "
>>
>> val rdd2 = rdd.sqlContext.sql(q )
>>
>> println ("Result: " + rdd2.first)
>>
>>
>>
>> And I get the following errors:
>>
>> xception in thread "main"
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>
>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>
>> Filter (sampling_bucket#10 = 0)
>>
>> Subquery profile
>>
>> Project
>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>
>> CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
>> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
>> false, Some(Configuration: core-default.xml, core-site.xml,
>> mapred-default.xml, mapred-site.xml)
>>
>>
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>
>> at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>
>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>
>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>
>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>
>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>
>> at scala.collection.immutable.List.foreach(List.scala:318)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>
>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>
>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>
>> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>>
>> at boot.SQLDemo.main(SQLDemo.scala) //my code
>>
>>
>>
>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>
>> Can you provide the detailed failure call stack?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>> *To:* user@spark.apache.org
>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> Hi,
>>
>>
>>
>> According to Spark SQL documentation, "....Spark SQL supports the vast
>> majority of Hive features, such as User Defined Functions( UDF) ", and one
>> of these UFDs is "current_date()" function, which should be supported.
>>
>>
>>
>> However, i get error when I am using this UDF in my SQL query. There are
>> couple of other UDFs which cause similar error.
>>
>>
>>
>> Am I missing something in my JDBC server ?
>>
>>
>>
>> /Shahab
>>
>>
>>
>>
>>
>
>
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by Yin Huai <yh...@databricks.com>.
Regarding current_date, I think it is not in either Hive 0.12.0 or 0.13.1
(versions that we support). Seems
https://issues.apache.org/jira/browse/HIVE-5472 added it Hive recently.
On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
> The temp table in metastore can not be shared cross SQLContext
> instances, since HiveContext is a sub class of SQLContext (inherits all of
> its functionality), why not using a single HiveContext globally? Is there
> any specific requirement in your case that you need multiple
> SQLContext/HiveContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 9:46 PM
>
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> You are right , CassandraAwareSQLContext is subclass of SQL context.
>
>
>
> But I did another experiment, I queried Cassandra
> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
> , next I tried to query it using HiveContext, but it seems that hive
> context can not see the registered table suing SQL context. Is this a
> normal case?
>
>
>
> best,
>
> /Shahab
>
>
>
>
>
> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>
> Hive UDF are only applicable for HiveContext and its subclass instance,
> is the CassandraAwareSQLContext a direct sub class of HiveContext or
> SQLContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 5:10 PM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> val sc: SparkContext = new SparkContext(conf)
>
> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
> Calliope Cassandra Spark connector
>
> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>
> rdd.cache
>
> rdd.registerTempTable("profile")
>
> rdd.first //enforce caching
>
> val q = "select from_unixtime(floor(createdAt/1000)) from profile
> where sampling_bucket=0 "
>
> val rdd2 = rdd.sqlContext.sql(q )
>
> println ("Result: " + rdd2.first)
>
>
>
> And I get the following errors:
>
> xception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>
> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>
> Filter (sampling_bucket#10 = 0)
>
> Subquery profile
>
> Project
> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>
> CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
> false, Some(Configuration: core-default.xml, core-site.xml,
> mapred-default.xml, mapred-site.xml)
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>
> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>
> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>
> at boot.SQLDemo.main(SQLDemo.scala) //my code
>
>
>
> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>
> Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
>
>
>
>
RE: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by "Cheng, Hao" <ha...@intel.com>.
The temp table in metastore can not be shared cross SQLContext instances, since HiveContext is a sub class of SQLContext (inherits all of its functionality), why not using a single HiveContext globally? Is there any specific requirement in your case that you need multiple SQLContext/HiveContext?
From: shahab [mailto:shahab.mokari@gmail.com]
Sent: Tuesday, March 3, 2015 9:46 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server
You are right , CassandraAwareSQLContext is subclass of SQL context.
But I did another experiment, I queried Cassandra using CassandraAwareSQLContext, then I registered the "rdd" as a temp table , next I tried to query it using HiveContext, but it seems that hive context can not see the registered table suing SQL context. Is this a normal case?
best,
/Shahab
On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com>> wrote:
Hive UDF are only applicable for HiveContext and its subclass instance, is the CassandraAwareSQLContext a direct sub class of HiveContext or SQLContext?
From: shahab [mailto:shahab.mokari@gmail.com<ma...@gmail.com>]
Sent: Tuesday, March 3, 2015 5:10 PM
To: Cheng, Hao
Cc: user@spark.apache.org<ma...@spark.apache.org>
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server
val sc: SparkContext = new SparkContext(conf)
val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some Calliope Cassandra Spark connector
val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
rdd.cache
rdd.registerTempTable("profile")
rdd.first //enforce caching
val q = "select from_unixtime(floor(createdAt/1000)) from profile where sampling_bucket=0 "
val rdd2 = rdd.sqlContext.sql(q )
println ("Result: " + rdd2.first)
And I get the following errors:
xception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
Filter (sampling_bucket#10 = 0)
Subquery profile
Project [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d<ma...@778b692d>, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to<http://class.to>(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to<http://scala.collection.AbstractIterator.to>(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
at boot.SQLDemo.main(SQLDemo.scala) //my code
On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com>> wrote:
Can you provide the detailed failure call stack?
From: shahab [mailto:shahab.mokari@gmail.com<ma...@gmail.com>]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org<ma...@spark.apache.org>
Subject: Supporting Hive features in Spark SQL Thrift JDBC server
Hi,
According to Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as User Defined Functions( UDF) ", and one of these UFDs is "current_date()" function, which should be supported.
However, i get error when I am using this UDF in my SQL query. There are couple of other UDFs which cause similar error.
Am I missing something in my JDBC server ?
/Shahab
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by shahab <sh...@gmail.com>.
You are right , CassandraAwareSQLContext is subclass of SQL context.
But I did another experiment, I queried Cassandra using
CassandraAwareSQLContext,
then I registered the "rdd" as a temp table , next I tried to query it
using HiveContext, but it seems that hive context can not see the
registered table suing SQL context. Is this a normal case?
best,
/Shahab
On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
> Hive UDF are only applicable for HiveContext and its subclass instance,
> is the CassandraAwareSQLContext a direct sub class of HiveContext or
> SQLContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 5:10 PM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> val sc: SparkContext = new SparkContext(conf)
>
> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
> Calliope Cassandra Spark connector
>
> val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
>
> rdd.cache
>
> rdd.registerTempTable("profile")
>
> rdd.first //enforce caching
>
> val q = "select from_unixtime(floor(createdAt/1000)) from profile
> where sampling_bucket=0 "
>
> val rdd2 = rdd.sqlContext.sql(q )
>
> println ("Result: " + rdd2.first)
>
>
>
> And I get the following errors:
>
> xception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>
> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>
> Filter (sampling_bucket#10 = 0)
>
> Subquery profile
>
> Project
> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>
> CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
> false, Some(Configuration: core-default.xml, core-site.xml,
> mapred-default.xml, mapred-site.xml)
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>
> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>
> at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
>
> at boot.SQLDemo.main(SQLDemo.scala) //my code
>
>
>
> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>
> Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
>
>
RE: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by "Cheng, Hao" <ha...@intel.com>.
Hive UDF are only applicable for HiveContext and its subclass instance, is the CassandraAwareSQLContext a direct sub class of HiveContext or SQLContext?
From: shahab [mailto:shahab.mokari@gmail.com]
Sent: Tuesday, March 3, 2015 5:10 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server
val sc: SparkContext = new SparkContext(conf)
val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some Calliope Cassandra Spark connector
val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
rdd.cache
rdd.registerTempTable("profile")
rdd.first //enforce caching
val q = "select from_unixtime(floor(createdAt/1000)) from profile where sampling_bucket=0 "
val rdd2 = rdd.sqlContext.sql(q )
println ("Result: " + rdd2.first)
And I get the following errors:
xception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
Filter (sampling_bucket#10 = 0)
Subquery profile
Project [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d<ma...@778b692d>, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to<http://class.to>(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to<http://scala.collection.AbstractIterator.to>(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
at boot.SQLDemo.main(SQLDemo.scala) //my code
On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com>> wrote:
Can you provide the detailed failure call stack?
From: shahab [mailto:shahab.mokari@gmail.com<ma...@gmail.com>]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org<ma...@spark.apache.org>
Subject: Supporting Hive features in Spark SQL Thrift JDBC server
Hi,
According to Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as User Defined Functions( UDF) ", and one of these UFDs is "current_date()" function, which should be supported.
However, i get error when I am using this UDF in my SQL query. There are couple of other UDFs which cause similar error.
Am I missing something in my JDBC server ?
/Shahab
Re: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by shahab <sh...@gmail.com>.
val sc: SparkContext = new SparkContext(conf)
val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some
Calliope Cassandra Spark connector
val rdd : SchemaRDD = sqlCassContext.sql("select * from db.profile " )
rdd.cache
rdd.registerTempTable("profile")
rdd.first //enforce caching
val q = "select from_unixtime(floor(createdAt/1000)) from profile
where sampling_bucket=0 "
val rdd2 = rdd.sqlContext.sql(q )
println ("Result: " + rdd2.first)
And I get the following errors:
xception in thread "main"
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
Filter (sampling_bucket#10 = 0)
Subquery profile
Project
[company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None, false,
Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(
Analyzer.scala:72)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(
Analyzer.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(
TreeNode.scala:165)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(
TreeNode.scala:183)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265
)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(
TreeNode.scala:212)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(
TreeNode.scala:168)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156
)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(
Analyzer.scala:70)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(
Analyzer.scala:68)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(
RuleExecutor.scala:61)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(
RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(
IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(
IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(
RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(
RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(
RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(
SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(
SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(
SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(
SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(
SQLContext.scala:407)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(
SQLContext.scala:405)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(
SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(
SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
at boot.SQLDemo$.main(SQLDemo.scala:65) //my code
at boot.SQLDemo.main(SQLDemo.scala) //my code
On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
> Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
RE: Supporting Hive features in Spark SQL Thrift JDBC server
Posted by "Cheng, Hao" <ha...@intel.com>.
Can you provide the detailed failure call stack?
From: shahab [mailto:shahab.mokari@gmail.com]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org
Subject: Supporting Hive features in Spark SQL Thrift JDBC server
Hi,
According to Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as User Defined Functions( UDF) ", and one of these UFDs is "current_date()" function, which should be supported.
However, i get error when I am using this UDF in my SQL query. There are couple of other UDFs which cause similar error.
Am I missing something in my JDBC server ?
/Shahab