You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@spark.apache.org by shahab <sh...@gmail.com> on 2015/03/03 08:51:53 UTC

Supporting Hive features in Spark SQL Thrift JDBC server

Hi,

According to Spark SQL documentation, "....Spark SQL supports the vast
majority of Hive features, such as  User Defined Functions( UDF) ", and one
of these UFDs is "current_date()" function, which should be supported.

However, i get error when I am using this UDF in my SQL query. There are
couple of other UDFs which cause similar error.

Am I missing something in my JDBC server ?

/Shahab

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by shahab <sh...@gmail.com>.

@Yin: sorry for my mistake, you are right it was added in 1.2, not 0.12.0 ,
 my bad!

On Tue, Mar 3, 2015 at 6:47 PM, shahab <sh...@gmail.com> wrote:

> Thanks Rohit, yes my mistake, it does work with 1.1 ( I am actually
> running it on spark 1.1)
>
> But do you mean that even HiveConext of spark (nit Calliope
> CassandraAwareHiveContext) is not supporting Hive 0.12 ??
>
> best,
> /Shahab
>
> On Tue, Mar 3, 2015 at 5:55 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>
>> The Hive dependency comes from spark-hive.
>>
>> It does work with Spark 1.1 we will have the 1.2 release later this month.
>> On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>>
>>>
>>> Thanks Rohit,
>>>
>>> I am already using Calliope and quite happy with it, well done ! except
>>> the fact that :
>>> 1- It seems that it does not support Hive 0.12 or higher, Am i right?
>>>  for example you can not use : current_time() UDF, or those new UDFs added
>>> in hive 0.12 . Are they supported? Any plan for supporting them?
>>> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>>>
>>> best,
>>> /Shahab
>>>
>>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>>
>>>> Hello Shahab,
>>>>
>>>> I think CassandraAwareHiveContext
>>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>>>> Calliopee is what you are looking for. Create CAHC instance and you should
>>>> be able to run hive functions against the SchemaRDD you create from there.
>>>>
>>>> Cheers,
>>>> Rohit
>>>>
>>>> *Founder & CEO, **Tuplejump, Inc.*
>>>> ____________________________
>>>> www.tuplejump.com
>>>> *The Data Engineering Platform*
>>>>
>>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>>>  The temp table in metastore can not be shared cross SQLContext
>>>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>>>> its functionality), why not using a single HiveContext globally? Is there
>>>>> any specific requirement in your case that you need multiple
>>>>> SQLContext/HiveContext?
>>>>>
>>>>>
>>>>>
>>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>>
>>>>> *To:* Cheng, Hao
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC
>>>>> server
>>>>>
>>>>>
>>>>>
>>>>> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>>>>>
>>>>>
>>>>>
>>>>> But I did another experiment, I queried Cassandra
>>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>>>> , next I tried to query it using HiveContext, but it seems that hive
>>>>> context can not see the registered table suing SQL context. Is this a
>>>>> normal case?
>>>>>
>>>>>
>>>>>
>>>>> best,
>>>>>
>>>>> /Shahab
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com>
>>>>> wrote:
>>>>>
>>>>>  Hive UDF are only applicable for HiveContext and its subclass
>>>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>>>> HiveContext or SQLContext?
>>>>>
>>>>>
>>>>>
>>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>>> *To:* Cheng, Hao
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC
>>>>> server
>>>>>
>>>>>
>>>>>
>>>>>   val sc: SparkContext = new SparkContext(conf)
>>>>>
>>>>>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used
>>>>> some Calliope Cassandra Spark connector
>>>>>
>>>>> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>>>>>
>>>>> rdd.cache
>>>>>
>>>>> rdd.registerTempTable("profile")
>>>>>
>>>>>  rdd.first  //enforce caching
>>>>>
>>>>>      val q = "select  from_unixtime(floor(createdAt/1000)) from
>>>>> profile where sampling_bucket=0 "
>>>>>
>>>>>      val rdd2 = rdd.sqlContext.sql(q )
>>>>>
>>>>>      println ("Result: " + rdd2.first)
>>>>>
>>>>>
>>>>>
>>>>> And I get the following  errors:
>>>>>
>>>>> xception in thread "main"
>>>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>>
>>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>>
>>>>>  Filter (sampling_bucket#10 = 0)
>>>>>
>>>>>   Subquery profile
>>>>>
>>>>>    Project
>>>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>>
>>>>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling,
>>>>> profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d,
>>>>> None, None, false, Some(Configuration: core-default.xml, core-site.xml,
>>>>> mapred-default.xml, mapred-site.xml)
>>>>>
>>>>>
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>>
>>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>
>>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>
>>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>>
>>>>> at
>>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>>
>>>>> at
>>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>>
>>>>> at scala.collection.TraversableOnce$class.to
>>>>> (TraversableOnce.scala:273)
>>>>>
>>>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>>
>>>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>>
>>>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>>
>>>>> at
>>>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>>
>>>>> at
>>>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>>
>>>>> at
>>>>> scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>>
>>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>>
>>>>> at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>>
>>>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>>
>>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>>
>>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>>
>>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>>
>>>>> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>>>>>
>>>>> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com>
>>>>> wrote:
>>>>>
>>>>>  Can you provide the detailed failure call stack?
>>>>>
>>>>>
>>>>>
>>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>>> *To:* user@spark.apache.org
>>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> According to Spark SQL documentation, "....Spark SQL supports the
>>>>> vast majority of Hive features, such as  User Defined Functions( UDF) ",
>>>>> and one of these UFDs is "current_date()" function, which should be
>>>>> supported.
>>>>>
>>>>>
>>>>>
>>>>> However, i get error when I am using this UDF in my SQL query. There
>>>>> are couple of other UDFs which cause similar error.
>>>>>
>>>>>
>>>>>
>>>>> Am I missing something in my JDBC server ?
>>>>>
>>>>>
>>>>>
>>>>> /Shahab
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by shahab <sh...@gmail.com>.

Thanks Rohit, yes my mistake, it does work with 1.1 ( I am actually running
it on spark 1.1)

But do you mean that even HiveConext of spark (nit Calliope
CassandraAwareHiveContext) is not supporting Hive 0.12 ??

best,
/Shahab

On Tue, Mar 3, 2015 at 5:55 PM, Rohit Rai <ro...@tuplejump.com> wrote:

> The Hive dependency comes from spark-hive.
>
> It does work with Spark 1.1 we will have the 1.2 release later this month.
> On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>
>>
>> Thanks Rohit,
>>
>> I am already using Calliope and quite happy with it, well done ! except
>> the fact that :
>> 1- It seems that it does not support Hive 0.12 or higher, Am i right?
>>  for example you can not use : current_time() UDF, or those new UDFs added
>> in hive 0.12 . Are they supported? Any plan for supporting them?
>> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>>
>> best,
>> /Shahab
>>
>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>
>>> Hello Shahab,
>>>
>>> I think CassandraAwareHiveContext
>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>>> Calliopee is what you are looking for. Create CAHC instance and you should
>>> be able to run hive functions against the SchemaRDD you create from there.
>>>
>>> Cheers,
>>> Rohit
>>>
>>> *Founder & CEO, **Tuplejump, Inc.*
>>> ____________________________
>>> www.tuplejump.com
>>> *The Data Engineering Platform*
>>>
>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>>>  The temp table in metastore can not be shared cross SQLContext
>>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>>> its functionality), why not using a single HiveContext globally? Is there
>>>> any specific requirement in your case that you need multiple
>>>> SQLContext/HiveContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>>>>
>>>>
>>>>
>>>> But I did another experiment, I queried Cassandra
>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>>> , next I tried to query it using HiveContext, but it seems that hive
>>>> context can not see the registered table suing SQL context. Is this a
>>>> normal case?
>>>>
>>>>
>>>>
>>>> best,
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>>  Hive UDF are only applicable for HiveContext and its subclass
>>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>>> HiveContext or SQLContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>>   val sc: SparkContext = new SparkContext(conf)
>>>>
>>>>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
>>>> Calliope Cassandra Spark connector
>>>>
>>>> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>>>>
>>>> rdd.cache
>>>>
>>>> rdd.registerTempTable("profile")
>>>>
>>>>  rdd.first  //enforce caching
>>>>
>>>>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
>>>> where sampling_bucket=0 "
>>>>
>>>>      val rdd2 = rdd.sqlContext.sql(q )
>>>>
>>>>      println ("Result: " + rdd2.first)
>>>>
>>>>
>>>>
>>>> And I get the following  errors:
>>>>
>>>> xception in thread "main"
>>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>
>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>
>>>>  Filter (sampling_bucket#10 = 0)
>>>>
>>>>   Subquery profile
>>>>
>>>>    Project
>>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>
>>>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling,
>>>> profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None,
>>>> None, false, Some(Configuration: core-default.xml, core-site.xml,
>>>> mapred-default.xml, mapred-site.xml)
>>>>
>>>>
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>
>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>
>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>
>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>
>>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>
>>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>
>>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>
>>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>
>>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>
>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>
>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>
>>>> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>>>>
>>>> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>>  Can you provide the detailed failure call stack?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>> *To:* user@spark.apache.org
>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> According to Spark SQL documentation, "....Spark SQL supports the vast
>>>> majority of Hive features, such as  User Defined Functions( UDF) ", and one
>>>> of these UFDs is "current_date()" function, which should be supported.
>>>>
>>>>
>>>>
>>>> However, i get error when I am using this UDF in my SQL query. There
>>>> are couple of other UDFs which cause similar error.
>>>>
>>>>
>>>>
>>>> Am I missing something in my JDBC server ?
>>>>
>>>>
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by Yin Huai <yh...@databricks.com>.

@Shahab, based on https://issues.apache.org/jira/browse/HIVE-5472,
current_date was added in Hive *1.2.0 (not 0.12.0)*. For my previous email,
I meant current_date is not in neither Hive 0.12.0 nor Hive 0.13.1 (Spark
SQL currently supports these two Hive versions).

On Tue, Mar 3, 2015 at 8:55 AM, Rohit Rai <ro...@tuplejump.com> wrote:

> The Hive dependency comes from spark-hive.
>
> It does work with Spark 1.1 we will have the 1.2 release later this month.
> On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:
>
>>
>> Thanks Rohit,
>>
>> I am already using Calliope and quite happy with it, well done ! except
>> the fact that :
>> 1- It seems that it does not support Hive 0.12 or higher, Am i right?
>>  for example you can not use : current_time() UDF, or those new UDFs added
>> in hive 0.12 . Are they supported? Any plan for supporting them?
>> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>>
>> best,
>> /Shahab
>>
>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>
>>> Hello Shahab,
>>>
>>> I think CassandraAwareHiveContext
>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>>> Calliopee is what you are looking for. Create CAHC instance and you should
>>> be able to run hive functions against the SchemaRDD you create from there.
>>>
>>> Cheers,
>>> Rohit
>>>
>>> *Founder & CEO, **Tuplejump, Inc.*
>>> ____________________________
>>> www.tuplejump.com
>>> *The Data Engineering Platform*
>>>
>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>>>  The temp table in metastore can not be shared cross SQLContext
>>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>>> its functionality), why not using a single HiveContext globally? Is there
>>>> any specific requirement in your case that you need multiple
>>>> SQLContext/HiveContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>>>>
>>>>
>>>>
>>>> But I did another experiment, I queried Cassandra
>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>>> , next I tried to query it using HiveContext, but it seems that hive
>>>> context can not see the registered table suing SQL context. Is this a
>>>> normal case?
>>>>
>>>>
>>>>
>>>> best,
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>>  Hive UDF are only applicable for HiveContext and its subclass
>>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>>> HiveContext or SQLContext?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>> *To:* Cheng, Hao
>>>> *Cc:* user@spark.apache.org
>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>>   val sc: SparkContext = new SparkContext(conf)
>>>>
>>>>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
>>>> Calliope Cassandra Spark connector
>>>>
>>>> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>>>>
>>>> rdd.cache
>>>>
>>>> rdd.registerTempTable("profile")
>>>>
>>>>  rdd.first  //enforce caching
>>>>
>>>>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
>>>> where sampling_bucket=0 "
>>>>
>>>>      val rdd2 = rdd.sqlContext.sql(q )
>>>>
>>>>      println ("Result: " + rdd2.first)
>>>>
>>>>
>>>>
>>>> And I get the following  errors:
>>>>
>>>> xception in thread "main"
>>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>
>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>
>>>>  Filter (sampling_bucket#10 = 0)
>>>>
>>>>   Subquery profile
>>>>
>>>>    Project
>>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>
>>>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling,
>>>> profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None,
>>>> None, false, Some(Configuration: core-default.xml, core-site.xml,
>>>> mapred-default.xml, mapred-site.xml)
>>>>
>>>>
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>
>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>
>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>
>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>
>>>> at
>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>
>>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>
>>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>
>>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>
>>>> at
>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>
>>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>
>>>> at
>>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>
>>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>
>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>
>>>> at
>>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>
>>>> at
>>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>
>>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>
>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>
>>>> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>>>>
>>>> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>>
>>>>  Can you provide the detailed failure call stack?
>>>>
>>>>
>>>>
>>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>> *To:* user@spark.apache.org
>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> According to Spark SQL documentation, "....Spark SQL supports the vast
>>>> majority of Hive features, such as  User Defined Functions( UDF) ", and one
>>>> of these UFDs is "current_date()" function, which should be supported.
>>>>
>>>>
>>>>
>>>> However, i get error when I am using this UDF in my SQL query. There
>>>> are couple of other UDFs which cause similar error.
>>>>
>>>>
>>>>
>>>> Am I missing something in my JDBC server ?
>>>>
>>>>
>>>>
>>>> /Shahab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by Rohit Rai <ro...@tuplejump.com>.

The Hive dependency comes from spark-hive.

It does work with Spark 1.1 we will have the 1.2 release later this month.
On Mar 3, 2015 8:49 AM, "shahab" <sh...@gmail.com> wrote:

>
> Thanks Rohit,
>
> I am already using Calliope and quite happy with it, well done ! except
> the fact that :
> 1- It seems that it does not support Hive 0.12 or higher, Am i right?  for
> example you can not use : current_time() UDF, or those new UDFs added in
> hive 0.12 . Are they supported? Any plan for supporting them?
> 2-It does not support Spark 1.1 and 1.2. Any plan for new release?
>
> best,
> /Shahab
>
> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>
>> Hello Shahab,
>>
>> I think CassandraAwareHiveContext
>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
>> Calliopee is what you are looking for. Create CAHC instance and you should
>> be able to run hive functions against the SchemaRDD you create from there.
>>
>> Cheers,
>> Rohit
>>
>> *Founder & CEO, **Tuplejump, Inc.*
>> ____________________________
>> www.tuplejump.com
>> *The Data Engineering Platform*
>>
>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>>
>>>  The temp table in metastore can not be shared cross SQLContext
>>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>>> its functionality), why not using a single HiveContext globally? Is there
>>> any specific requirement in your case that you need multiple
>>> SQLContext/HiveContext?
>>>
>>>
>>>
>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>
>>> *To:* Cheng, Hao
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>
>>>
>>>
>>> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>>>
>>>
>>>
>>> But I did another experiment, I queried Cassandra
>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>>> , next I tried to query it using HiveContext, but it seems that hive
>>> context can not see the registered table suing SQL context. Is this a
>>> normal case?
>>>
>>>
>>>
>>> best,
>>>
>>> /Shahab
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>>  Hive UDF are only applicable for HiveContext and its subclass
>>> instance, is the CassandraAwareSQLContext a direct sub class of
>>> HiveContext or SQLContext?
>>>
>>>
>>>
>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>> *To:* Cheng, Hao
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>
>>>
>>>
>>>   val sc: SparkContext = new SparkContext(conf)
>>>
>>>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
>>> Calliope Cassandra Spark connector
>>>
>>> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>>>
>>> rdd.cache
>>>
>>> rdd.registerTempTable("profile")
>>>
>>>  rdd.first  //enforce caching
>>>
>>>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
>>> where sampling_bucket=0 "
>>>
>>>      val rdd2 = rdd.sqlContext.sql(q )
>>>
>>>      println ("Result: " + rdd2.first)
>>>
>>>
>>>
>>> And I get the following  errors:
>>>
>>> xception in thread "main"
>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>
>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>
>>>  Filter (sampling_bucket#10 = 0)
>>>
>>>   Subquery profile
>>>
>>>    Project
>>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>
>>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
>>> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
>>> false, Some(Configuration: core-default.xml, core-site.xml,
>>> mapred-default.xml, mapred-site.xml)
>>>
>>>
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>
>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>
>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>
>>> at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>
>>> at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>
>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>
>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>
>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>
>>> at
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>
>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>
>>> at
>>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>
>>> at
>>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>
>>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>
>>> at
>>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>
>>> at
>>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>
>>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>
>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>
>>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>
>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>
>>> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>>>
>>> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>>>
>>>
>>>
>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>>
>>>  Can you provide the detailed failure call stack?
>>>
>>>
>>>
>>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>> *To:* user@spark.apache.org
>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> According to Spark SQL documentation, "....Spark SQL supports the vast
>>> majority of Hive features, such as  User Defined Functions( UDF) ", and one
>>> of these UFDs is "current_date()" function, which should be supported.
>>>
>>>
>>>
>>> However, i get error when I am using this UDF in my SQL query. There are
>>> couple of other UDFs which cause similar error.
>>>
>>>
>>>
>>> Am I missing something in my JDBC server ?
>>>
>>>
>>>
>>> /Shahab
>>>
>>>
>>>
>>>
>>>
>>
>>
>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by shahab <sh...@gmail.com>.

Thanks Rohit,

I am already using Calliope and quite happy with it, well done ! except the
fact that :
1- It seems that it does not support Hive 0.12 or higher, Am i right?  for
example you can not use : current_time() UDF, or those new UDFs added in
hive 0.12 . Are they supported? Any plan for supporting them?
2-It does not support Spark 1.1 and 1.2. Any plan for new release?

best,
/Shahab

On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:

> Hello Shahab,
>
> I think CassandraAwareHiveContext
> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala> in
> Calliopee is what you are looking for. Create CAHC instance and you should
> be able to run hive functions against the SchemaRDD you create from there.
>
> Cheers,
> Rohit
>
> *Founder & CEO, **Tuplejump, Inc.*
> ____________________________
> www.tuplejump.com
> *The Data Engineering Platform*
>
> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>>  The temp table in metastore can not be shared cross SQLContext
>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>> its functionality), why not using a single HiveContext globally? Is there
>> any specific requirement in your case that you need multiple
>> SQLContext/HiveContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>>
>>
>>
>> But I did another experiment, I queried Cassandra
>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>> , next I tried to query it using HiveContext, but it seems that hive
>> context can not see the registered table suing SQL context. Is this a
>> normal case?
>>
>>
>>
>> best,
>>
>> /Shahab
>>
>>
>>
>>
>>
>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>
>>  Hive UDF are only applicable for HiveContext and its subclass instance,
>> is the CassandraAwareSQLContext a direct sub class of HiveContext or
>> SQLContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>>   val sc: SparkContext = new SparkContext(conf)
>>
>>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
>> Calliope Cassandra Spark connector
>>
>> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>>
>> rdd.cache
>>
>> rdd.registerTempTable("profile")
>>
>>  rdd.first  //enforce caching
>>
>>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
>> where sampling_bucket=0 "
>>
>>      val rdd2 = rdd.sqlContext.sql(q )
>>
>>      println ("Result: " + rdd2.first)
>>
>>
>>
>> And I get the following  errors:
>>
>> xception in thread "main"
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>
>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>
>>  Filter (sampling_bucket#10 = 0)
>>
>>   Subquery profile
>>
>>    Project
>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>
>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
>> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
>> false, Some(Configuration: core-default.xml, core-site.xml,
>> mapred-default.xml, mapred-site.xml)
>>
>>
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>
>> at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>
>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>
>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>
>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>
>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>
>> at scala.collection.immutable.List.foreach(List.scala:318)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>
>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>
>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>
>> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>>
>> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>>
>>
>>
>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>
>>  Can you provide the detailed failure call stack?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>> *To:* user@spark.apache.org
>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> Hi,
>>
>>
>>
>> According to Spark SQL documentation, "....Spark SQL supports the vast
>> majority of Hive features, such as  User Defined Functions( UDF) ", and one
>> of these UFDs is "current_date()" function, which should be supported.
>>
>>
>>
>> However, i get error when I am using this UDF in my SQL query. There are
>> couple of other UDFs which cause similar error.
>>
>>
>>
>> Am I missing something in my JDBC server ?
>>
>>
>>
>> /Shahab
>>
>>
>>
>>
>>
>
>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by Rohit Rai <ro...@tuplejump.com>.

Hello Shahab,

I think CassandraAwareHiveContext
<https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala>
in
Calliopee is what you are looking for. Create CAHC instance and you should
be able to run hive functions against the SchemaRDD you create from there.

Cheers,
Rohit

*Founder & CEO, **Tuplejump, Inc.*
____________________________
www.tuplejump.com
*The Data Engineering Platform*

On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:

>  The temp table in metastore can not be shared cross SQLContext
> instances, since HiveContext is a sub class of SQLContext (inherits all of
> its functionality), why not using a single HiveContext globally? Is there
> any specific requirement in your case that you need multiple
> SQLContext/HiveContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 9:46 PM
>
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>
>
>
> But I did another experiment, I queried Cassandra
> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
> , next I tried to query it using HiveContext, but it seems that hive
> context can not see the registered table suing SQL context. Is this a
> normal case?
>
>
>
> best,
>
> /Shahab
>
>
>
>
>
> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>
>  Hive UDF are only applicable for HiveContext and its subclass instance,
> is the CassandraAwareSQLContext a direct sub class of HiveContext or
> SQLContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 5:10 PM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
>   val sc: SparkContext = new SparkContext(conf)
>
>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
> Calliope Cassandra Spark connector
>
> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>
> rdd.cache
>
> rdd.registerTempTable("profile")
>
>  rdd.first  //enforce caching
>
>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
> where sampling_bucket=0 "
>
>      val rdd2 = rdd.sqlContext.sql(q )
>
>      println ("Result: " + rdd2.first)
>
>
>
> And I get the following  errors:
>
> xception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>
> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>
>  Filter (sampling_bucket#10 = 0)
>
>   Subquery profile
>
>    Project
> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>
>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
> false, Some(Configuration: core-default.xml, core-site.xml,
> mapred-default.xml, mapred-site.xml)
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>
> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>
> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>
> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>
>
>
> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>  Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as  User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
>
>
>
>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by shahab <sh...@gmail.com>.

@Cheng :My problem is that the connector I use to query Spark does not
support latest Hive (0.12, 0.13), But I need to perform Hive Queries on
data retrieved from Cassandra. I assumed that if I get data out of
cassandra in some way and register it as Temp table I would be able to
query it using HiveContext, but it seems I can not do this!

@Yes, it is added in Hive 0.12, but do you mean It is not supported by
HiveContext in Spark????

Thanks,
/Shahab

On Tue, Mar 3, 2015 at 5:23 PM, Yin Huai <yh...@databricks.com> wrote:

> Regarding current_date, I think it is not in either Hive 0.12.0 or 0.13.1
> (versions that we support). Seems
> https://issues.apache.org/jira/browse/HIVE-5472 added it Hive recently.
>
> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>>  The temp table in metastore can not be shared cross SQLContext
>> instances, since HiveContext is a sub class of SQLContext (inherits all of
>> its functionality), why not using a single HiveContext globally? Is there
>> any specific requirement in your case that you need multiple
>> SQLContext/HiveContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>>
>>
>>
>> But I did another experiment, I queried Cassandra
>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
>> , next I tried to query it using HiveContext, but it seems that hive
>> context can not see the registered table suing SQL context. Is this a
>> normal case?
>>
>>
>>
>> best,
>>
>> /Shahab
>>
>>
>>
>>
>>
>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>>
>>  Hive UDF are only applicable for HiveContext and its subclass instance,
>> is the CassandraAwareSQLContext a direct sub class of HiveContext or
>> SQLContext?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>> *To:* Cheng, Hao
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>>   val sc: SparkContext = new SparkContext(conf)
>>
>>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
>> Calliope Cassandra Spark connector
>>
>> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>>
>> rdd.cache
>>
>> rdd.registerTempTable("profile")
>>
>>  rdd.first  //enforce caching
>>
>>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
>> where sampling_bucket=0 "
>>
>>      val rdd2 = rdd.sqlContext.sql(q )
>>
>>      println ("Result: " + rdd2.first)
>>
>>
>>
>> And I get the following  errors:
>>
>> xception in thread "main"
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
>> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>
>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>
>>  Filter (sampling_bucket#10 = 0)
>>
>>   Subquery profile
>>
>>    Project
>> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>
>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
>> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
>> false, Some(Configuration: core-default.xml, core-site.xml,
>> mapred-default.xml, mapred-site.xml)
>>
>>
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>
>> at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>
>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>
>> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>
>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>
>> at
>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>
>> at
>> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>
>> at
>> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>
>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>
>> at scala.collection.immutable.List.foreach(List.scala:318)
>>
>> at
>> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>
>> at
>> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>
>> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>
>> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>
>> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>
>> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>>
>> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>>
>>
>>
>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>>
>>  Can you provide the detailed failure call stack?
>>
>>
>>
>> *From:* shahab [mailto:shahab.mokari@gmail.com]
>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>> *To:* user@spark.apache.org
>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>
>>
>>
>> Hi,
>>
>>
>>
>> According to Spark SQL documentation, "....Spark SQL supports the vast
>> majority of Hive features, such as  User Defined Functions( UDF) ", and one
>> of these UFDs is "current_date()" function, which should be supported.
>>
>>
>>
>> However, i get error when I am using this UDF in my SQL query. There are
>> couple of other UDFs which cause similar error.
>>
>>
>>
>> Am I missing something in my JDBC server ?
>>
>>
>>
>> /Shahab
>>
>>
>>
>>
>>
>
>

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by Yin Huai <yh...@databricks.com>.

Regarding current_date, I think it is not in either Hive 0.12.0 or 0.13.1
(versions that we support). Seems
https://issues.apache.org/jira/browse/HIVE-5472 added it Hive recently.

On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <ha...@intel.com> wrote:

>  The temp table in metastore can not be shared cross SQLContext
> instances, since HiveContext is a sub class of SQLContext (inherits all of
> its functionality), why not using a single HiveContext globally? Is there
> any specific requirement in your case that you need multiple
> SQLContext/HiveContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 9:46 PM
>
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> You are right ,  CassandraAwareSQLContext is subclass of SQL context.
>
>
>
> But I did another experiment, I queried Cassandra
> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table
> , next I tried to query it using HiveContext, but it seems that hive
> context can not see the registered table suing SQL context. Is this a
> normal case?
>
>
>
> best,
>
> /Shahab
>
>
>
>
>
> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:
>
>  Hive UDF are only applicable for HiveContext and its subclass instance,
> is the CassandraAwareSQLContext a direct sub class of HiveContext or
> SQLContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 5:10 PM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
>   val sc: SparkContext = new SparkContext(conf)
>
>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
> Calliope Cassandra Spark connector
>
> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>
> rdd.cache
>
> rdd.registerTempTable("profile")
>
>  rdd.first  //enforce caching
>
>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
> where sampling_bucket=0 "
>
>      val rdd2 = rdd.sqlContext.sql(q )
>
>      println ("Result: " + rdd2.first)
>
>
>
> And I get the following  errors:
>
> xception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>
> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>
>  Filter (sampling_bucket#10 = 0)
>
>   Subquery profile
>
>    Project
> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>
>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
> false, Some(Configuration: core-default.xml, core-site.xml,
> mapred-default.xml, mapred-site.xml)
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>
> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>
> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>
> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>
>
>
> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>  Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as  User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
>
>
>
>

RE: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by "Cheng, Hao" <ha...@intel.com>.

The temp table in metastore can not be shared cross SQLContext instances, since HiveContext is a sub class of SQLContext (inherits all of its functionality), why not using a single HiveContext globally? Is there any specific requirement in your case that you need multiple SQLContext/HiveContext?

From: shahab [mailto:shahab.mokari@gmail.com]
Sent: Tuesday, March 3, 2015 9:46 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server

You are right ,  CassandraAwareSQLContext is subclass of SQL context.

But I did another experiment, I queried Cassandra using CassandraAwareSQLContext, then I registered the "rdd" as a temp table , next I tried to query it using HiveContext, but it seems that hive context can not see the registered table suing SQL context. Is this a normal case?

best,
/Shahab


On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com>> wrote:
Hive UDF are only applicable for HiveContext and its subclass instance, is the CassandraAwareSQLContext a direct sub class of HiveContext or SQLContext?

From: shahab [mailto:shahab.mokari@gmail.com<ma...@gmail.com>]
Sent: Tuesday, March 3, 2015 5:10 PM
To: Cheng, Hao
Cc: user@spark.apache.org<ma...@spark.apache.org>
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server

  val sc: SparkContext = new SparkContext(conf)
  val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some Calliope Cassandra Spark connector
val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
rdd.cache
rdd.registerTempTable("profile")
 rdd.first  //enforce caching
     val q = "select  from_unixtime(floor(createdAt/1000)) from profile where sampling_bucket=0 "
     val rdd2 = rdd.sqlContext.sql(q )
     println ("Result: " + rdd2.first)

And I get the following  errors:
xception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
 Filter (sampling_bucket#10 = 0)
  Subquery profile
   Project [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
    CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d<ma...@778b692d>, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)

at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to<http://class.to>(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to<http://scala.collection.AbstractIterator.to>(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
at boot.SQLDemo.main(SQLDemo.scala)  //my code

On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com>> wrote:
Can you provide the detailed failure call stack?

From: shahab [mailto:shahab.mokari@gmail.com<ma...@gmail.com>]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org<ma...@spark.apache.org>
Subject: Supporting Hive features in Spark SQL Thrift JDBC server

Hi,

According to Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as  User Defined Functions( UDF) ", and one of these UFDs is "current_date()" function, which should be supported.

However, i get error when I am using this UDF in my SQL query. There are couple of other UDFs which cause similar error.

Am I missing something in my JDBC server ?

/Shahab

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by shahab <sh...@gmail.com>.

You are right ,  CassandraAwareSQLContext is subclass of SQL context.

But I did another experiment, I queried Cassandra using
CassandraAwareSQLContext,
then I registered the "rdd" as a temp table , next I tried to query it
using HiveContext, but it seems that hive context can not see the
registered table suing SQL context. Is this a normal case?

best,
/Shahab


On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <ha...@intel.com> wrote:

>  Hive UDF are only applicable for HiveContext and its subclass instance,
> is the CassandraAwareSQLContext a direct sub class of HiveContext or
> SQLContext?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 5:10 PM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
>   val sc: SparkContext = new SparkContext(conf)
>
>   val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
> Calliope Cassandra Spark connector
>
> val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
>
> rdd.cache
>
> rdd.registerTempTable("profile")
>
>  rdd.first  //enforce caching
>
>      val q = "select  from_unixtime(floor(createdAt/1000)) from profile
> where sampling_bucket=0 "
>
>      val rdd2 = rdd.sqlContext.sql(q )
>
>      println ("Result: " + rdd2.first)
>
>
>
> And I get the following  errors:
>
> xception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
> attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>
> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>
>  Filter (sampling_bucket#10 = 0)
>
>   Subquery profile
>
>    Project
> [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>
>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
> org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None,
> false, Some(Configuration: core-default.xml, core-site.xml,
> mapred-default.xml, mapred-site.xml)
>
>
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>
> at
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>
> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>
> at scala.collection.immutable.List.foreach(List.scala:318)
>
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>
> at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>
> at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>
> at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
>
> at boot.SQLDemo.main(SQLDemo.scala)  //my code
>
>
>
> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:
>
>  Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as  User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>
>
>

RE: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by "Cheng, Hao" <ha...@intel.com>.

Hive UDF are only applicable for HiveContext and its subclass instance, is the CassandraAwareSQLContext a direct sub class of HiveContext or SQLContext?

From: shahab [mailto:shahab.mokari@gmail.com]
Sent: Tuesday, March 3, 2015 5:10 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server

  val sc: SparkContext = new SparkContext(conf)
  val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some Calliope Cassandra Spark connector
val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )
rdd.cache
rdd.registerTempTable("profile")
 rdd.first  //enforce caching
     val q = "select  from_unixtime(floor(createdAt/1000)) from profile where sampling_bucket=0 "
     val rdd2 = rdd.sqlContext.sql(q )
     println ("Result: " + rdd2.first)

And I get the following  errors:
xception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
 Filter (sampling_bucket#10 = 0)
  Subquery profile
   Project [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
    CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d<ma...@778b692d>, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)

at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to<http://class.to>(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to<http://scala.collection.AbstractIterator.to>(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code
at boot.SQLDemo.main(SQLDemo.scala)  //my code

On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com>> wrote:
Can you provide the detailed failure call stack?

From: shahab [mailto:shahab.mokari@gmail.com<ma...@gmail.com>]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org<ma...@spark.apache.org>
Subject: Supporting Hive features in Spark SQL Thrift JDBC server

Hi,

According to Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as  User Defined Functions( UDF) ", and one of these UFDs is "current_date()" function, which should be supported.

However, i get error when I am using this UDF in my SQL query. There are couple of other UDFs which cause similar error.

Am I missing something in my JDBC server ?

/Shahab

Re: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by shahab <sh...@gmail.com>.

  val sc: SparkContext = new SparkContext(conf)

  val sqlCassContext = new CassandraAwareSQLContext(sc)  // I used some
Calliope Cassandra Spark connector

 val rdd : SchemaRDD  = sqlCassContext.sql("select * from db.profile " )

 rdd.cache

 rdd.registerTempTable("profile")

 rdd.first  //enforce caching

     val q = "select  from_unixtime(floor(createdAt/1000)) from profile
where sampling_bucket=0 "

     val rdd2 = rdd.sqlContext.sql(q )

     println ("Result: " + rdd2.first)


And I get the following  errors:

xception in thread "main"
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:

Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]

 Filter (sampling_bucket#10 = 0)

  Subquery profile

   Project
[company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]

    CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile,
org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None, false,
Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml)


at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(
Analyzer.scala:72)

at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(
Analyzer.scala:70)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(
TreeNode.scala:165)

at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(
TreeNode.scala:183)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

at scala.collection.AbstractIterator.to(Iterator.scala:1157)

at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265
)

at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(
TreeNode.scala:212)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(
TreeNode.scala:168)

at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156
)

at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(
Analyzer.scala:70)

at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(
Analyzer.scala:68)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(
RuleExecutor.scala:61)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(
RuleExecutor.scala:59)

at scala.collection.IndexedSeqOptimized$class.foldl(
IndexedSeqOptimized.scala:51)

at scala.collection.IndexedSeqOptimized$class.foldLeft(
IndexedSeqOptimized.scala:60)

at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(
RuleExecutor.scala:59)

at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(
RuleExecutor.scala:51)

at scala.collection.immutable.List.foreach(List.scala:318)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(
RuleExecutor.scala:51)

at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(
SQLContext.scala:402)

at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(
SQLContext.scala:402)

at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(
SQLContext.scala:403)

at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(
SQLContext.scala:403)

at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(
SQLContext.scala:407)

at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(
SQLContext.scala:405)

at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(
SQLContext.scala:411)

at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(
SQLContext.scala:411)

at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)

at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)

at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)

at org.apache.spark.rdd.RDD.first(RDD.scala:1091)

at boot.SQLDemo$.main(SQLDemo.scala:65)  //my code

 at boot.SQLDemo.main(SQLDemo.scala)  //my code

On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <ha...@intel.com> wrote:

>  Can you provide the detailed failure call stack?
>
>
>
> *From:* shahab [mailto:shahab.mokari@gmail.com]
> *Sent:* Tuesday, March 3, 2015 3:52 PM
> *To:* user@spark.apache.org
> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>
>
>
> Hi,
>
>
>
> According to Spark SQL documentation, "....Spark SQL supports the vast
> majority of Hive features, such as  User Defined Functions( UDF) ", and one
> of these UFDs is "current_date()" function, which should be supported.
>
>
>
> However, i get error when I am using this UDF in my SQL query. There are
> couple of other UDFs which cause similar error.
>
>
>
> Am I missing something in my JDBC server ?
>
>
>
> /Shahab
>

RE: Supporting Hive features in Spark SQL Thrift JDBC server

Posted by "Cheng, Hao" <ha...@intel.com>.

Can you provide the detailed failure call stack?

From: shahab [mailto:shahab.mokari@gmail.com]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org
Subject: Supporting Hive features in Spark SQL Thrift JDBC server

Hi,

According to Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as  User Defined Functions( UDF) ", and one of these UFDs is "current_date()" function, which should be supported.

However, i get error when I am using this UDF in my SQL query. There are couple of other UDFs which cause similar error.

Am I missing something in my JDBC server ?

/Shahab