Posted to user@spark.apache.org by gtinside <gt...@gmail.com> on 2014/09/22 22:02:29 UTC

Spark SQL CLI

Hi,

I have been using the spark shell to execute all my SQL. I am connecting to
Cassandra, converting the data to JSON, and then running queries on it. I
am using HiveContext (rather than SQLContext) because of the "explode"
functionality it provides.

I want to see how I can use the Spark SQL CLI to run queries directly
against a saved table. I see metastore and metastore_db getting created in
the Spark bin directory (my Hive context is a LocalHiveContext). I tried
executing queries in the spark-sql CLI after putting in a hive-site.xml
with the metastore and metastore_db directories set to the ones under
Spark's bin, but it doesn't seem to be working. I am getting
"org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table
test_tbl".

Is this possible?

Regards,
Gaurav



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-CLI-tp14840.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
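
A minimal sketch of the workflow described above, with placeholder table,
field, and path names (Spark 1.x APIs, run inside the spark shell where sc
is predefined; registerTempTable is the Spark 1.1 name, registerAsTable in
earlier releases):

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
// JSON produced from the Cassandra data; the path is a placeholder.
val events = hc.jsonFile("/tmp/events.json")
events.registerTempTable("events")
// explode comes from HiveQL, which is why HiveContext is needed here.
hc.sql("SELECT id, tag FROM events LATERAL VIEW explode(tags) t AS tag").collect()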


Re: Spark SQL CLI

Posted by Gaurav Tiwari <gt...@gmail.com>.
Thanks, I will give it a try. I appreciate your help.

Regards,
Gaurav

Re: Spark SQL CLI

Posted by Michael Armbrust <mi...@databricks.com>.
A workaround for now would be to save the JSON as Parquet and then create a
metastore Parquet table. Using Parquet will be much faster for repeated
querying. This function might be helpful:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveMetastoreTypes

def createParquetTable(name: String, file: String, sqlContext: SQLContext): Unit = {
  import sqlContext._

  // Read the Parquet file once to obtain its schema.
  val rdd = parquetFile(file)
  // Translate each field into the equivalent Hive metastore type.
  val schema = rdd.schema.fields
    .map(f => s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
    .mkString(",\n")
  // Register an external Hive table backed by the existing Parquet file.
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  // Let Spark SQL read the table with its native Parquet support.
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}
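
A hypothetical invocation of the function above (the table name and path
are placeholders; pass a HiveContext, since CREATE EXTERNAL TABLE needs
Hive support even though the parameter is typed as SQLContext):

createParquetTable("test_tbl", "/tmp/events.parquet", hc)
hc.sql("SELECT count(*) FROM test_tbl").collect()

Because the table is registered in the metastore rather than as a
temporary table, a spark-sql CLI pointed at the same metastore can query
it as well.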


Re: Spark SQL CLI

Posted by Michael Armbrust <mi...@databricks.com>.
You can't directly query JSON tables from the CLI or JDBC server, since
temporary tables only live for the lifetime of the SparkContext. This PR
will eventually (targeted for 1.2) let you do what you want in pure SQL:
https://github.com/apache/spark/pull/2475
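
A minimal sketch of that limitation, with placeholder names:

// Session 1: a spark shell (sc predefined).
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.jsonFile("/tmp/events.json").registerTempTable("events") // placeholder path
hc.sql("SELECT count(*) FROM events") // works: same SparkContext

// Session 2: a separately launched spark-sql CLI runs its own
// SparkContext, so "SELECT * FROM events" fails there; the temporary
// table was never written to the metastore.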


Re: Spark SQL CLI

Posted by Yin Huai <hu...@gmail.com>.
Hi Gaurav,

It seems the metastore directory is created by LocalHiveContext, while
metastore_db is created by a regular HiveContext. Can you check whether you
are still using LocalHiveContext when you try to access your tables? Also,
if you created those tables after launching the SQL CLI under bin/, then
launching the CLI from that same directory should let Spark SQL connect to
the metastore without any extra settings.

By the way, can you let me know your settings in hive-site.xml?

Thanks,

Yin
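
For reference, a minimal hive-site.xml along the lines discussed here; the
paths are placeholders, not the actual settings from this thread:

<?xml version="1.0"?>
<configuration>
  <!-- Embedded Derby metastore; point databaseName at the existing
       metastore_db directory (placeholder path). -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/spark/bin/metastore_db;create=true</value>
  </property>
  <!-- Warehouse directory; LocalHiveContext writes to a local "metastore"
       directory by default (placeholder path). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/path/to/spark/bin/metastore</value>
  </property>
</configuration>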


Re: Spark SQL CLI

Posted by Gaurav Tiwari <gt...@gmail.com>.
Hi,

I tried setting the metastore and metastore_db locations in
conf/hive-site.xml to the directories created in Spark's bin folder (they
were created when I ran the spark shell and used LocalHiveContext), but it
still doesn't work.

Do I need to save my RDD as a table through the Hive context to make this
work?

Regards,
Gaurav
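
A sketch of the save-as-table route the question asks about, assuming
Spark 1.1's SchemaRDD API; the names and path are placeholders:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc) // sc from the spark shell
val events = hc.jsonFile("/tmp/events.json") // placeholder path
// saveAsTable persists the data as a managed Hive table in the metastore,
// so other processes sharing that metastore (e.g. spark-sql) can see it.
events.saveAsTable("test_tbl")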


Re: Spark SQL CLI

Posted by Yin Huai <hu...@gmail.com>.
Hi Gaurav,

Can you put hive-site.xml in conf/ and try again?

Thanks,

Yin
