Posted to user@spark.apache.org by Daniel Haviv <da...@veracity-group.com> on 2015/08/05 19:57:11 UTC

Starting Spark SQL thrift server from within a streaming app

Hi,
Is it possible to start the Spark SQL Thrift server from within a streaming app so the streamed data can be queried as it comes in?

Thank you.
Daniel

Re: Starting Spark SQL thrift server from within a streaming app

Posted by Todd Nist <ts...@gmail.com>.
Well, the point of starting a Thrift server would be to allow external access to
the data from JDBC/ODBC-type connections.  The sparkstreaming-sql project
leverages a standard Spark SQL context and then provides a means of
converting an incoming DStream into rows; look at the MessageToRow trait
in the KafkaSource class.

The example org.apache.spark.sql.streaming.examples.KafkaDDL should make
it clear, I think.
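
If the project isn't handy, here is a rough sketch of the idea; the trait
and method shapes below are my assumptions for illustration, not the
project's exact API:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Assumed shape: a pluggable converter from a raw Kafka message to a Row
// matching a declared schema, so the stream can be treated as a SQL table.
trait MessageToRow[K, V] {
  def schema: StructType
  def toRow(key: K, value: V): Row
}

// Example: messages are comma-separated "id,word" strings.
object CsvMessageToRow extends MessageToRow[String, String] {
  val schema = StructType(Seq(
    StructField("id", IntegerType),
    StructField("word", StringType)))

  def toRow(key: String, value: String): Row = {
    val Array(id, word) = value.split(",", 2)
    Row(id.toInt, word)
  }
}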

-Todd

On Thu, Aug 6, 2015 at 7:58 AM, Daniel Haviv <
daniel.haviv@veracity-group.com> wrote:

> Thank you Todd,
> How is the sparkstreaming-sql project different from starting a thrift
> server on a streaming app?
>
> Thanks again.
> Daniel
>
>
> On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist <ts...@gmail.com> wrote:
>
>> [Todd's reply trimmed; it appears in full at the bottom of this thread]
>

Re: Starting Spark SQL thrift server from within a streaming app

Posted by Daniel Haviv <da...@veracity-group.com>.
Thank you Todd,
How is the sparkstreaming-sql project different from starting a thrift
server on a streaming app?

Thanks again.
Daniel


On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist <ts...@gmail.com> wrote:

> [Todd's reply trimmed; it appears in full below]

Re: Starting Spark SQL thrift server from within a streaming app

Posted by Todd Nist <ts...@gmail.com>.
Hi Daniel,

It is possible to create an instance of the Spark SQL Thrift server, but it
seems like this project may be what you are looking for:

https://github.com/Intel-bigdata/spark-streamingsql

Not 100% sure what your use case is, but you can always convert the data into
a DataFrame and then issue a query against it.  If you want other systems to
be able to query it, there are numerous connectors for storing data into Hive,
Cassandra, HBase, Elasticsearch, and so on.
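
Roughly, the streaming side of that looks like the sketch below. This is a
minimal illustration, not taken from the project: the socket source, the
"words" table name, and the 5-second batch interval are all assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Word(word: String)

object StreamingTableExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StreamingTableExample"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    val ssc = new StreamingContext(sc, Seconds(5))
    // Any DStream works; a socket source keeps the example self-contained.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map(Word).foreachRDD { rdd =>
      // Re-register each micro-batch so SQL queries see the latest data.
      rdd.toDF().registerTempTable("words")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}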

To create an instance of a Thrift server with its own SQL context, you would
do something like the following:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

object MyThriftServer {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      // master is passed to spark-submit, but could also be specified explicitly:
      // .setMaster(sparkMaster)
      .setAppName("My ThriftServer")
      .set("spark.cores.max", "2")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Register a small demo table so there is something to query over JDBC.
    sc.makeRDD((1, "hello") :: (2, "world") :: Nil)
      .toDF("id", "word")
      .cache()
      .registerTempTable("t")

    // Start the Thrift server against this context; JDBC/ODBC clients can
    // now connect (default port 10000) and query the "t" table.
    HiveThriftServer2.startWithContext(sqlContext)
  }
}
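
Once it's up, any JDBC client can hit it. A quick client-side check might
look like the sketch below; the default port 10000 and the empty
user/password are assumptions, as is having the Hive JDBC driver on the
classpath:

import java.sql.DriverManager

object ThriftClientCheck {
  def main(args: Array[String]): Unit = {
    // Register the Hive JDBC driver, then query the demo table started above.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
    val rs = conn.createStatement().executeQuery("SELECT id, word FROM t")
    while (rs.next()) println(s"${rs.getInt("id")} -> ${rs.getString("word")}")
    conn.close()
  }
}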

Again, I'm not really clear what your use case is, but it does sound like
the first link above is what you may want.

-Todd

On Wed, Aug 5, 2015 at 1:57 PM, Daniel Haviv <
daniel.haviv@veracity-group.com> wrote:

> Hi,
> Is it possible to start the Spark SQL Thrift server from within a streaming
> app so the streamed data can be queried as it comes in?
>
> Thank you.
> Daniel
>