Posted to user@spark.apache.org by Daniel Haviv <da...@veracity-group.com> on 2015/08/05 19:57:11 UTC
Starting Spark SQL thrift server from within a streaming app
Hi,
Is it possible to start the Spark SQL thrift server from within a streaming app so the streamed data can be queried as it comes in?
Thank you.
Daniel
Re: Starting Spark SQL thrift server from within a streaming app
Posted by Todd Nist <ts...@gmail.com>.
Well, the point of creating a thrift server is to allow external access to
the data over JDBC / ODBC connections. The spark-streamingsql project
leverages a standard Spark SQL context and provides a means of converting
incoming DStream records into rows; look at the MessageToRow trait in the
KafkaSource class.
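To illustrate what external JDBC access could look like, here is a minimal sketch of a client querying a running thrift server. It assumes the server listens on localhost at the default port 10000 with no authentication, that the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is on the client's classpath, and that a two-column table named t has been registered, as in the MyThriftServer example in this thread. The object name ThriftClientSketch and the connection details are hypothetical.

```scala
import java.sql.DriverManager

// Hypothetical client; host, port, database and credentials are assumptions.
object ThriftClientSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: the Hive JDBC driver jar is on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Assumption: thrift server on localhost:10000, no authentication.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "", "")
    try {
      val stmt = conn.createStatement()
      // "t" is the temp table registered by the server-side application.
      val rs = stmt.executeQuery("SELECT * FROM t")
      while (rs.next()) {
        // Columns are read by position: an int followed by a string.
        println(s"${rs.getInt(1)}\t${rs.getString(2)}")
      }
    } finally {
      conn.close()
    }
  }
}
```

Any JDBC-capable tool could stand in for this client, since the thrift server speaks the HiveServer2 protocol.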
The example org.apache.spark.sql.streaming.examples.KafkaDDL should make
it clear, I think.
-Todd
On Thu, Aug 6, 2015 at 7:58 AM, Daniel Haviv <
daniel.haviv@veracity-group.com> wrote:
> Thank you Todd,
> How is the spark-streamingsql project different from starting a thrift
> server in a streaming app?
>
> Thanks again.
> Daniel
Re: Starting Spark SQL thrift server from within a streaming app
Posted by Daniel Haviv <da...@veracity-group.com>.
Thank you Todd,
How is the spark-streamingsql project different from starting a thrift
server in a streaming app?
Thanks again.
Daniel
On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist <ts...@gmail.com> wrote:
> Hi Daniel,
>
> It is possible to create an instance of the Spark SQL Thrift server;
> however, it seems like this project is what you may be looking for:
>
> https://github.com/Intel-bigdata/spark-streamingsql
Re: Starting Spark SQL thrift server from within a streaming app
Posted by Todd Nist <ts...@gmail.com>.
Hi Daniel,

It is possible to create an instance of the Spark SQL Thrift server;
however, it seems like this project is what you may be looking for:

https://github.com/Intel-bigdata/spark-streamingsql

I'm not 100% sure what your use case is, but you can always convert the
data into a DataFrame and then issue a query against it. If you want other
systems to be able to query it, there are numerous connectors to store
data into Hive, Cassandra, HBase, Elasticsearch, etc.

To create an instance of a thrift server with its own SQL context, you
would do something like the following:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

object MyThriftServer {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      // master is passed to spark-submit, but could also be specified explicitly
      // .setMaster(sparkMaster)
      .setAppName("My ThriftServer")
      .set("spark.cores.max", "2")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Register a small cached DataFrame as temp table "t"
    sc.makeRDD((1, "hello") :: (2, "world") :: Nil).toDF.cache().registerTempTable("t")

    // Start the thrift server on this context so clients can see "t"
    HiveThriftServer2.startWithContext(sqlContext)
  }
}

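
The same pattern might be extended to a streaming app along the following lines. This is only a sketch under stated assumptions, not a tested recipe: the socket source on localhost:9999, the 10 second batch interval, the table name stream_t and the object name StreamingThriftSketch are all hypothetical, and the spark-streaming and spark-hive-thriftserver modules are assumed to be on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object; names and settings are assumptions.
object StreamingThriftSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("Streaming ThriftServer sketch"))
    // Assumption: 10 second batches.
    val ssc = new StreamingContext(sc, Seconds(10))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Assumption: a plain text source on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Start the thrift server on the same context that registers the
    // temp table, so JDBC/ODBC clients can see it.
    HiveThriftServer2.startWithContext(sqlContext)

    lines.foreachRDD { rdd =>
      // Re-register "stream_t" so it points at the latest batch.
      rdd.map(line => (line.length, line))
        .toDF("len", "line")
        .registerTempTable("stream_t")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In this sketch each batch replaces the table, so clients only see the most recent interval. How long a batch stays queryable depends on how long the streaming context retains its data, so writing batches to an external store may be the better fit if history is needed.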
Again, I'm not really clear what your use case is, but it does sound like
the first link above is what you may want.
-Todd