Posted to user@spark.apache.org by Michael Segel <ms...@hotmail.com> on 2016/05/16 19:12:37 UTC

Silly Question on my part...

For one use case, we were considering using the Thrift server as a way to allow multiple clients access to shared RDDs. 

Within the Thrift context, we create an RDD and expose it as a Hive table. 

The question is: where does the RDD exist? On the Thrift service node itself, or is that just a reference to an RDD contained within a context on the cluster? 
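
To make that concrete, the setup looks roughly like this (just a
sketch, assuming Spark 1.6 from the shell; the table name is made up):

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val hiveContext = new HiveContext(sc)   // sc: the existing SparkContext
    import hiveContext.implicits._

    // Build an RDD, lift it to a DataFrame, and register it as a table.
    val df = sc.parallelize(1 to 100).map(i => (i, s"row-$i")).toDF("id", "value")
    df.cache()
    df.registerTempTable("shared_rdd")

    // Start the Thrift server inside this same context, so JDBC clients
    // can all query shared_rdd.
    HiveThriftServer2.startWithContext(hiveContext)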


Thx

-Mike


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Silly Question on my part...

Posted by Do...@ODDO, od...@gmail.com.
On 5/16/2016 12:12 PM, Michael Segel wrote:
> For one use case, we were considering using the Thrift server as a way to allow multiple clients access to shared RDDs.
>
> Within the Thrift context, we create an RDD and expose it as a Hive table.
>
> The question is: where does the RDD exist? On the Thrift service node itself, or is that just a reference to an RDD contained within a context on the cluster?
>

You can share RDDs using Apache Ignite - it is a distributed memory 
grid/cache with tons of additional functionality. The advantage is extra 
resilience (you can mirror caches or just partition them), and you can 
query the contents of the caches in standard SQL, etc. Since the caches 
persist past the existence of the Spark app, you can share them 
(obviously). You also get read-/write-through to SQL or NoSQL databases 
on the back end for persistence, and for loading/dumping caches to 
secondary storage. It is written in Java, so it is very easy to use from 
Scala/Spark apps.
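
A minimal sketch of the Spark side (assuming the ignite-spark module of
that vintage and a running Ignite cluster; the cache name is
illustrative):

    import org.apache.ignite.configuration.IgniteConfiguration
    import org.apache.ignite.spark.IgniteContext

    // One Spark app writes pairs into a named Ignite cache...
    val ic = new IgniteContext[Int, String](sc, () => new IgniteConfiguration())
    val shared = ic.fromCache("sharedRdd")
    shared.savePairs(sc.parallelize(1 to 100).map(i => (i, s"value-$i")))

    // ...and a completely separate Spark app can later read it back,
    // because the cache outlives both SparkContexts:
    //   ic2.fromCache("sharedRdd").count()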



Re: Silly Question on my part...

Posted by Gene Pang <ge...@gmail.com>.
Hi Michael,

Yes, you can use Alluxio to share Spark RDDs. Here is a blog post about
getting started with Spark and Alluxio (
http://www.alluxio.com/2016/04/getting-started-with-alluxio-and-spark/),
and some documentation (
http://alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html).
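
The basic pattern from those docs is just reading and writing through
an alluxio:// path, so two independent Spark apps can hand an RDD off
via Alluxio memory (a minimal sketch; the master host and path are
illustrative):

    // App 1 writes the RDD out to Alluxio...
    val rdd = sc.parallelize(1 to 100)
    rdd.saveAsTextFile("alluxio://alluxio-master:19998/shared/my-rdd")

    // ...and app 2, with its own SparkContext, reads it back.
    val shared = sc.textFile("alluxio://alluxio-master:19998/shared/my-rdd")
    shared.count()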

I hope that helps,
Gene

On Tue, May 17, 2016 at 8:36 AM, Michael Segel <ms...@hotmail.com>
wrote:

> Thanks for the response.
>
> That’s what I thought, but I didn’t want to assume anything.
> (You know what happens when you ass u me … :-)
>
>
> Not sure about Tachyon though. It’s a thought, but I’m very conservative
> when it comes to design choices.
>
>
> On May 16, 2016, at 5:21 PM, John Trengrove <jo...@servian.com.au>
> wrote:
>
> If you want to share RDDs it might be a good idea to check out
> Tachyon / Alluxio.
>
> For the Thrift server, I believe the datasets are located in your Spark
> cluster as RDDs and you just communicate with them via the Thrift
> JDBC Distributed Query Engine connector.
>
> 2016-05-17 5:12 GMT+10:00 Michael Segel <ms...@hotmail.com>:
>
>> For one use case, we were considering using the Thrift server as a way
>> to allow multiple clients access to shared RDDs.
>>
>> Within the Thrift context, we create an RDD and expose it as a Hive table.
>>
>> The question is: where does the RDD exist? On the Thrift service node
>> itself, or is that just a reference to an RDD contained within a
>> context on the cluster?
>>
>>
>> Thx
>>
>> -Mike

Re: Silly Question on my part...

Posted by Michael Segel <ms...@hotmail.com>.
Thanks for the response. 

That’s what I thought, but I didn’t want to assume anything. 
(You know what happens when you ass u me … :-) 


Not sure about Tachyon though. It’s a thought, but I’m very conservative when it comes to design choices. 


> On May 16, 2016, at 5:21 PM, John Trengrove <jo...@servian.com.au> wrote:
> 
> If you want to share RDDs it might be a good idea to check out Tachyon / Alluxio.
> 
> For the Thrift server, I believe the datasets are located in your Spark cluster as RDDs and you just communicate with them via the Thrift JDBC Distributed Query Engine connector.
> 
> 2016-05-17 5:12 GMT+10:00 Michael Segel <msegel_hadoop@hotmail.com>:
> For one use case, we were considering using the Thrift server as a way to allow multiple clients access to shared RDDs.
> 
> Within the Thrift context, we create an RDD and expose it as a Hive table.
> 
> The question is: where does the RDD exist? On the Thrift service node itself, or is that just a reference to an RDD contained within a context on the cluster?
> 
> 
> Thx
> 
> -Mike


Re: Silly Question on my part...

Posted by John Trengrove <jo...@servian.com.au>.
If you want to share RDDs it might be a good idea to check out
Tachyon / Alluxio.

For the Thrift server, I believe the datasets are located in your Spark
cluster as RDDs and you just communicate with them via the Thrift
JDBC Distributed Query Engine connector.
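
So a client never holds the RDD itself; it just sends SQL over JDBC.
Something like this (a rough sketch; host, port and table name are
illustrative, and it assumes the Hive JDBC driver is on the classpath):

    import java.sql.DriverManager

    // Register the Hive JDBC driver (needed on pre-JDBC4 setups).
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Each client sees the same tables because every connection hits
    // the single SparkContext behind the Thrift server.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://thrift-host:10000/default", "user", "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SELECT COUNT(*) FROM shared_rdd")
    while (rs.next()) println(rs.getLong(1))
    conn.close()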

2016-05-17 5:12 GMT+10:00 Michael Segel <ms...@hotmail.com>:

> For one use case, we were considering using the Thrift server as a way to
> allow multiple clients access to shared RDDs.
>
> Within the Thrift context, we create an RDD and expose it as a Hive table.
>
> The question is: where does the RDD exist? On the Thrift service node
> itself, or is that just a reference to an RDD contained within a
> context on the cluster?
>
>
> Thx
>
> -Mike