Posted to user@spark.apache.org by Paul Snively <ps...@icloud.com> on 2013/08/27 20:03:00 UTC

Shark Queries on Streams?

Hi everyone!

I'm continuing to investigate the Spark/Shark ecosystem and am fascinated by the potential. Having noticed that I can cache a DStream, I wondered whether there's a way to run Shark queries against a cached DStream. I guess this would imply that a cached DStream has a "similar enough" structure to a Shark table, or could somehow be treated as a (memory-based) "external table" so that it can be represented in the metastore.

Does this make any sense?

Thanks!
Paul

Re: Shark Queries on Streams?

Posted by Reynold Xin <rx...@cs.berkeley.edu>.
It definitely makes sense. In the long run we would like to make
Shark work for streaming queries.

There was a prototype Harvey did a while ago that made Shark able to
query streaming RDDs. I will let him comment on how he implemented that.

Note that this might help too: https://github.com/amplab/shark/pull/136



--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org


