Posted to user@hive.apache.org by Patrick McAnneny <pa...@leadkarma.com> on 2015/08/27 18:11:06 UTC

Hive on Spark

Once I get "hive.execution.engine=spark" working, how would I go about
loading portions of my data into memory? Let's say I have a 100TB database
and want to load all of last week's data into Spark memory; is this
possible, or even beneficial? Or am I thinking about Hive on Spark in the
wrong way?

I also assume Hive on Spark could get me near-real-time capabilities for
large queries. Is this true?
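
For reference, switching the engine is a session-level setting in Hive; a
minimal sketch of a Beeline/Hive CLI session follows, using Spark property
names from the Hive on Spark documentation with illustrative values (the
table name is hypothetical):

    -- Run this session's queries on Spark instead of MapReduce:
    SET hive.execution.engine=spark;

    -- Point Hive's Spark client at the cluster and size the executors;
    -- the values here are placeholders, not tuning advice:
    SET spark.master=yarn-cluster;
    SET spark.executor.memory=4g;
    SET spark.executor.instances=20;

    -- Subsequent queries compile to Spark jobs:
    SELECT count(*) FROM page_views;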

Re: Hive on Spark

Posted by Patrick McAnneny <pa...@leadkarma.com>.
What is the benefit of Hive on Spark if you cannot pre-load data into
memory that you know will be queried?
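
For contrast, explicit caching of this kind is exposed by Spark SQL when
queried directly, rather than through Hive's Spark engine; a hedged sketch,
with a hypothetical table name:

    -- Spark SQL (not Hive on Spark) can pin a table in executor memory:
    CACHE TABLE events;           -- materializes the table in the in-memory cache
    SELECT count(*) FROM events;  -- later scans read the cached copy
    UNCACHE TABLE events;         -- release the memory when finished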

On Mon, Aug 31, 2015 at 4:25 PM, Xuefu Zhang <xz...@cloudera.com> wrote:

> What you described isn't part of the functionality of Hive on Spark.
> Rather, Spark is used here as a general-purpose engine, similar to MR but
> without materializing intermediate results between stages. It's
> batch-oriented.
>
> Keeping 100TB of data in memory is hardly beneficial unless you know that
> dataset is going to be used in subsequent queries.
>
> For loading data into memory and providing near-real-time responses, you
> might want to look at some memory-based DBs.
>
> Thanks,
> Xuefu
>
> On Thu, Aug 27, 2015 at 9:11 AM, Patrick McAnneny <
> patrick.mcanneny@leadkarma.com> wrote:
>
>> Once I get "hive.execution.engine=spark" working, how would I go about
>> loading portions of my data into memory? Let's say I have a 100TB database
>> and want to load all of last week's data into Spark memory; is this
>> possible, or even beneficial? Or am I thinking about Hive on Spark in the
>> wrong way?
>>
>> I also assume Hive on Spark could get me near-real-time capabilities
>> for large queries. Is this true?
>>
>
>

Re: Hive on Spark

Posted by Xuefu Zhang <xz...@cloudera.com>.
What you described isn't part of the functionality of Hive on Spark.
Rather, Spark is used here as a general-purpose engine, similar to MR but
without materializing intermediate results between stages. It's
batch-oriented.

Keeping 100TB of data in memory is hardly beneficial unless you know that
dataset is going to be used in subsequent queries.
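
To make the reuse point concrete, here is a sketch in Spark SQL (again
outside Hive on Spark; the table, partition column, and date are
hypothetical) of caching only the slice that several queries will read:

    -- Cache just last week's slice, because repeated queries will scan it:
    CACHE TABLE last_week AS
      SELECT * FROM events WHERE ds >= '2015-08-20';

    SELECT count(*) FROM last_week;                  -- first read pays the load cost
    SELECT ds, count(*) FROM last_week GROUP BY ds;  -- later reads hit memory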

For loading data into memory and providing near-real-time responses, you
might want to look at some memory-based DBs.

Thanks,
Xuefu

On Thu, Aug 27, 2015 at 9:11 AM, Patrick McAnneny <
patrick.mcanneny@leadkarma.com> wrote:

> Once I get "hive.execution.engine=spark" working, how would I go about
> loading portions of my data into memory? Lets say I have a 100TB database
> and want to load all of last weeks data in spark memory, is this possible
> or even beneficial? Or am I thinking about hive on spark in the wrong way.
>
> I also assume hive on spark could get me to near-real-time capabilities
> for large queries. Is this true?
>