Posted to user@spark.apache.org by Pralabh Kumar <pr...@gmail.com> on 2018/04/05 03:43:40 UTC

Best way to Hive to Spark migration

Hi Spark group

What's the best way to migrate from Hive to Spark?

1) Use HiveContext of Spark
2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
3) Migrate Hive to Calcite to Spark SQL


Regards

Re: Best way to Hive to Spark migration

Posted by Jörn Franke <jo...@gmail.com>.
And the usual hint when migrating: do not only migrate, but also optimize the ETL process design. This brings the most benefits.


Re: Best way to Hive to Spark migration

Posted by Jörn Franke <jo...@gmail.com>.
OK, this is not much detail, but you are probably best off migrating them to Spark SQL.

It also depends on the Hive version and the Spark version. If you have a recent Hive with Tez + LLAP, I would not expect much difference. Spark can even be less performant: Spark SQL only recently got some features such as a cost-based optimizer.


Re: Best way to Hive to Spark migration

Posted by Pralabh Kumar <pr...@gmail.com>.
Hi

I have a lot of ETL jobs (complex ones). Since they are SLA-critical, I am
planning to migrate them to Spark.


Re: Best way to Hive to Spark migration

Posted by Jörn Franke <jo...@gmail.com>.
You need to provide more context on what you currently do in Hive and what you expect from the migration.

