Posted to user@spark.apache.org by Pralabh Kumar <pr...@gmail.com> on 2018/04/05 03:43:40 UTC
Best way to Hive to Spark migration
Hi Spark group
What's the best way to migrate from Hive to Spark?
1) Use HiveContext of Spark
2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
3) Migrate Hive to Calcite to Spark SQL
Regards
Re: Best way to Hive to Spark migration
Posted by Jörn Franke <jo...@gmail.com>.
And the usual hint when migrating: do not only migrate, but also optimize the ETL process design. This brings the most benefit.
> On 5. Apr 2018, at 08:18, Jörn Franke <jo...@gmail.com> wrote:
>
> OK, this is not much detail, but you are probably best off migrating them to Spark SQL.
>
> It also depends on the Hive version and the Spark version. If you have a recent Hive with Tez + LLAP, I would not expect much difference. Spark SQL can even be less performant: it only recently got some features such as a cost-based optimizer.
>
>> On 5. Apr 2018, at 08:02, Pralabh Kumar <pr...@gmail.com> wrote:
>>
>> Hi
>>
>> I have a lot of ETL jobs (complex ones). Since they are SLA critical, I am planning to migrate them to Spark.
>>
>>> On Thu, Apr 5, 2018 at 10:46 AM, Jörn Franke <jo...@gmail.com> wrote:
>>> You need to provide more context on what you currently do in Hive and what you expect from the migration.
>>>
>>>> On 5. Apr 2018, at 05:43, Pralabh Kumar <pr...@gmail.com> wrote:
>>>>
>>>> Hi Spark group
>>>>
>>>> What's the best way to Migrate Hive to Spark
>>>>
>>>> 1) Use HiveContext of Spark
>>>> 2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
>>>> 3) Migrate Hive to Calcite to Spark SQL
>>>>
>>>>
>>>> Regards
>>>>
>>
Re: Best way to Hive to Spark migration
Posted by Jörn Franke <jo...@gmail.com>.
OK, this is not much detail, but you are probably best off migrating them to Spark SQL.
It also depends on the Hive version and the Spark version. If you have a recent Hive with Tez + LLAP, I would not expect much difference. Spark SQL can even be less performant: it only recently got some features such as a cost-based optimizer.
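For illustration, one possible shape of "migrating to Spark SQL" is to replay an existing HiveQL script through a Hive-enabled SparkSession. This is only a rough sketch under assumptions that are not from this thread: the helper names (split_statements, run_statements, main), the app name, and the idea that the ETL job is a semicolon-separated script are all hypothetical. Real jobs will usually need per-statement review, since not every Hive feature behaves the same way in Spark SQL.

```python
from typing import Iterable, List


def split_statements(script: str) -> List[str]:
    """Split a HiveQL script on ';', dropping '--' comment lines and blanks.

    Deliberately naive: it does not handle ';' inside string literals.
    """
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith("--")]
    return [s.strip() for s in "\n".join(lines).split(";") if s.strip()]


def run_statements(spark, statements: Iterable[str]) -> int:
    """Pass each statement to spark.sql and return how many were submitted.

    DDL and INSERT statements run eagerly; a bare SELECT only builds a lazy
    DataFrame, so it would need an action (e.g. a write) to actually execute.
    """
    count = 0
    for stmt in statements:
        spark.sql(stmt)
        count += 1
    return count


def main(script_path: str) -> None:
    # Imported lazily so the helpers above can be tried without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-etl-on-spark-sql")  # hypothetical app name
        .enableHiveSupport()               # reuse the existing Hive metastore
        .getOrCreate()
    )
    with open(script_path) as f:
        run_statements(spark, split_statements(f.read()))
    spark.stop()
```

Something like main("/path/to/job.hql") could then be launched with spark-submit. Keeping the splitting and execution separate from session creation makes it easier to dry-run a script against a stub before pointing it at a cluster.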
> On 5. Apr 2018, at 08:02, Pralabh Kumar <pr...@gmail.com> wrote:
>
> Hi
>
> I have a lot of ETL jobs (complex ones). Since they are SLA critical, I am planning to migrate them to Spark.
>
>> On Thu, Apr 5, 2018 at 10:46 AM, Jörn Franke <jo...@gmail.com> wrote:
>> You need to provide more context on what you currently do in Hive and what you expect from the migration.
>>
>>> On 5. Apr 2018, at 05:43, Pralabh Kumar <pr...@gmail.com> wrote:
>>>
>>> Hi Spark group
>>>
>>> What's the best way to Migrate Hive to Spark
>>>
>>> 1) Use HiveContext of Spark
>>> 2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
>>> 3) Migrate Hive to Calcite to Spark SQL
>>>
>>>
>>> Regards
>>>
>
Re: Best way to Hive to Spark migration
Posted by Pralabh Kumar <pr...@gmail.com>.
Hi
I have a lot of ETL jobs (complex ones). Since they are SLA critical, I am
planning to migrate them to Spark.
On Thu, Apr 5, 2018 at 10:46 AM, Jörn Franke <jo...@gmail.com> wrote:
> You need to provide more context on what you currently do in Hive and what
> you expect from the migration.
>
> On 5. Apr 2018, at 05:43, Pralabh Kumar <pr...@gmail.com> wrote:
>
> Hi Spark group
>
> What's the best way to Migrate Hive to Spark
>
> 1) Use HiveContext of Spark
> 2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
> 3) Migrate Hive to Calcite to Spark SQL
>
>
> Regards
>
>
Re: Best way to Hive to Spark migration
Posted by Jörn Franke <jo...@gmail.com>.
You need to provide more context on what you currently do in Hive and what you expect from the migration.
> On 5. Apr 2018, at 05:43, Pralabh Kumar <pr...@gmail.com> wrote:
>
> Hi Spark group
>
> What's the best way to Migrate Hive to Spark
>
> 1) Use HiveContext of Spark
> 2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
> 3) Migrate Hive to Calcite to Spark SQL
>
>
> Regards
>