Posted to user@hive.apache.org by Shushant Arora <sh...@gmail.com> on 2014/05/01 15:43:30 UTC

data transfer from rdbms to hive

Hi

I have a requirement to transfer data from an RDBMS (MySQL) to a Hive table
partitioned on year and month.
Each record in the MySQL data contains a timestamp of user activity.

What is the best tool for that?

1. Shall I go with Sqoop?

2. How do I compute the dynamic partition from the RDBMS data?

Shall I bucket the fetched data on the user key?
Shall I also use day in the partition?
My requirement is to analyse user activity on a per-day basis.

Thanks
Shushant

Re: data transfer from rdbms to hive

Posted by CRAIG LIU <cr...@gmail.com>.
I am new to Hive, but here is my idea:
1. Use mysqldump to dump your data to a CSV file.
2. Load the CSV into a Hive temp table.
3. Create the partitioned table.
4. With dynamic partitioning enabled, select from the temp table and insert
into the partitioned table. You can use a UDF to get the date from the
timestamp. A rough sketch of steps 1-3 follows below.
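A minimal sketch of those first steps, assuming a MySQL table named
user_activity with a timestamp column activity_ts (all names, credentials
and paths here are made up for illustration):

# 1. Dump the MySQL table as tab-separated text (the server writes
#    user_activity.txt into the given directory, so run this on the MySQL host).
mysqldump -u dbuser -p --tab=/tmp/export mydb user_activity

# 2. Load the dump into a non-partitioned Hive staging table and
# 3. create the target table partitioned by year and month.
hive -e "
CREATE TABLE activity_staging (user_key BIGINT, activity_ts TIMESTAMP, action STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH '/tmp/export/user_activity.txt' INTO TABLE activity_staging;
CREATE TABLE user_activity (user_key BIGINT, activity_ts TIMESTAMP, action STRING)
PARTITIONED BY (yr INT, mth INT);
"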

Regards,
Craig

Re: data transfer from rdbms to hive

Posted by Matt Tucker <ma...@gmail.com>.
It sounds like you might need to export via Sqoop using a query or view,
as the date granularity in your MySQL table is different from the desired
Hive table. The overall performance may be lower, as MySQL must do more than
just read rows from disk, but you can still pull the data in parallel
through Sqoop.
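For example, a free-form query import that computes the partition columns on
the MySQL side might look roughly like this (connection details, column
names and paths are made up):

sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username dbuser -P \
  --query 'SELECT user_key, activity_ts, action,
                  YEAR(activity_ts) AS yr, MONTH(activity_ts) AS mth
           FROM user_activity WHERE $CONDITIONS' \
  --split-by user_key \
  --target-dir /user/hive/staging/user_activity \
  --num-mappers 4

With --query, Sqoop needs the $CONDITIONS placeholder and an explicit
--split-by column so that the mappers can divide the rows between them.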


Re: data transfer from rdbms to hive

Posted by Shushant Arora <sh...@gmail.com>.
For that, do I need to load the files into a non-partitioned table first and
then insert from the non-partitioned table into the partitioned one?
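i.e. something roughly like the following, once dynamic partitioning is
enabled (table and column names are just the placeholders from the earlier
sketch; the partition columns computed from the timestamp must come last in
the SELECT):

hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE user_activity PARTITION (yr, mth)
SELECT user_key, activity_ts, action, year(activity_ts), month(activity_ts)
FROM activity_staging;
"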



Re: data transfer from rdbms to hive

Posted by Hamza Asad <ha...@gmail.com>.
Sqoop also supports dynamic partitioning; I have done that. For that you have
to enable dynamic partitioning in Hive, i.e. set
hive.exec.dynamic.partition=true.
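If this refers to Sqoop's HCatalog integration (available in Sqoop 1.4.4 and
later, as far as I know), a rough sketch might be the following; all names
and connection details are placeholders, the partitioned Hive table is
assumed to exist already, and dynamic partitioning must be enabled in Hive
as above:

# A MySQL view that exposes the partition columns, e.g.
#   CREATE VIEW activity_with_ym AS
#   SELECT user_key, activity_ts, action,
#          YEAR(activity_ts) AS yr, MONTH(activity_ts) AS mth
#   FROM user_activity;
# can then be imported straight into the partitioned Hive table:
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username dbuser -P \
  --table activity_with_ym \
  --split-by user_key \
  --hcatalog-database default \
  --hcatalog-table user_activity \
  -m 4

Each row should land in the partition given by its yr/mth values, so no
partition has to be named up front.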




-- 
*Muhammad Hamza Asad*

Re: data transfer from rdbms to hive

Posted by unmesha sreeveni <un...@gmail.com>.
On Fri, May 2, 2014 at 9:41 AM, Shushant Arora <sh...@gmail.com> wrote:

> Sqoop


Hi Shushant
  I don't think the other ecosystem projects can help you. The only way to
import data from a relational DB is Sqoop.

http://my.safaribooksonline.com/book/databases/9781449364618/6dot-hadoop-ecosystem-integration/integration_hive_partition_html

Let me know your thoughts.



-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Center for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/

Re: data transfer from rdbms to hive

Posted by Shushant Arora <sh...@gmail.com>.
But how do I achieve dynamic partitioning? For each row in MySQL, the
partition name has to be derived from the date column and the row inserted
into the corresponding partition in Hive. Sqoop requires the partition to be
specified beforehand.
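As far as I can see, the plain --hive-import option only takes a single,
fixed partition key and value per run, e.g. (values below are only an
example):

sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username dbuser -P \
  --table user_activity \
  --where "activity_ts >= '2014-05-01' AND activity_ts < '2014-06-01'" \
  --hive-import \
  --hive-table user_activity_by_month \
  --hive-partition-key mth \
  --hive-partition-value 2014_05 \
  -m 4

so every year/month partition would need its own import run, instead of the
date column deciding the partition per row.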




On Fri, May 2, 2014 at 8:36 AM, unmesha sreeveni <un...@gmail.com>wrote:

> I suggest you to go for sqoop - They imports data from RDBMS.
>
>
> On Thu, May 1, 2014 at 7:13 PM, Shushant Arora <sh...@gmail.com>wrote:
>
>> Hi
>>
>> I have a requirement to transfer data from RDBMS mysql to partitioned
>> hive table
>> Partitioned on Year and month.
>> Each record in mysql data contains timestamp of user activity.
>>
>> What is the best tool for that.
>>
>> 1.Shall I go with sqoop?
>>
>> 2.How to compute dynamic partition from RDBMS data .
>>
>> Shall I bucketised my fetched data on User Key.
>> Shall I use day also in partition?
>> My requirement is to analyse user activity per day basis.
>>
>> Thanks
>> Shushant
>>
>>
>>
>>
>
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>

Re: data transfer from rdbms to hive

Posted by unmesha sreeveni <un...@gmail.com>.
I suggest you go with Sqoop - it imports data from an RDBMS.



-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Center for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/