Posted to dev@hudi.apache.org by Gurudatt Kulkarni <gu...@gmail.com> on 2020/08/21 06:48:11 UTC

[Question] How to use Hudi for migrating a historical mysql table?

Hi All,

I have a use case where historical data is available in a MySQL table,
which is being populated from a Kafka topic.

My plan is to create a Spark job that migrates the historical data from
MySQL using the Hudi datasource. Once the migration of the historical data
is done, use DeltaStreamer to tap the Kafka topic for real-time data and
write to the same location. Is this possible, and how should I approach
it? Done incorrectly, it may corrupt the Hudi metadata.

Regards,
Gurudatt
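For illustration, a minimal sketch of the write options such a one-time bootstrap job might pass to the Hudi datasource. The table name, key fields, and paths below are hypothetical placeholders, not from the thread:

```python
# Hedged sketch: Hudi write options for a one-time historical load.
# All names and paths here are hypothetical examples.
hudi_bootstrap_options = {
    "hoodie.table.name": "orders",                       # hypothetical table name
    "hoodie.datasource.write.operation": "bulk_insert",  # one-time historical load
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "created_date",
}

# Inside the Spark job these would be applied roughly as:
#   df = spark.read.format("jdbc").option("url", mysql_url) \
#            .option("dbtable", "orders").load()
#   df.write.format("hudi").options(**hudi_bootstrap_options) \
#     .mode("overwrite").save("s3://bucket/warehouse/orders")
```

The later DeltaStreamer phase would then have to point at the same base path and the same record key / precombine fields, so that its upserts line up with the bootstrapped data.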

Re: [Question] How to use Hudi for migrating a historical mysql table?

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Gurudatt,

You can use Debezium for migrating the historical data as well. Debezium
lets you capture both the existing rows and new changes, and feed them to
DeltaStreamer. I used it at my previous org for the same use case.


Re: [Question] How to use Hudi for migrating a historical mysql table?

Posted by "wowtuanzi@gmail.com" <wo...@gmail.com>.
You can use Kafka to subscribe to the MySQL binlog, then consume the
historical data directly. For details, please refer to [1].


[1] http://hudi.apache.org/docs/writing_data.html


wowtuanzi@gmail.com
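For the streaming phase discussed above, a hedged sketch of the DeltaStreamer arguments that would upsert the Kafka records into the same table. The ordering field, base path, and table name are hypothetical and must match whatever the bootstrap wrote:

```python
# Hedged sketch: DeltaStreamer arguments for consuming the Kafka topic
# into the same Hudi base path that the historical load wrote.
# Paths and field names below are hypothetical examples.
deltastreamer_args = [
    "--table-type", "COPY_ON_WRITE",
    "--op", "UPSERT",
    "--source-class", "org.apache.hudi.utilities.sources.JsonKafkaSource",
    "--source-ordering-field", "updated_at",          # hypothetical ordering field
    "--target-base-path", "s3://bucket/warehouse/orders",  # must match the bootstrap path
    "--target-table", "orders",
]

# Submitted roughly as:
#   spark-submit \
#     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
#     hudi-utilities-bundle.jar <args above> \
#     --hoodie-conf bootstrap.servers=... --hoodie-conf ...
```

Because DeltaStreamer upserts by record key and ordering field, replaying binlog events that overlap the historical load should converge to the latest state rather than corrupting the table.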
 