Posted to dev@sqoop.apache.org by wenxing zheng <we...@gmail.com> on 2016/12/29 01:18:13 UTC

Import more than 10 million records from MySQL to HDFS

Dear all,

Has anyone already tried to import more than 10 million records from MySQL
to HDFS using Sqoop2?

I have tried various throttling settings, but the job always fails at the
very beginning.

Any advice would be appreciated.
Thanks, Wenxing
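
For context, a minimal Sqoop2 shell session for this kind of import might
look like the sketch below. The link and job names are illustrative, and the
extractor/loader counts set interactively under "Throttling resources" are
the throttling settings in question:

    sqoop:000> create link -connector generic-jdbc-connector
    # prompts for: Name (e.g. mysql-link), JDBC Driver Class
    # (com.mysql.jdbc.Driver), JDBC Connection String
    # (jdbc:mysql://dbhost:3306/testdb), username and password
    sqoop:000> create link -connector hdfs-connector
    # prompts for: Name (e.g. hdfs-link) and the HDFS URI / output directory
    sqoop:000> create job -f "mysql-link" -t "hdfs-link"
    # prompts for: Name (e.g. mysql-to-hdfs), the source schema and table,
    # and, under "Throttling resources", the number of extractors and loaders
    sqoop:000> start job -name "mysql-to-hdfs"
    sqoop:000> status job -name "mysql-to-hdfs"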

Re: Import more than 10 million records from MySQL to HDFS

Posted by Anna Szonyi <sz...@cloudera.com>.
Hi Wenxing,

I tested the scenario with a simple table (10 million rows, but only 3
columns) and it seems to work perfectly for me (with super simple types:
int, varchar, timestamp).
Could you please share the full create table statement (it doesn't have to
have the same column names, just the same types) and some sample inserts, so
we can see what causes the problem; it doesn't seem to be the row count. It
may be related to the types or the number of columns.
Also, please share your link and job setup, and the exception you got.

Thanks,
/Anna
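
For reference, a test table along these lines would match the types Anna
describes. The table and column names below are made up, and the repeated
self-insert is just one quick way to grow the table to 10 million rows:

    CREATE TABLE sqoop_import_test (
      id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name       VARCHAR(64),
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    INSERT INTO sqoop_import_test (name) VALUES ('seed');
    -- each run doubles the row count; ~24 runs reaches 10M+ rows
    INSERT INTO sqoop_import_test (name)
      SELECT CONCAT('row-', id) FROM sqoop_import_test;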

On Mon, Jan 16, 2017 at 6:59 PM, wenxing zheng <we...@gmail.com>
wrote:

> Hi Szabolcs,
>
> Sorry for the late reply. From my test, it's OK for 1,000,000 rows.
>
> Thanks, Wenxing
>
> On Wed, Jan 11, 2017 at 12:42 AM, Szabolcs Vasas <va...@cloudera.com>
> wrote:
>
> > Hi Wenxing,
> >
> > I have created a table based on the column information you sent, but I
> > won't be able to do this testing in the next couple of days.
> > By the way, have you tried the import with smaller data sets? I mean,
> > have you tried to find the biggest data set you can import successfully?
> >
> > Szabolcs
> >
> > On Wed, Jan 4, 2017 at 10:55 AM, wenxing zheng <we...@gmail.com>
> > wrote:
> >
> > > Hi Szabolcs,
> > >
> > > I am testing this scenario with our client's slave database. I am
> > > sorry that I cannot share the table definition and the sample data
> > > here, but attached is a sample table definition with the column types.
> > >
> > > It's quite complex.
> > >
> > > Thanks, Wenxing
> > >
> > > On Wed, Jan 4, 2017 at 4:24 PM, Szabolcs Vasas <va...@cloudera.com>
> > wrote:
> > >
> > >> Hi Wenxing,
> > >>
> > >> I haven't tried this scenario yet but I would be happy to test it on
> my
> > >> side. Can you please send me the DDL statement for creating the MySQL
> > >> table
> > >> and some sample data?
> > >> Also, it would be very helpful if you could send the details of the
> > >> job you would like to run.
> > >>
> > >> Regards,
> > >> Szabolcs
> > >>
> > >> On Wed, Jan 4, 2017 at 2:54 AM, wenxing zheng <
> wenxing.zheng@gmail.com>
> > >> wrote:
> > >>
> > >> > Can anyone help to advise?
> > >> >
> > >> > I also met a problem when I set checkColumn to updated_time:
> > >> > currently all the updated_time values are NULL, and in this case
> > >> > Sqoop fails to start the job. I think we need to support this case.
> > >> >
> > >> > On Thu, Dec 29, 2016 at 9:18 AM, wenxing zheng <
> > wenxing.zheng@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> > > Dear all,
> > >> > >
> > >> > > Has anyone already tried to import more than 10 million records
> > >> > > from MySQL to HDFS using Sqoop2?
> > >> > >
> > >> > > I have tried various throttling settings, but the job always
> > >> > > fails at the very beginning.
> > >> > >
> > >> > > Any advice would be appreciated.
> > >> > > Thanks, Wenxing
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Szabolcs Vasas
> > >> Software Engineer
> > >> <http://www.cloudera.com>
> > >>
> > >
> > >
> >
> >
> > --
> > Szabolcs Vasas
> > Software Engineer
> > <http://www.cloudera.com>
> >
>

Re: Import more than 10 million records from MySQL to HDFS

Posted by wenxing zheng <we...@gmail.com>.
Hi Szabolcs,

Sorry for the late reply. From my test, it's OK for 1,000,000 rows.

Thanks, Wenxing

On Wed, Jan 11, 2017 at 12:42 AM, Szabolcs Vasas <va...@cloudera.com> wrote:

> Hi Wenxing,
>
> I have created a table based on the column information you sent, but I
> won't be able to do this testing in the next couple of days.
> By the way, have you tried the import with smaller data sets? I mean, have
> you tried to find the biggest data set you can import successfully?
>
> Szabolcs
>
> On Wed, Jan 4, 2017 at 10:55 AM, wenxing zheng <we...@gmail.com>
> wrote:
>
> > Hi Szabolcs,
> >
> > I am testing this scenario with our client's slave database. I am sorry
> > that I cannot share the table definition and the sample data here, but
> > attached is a sample table definition with the column types.
> >
> > It's quite complex.
> >
> > Thanks, Wenxing
> >
> > On Wed, Jan 4, 2017 at 4:24 PM, Szabolcs Vasas <va...@cloudera.com>
> wrote:
> >
> >> Hi Wenxing,
> >>
> >> I haven't tried this scenario yet but I would be happy to test it on my
> >> side. Can you please send me the DDL statement for creating the MySQL
> >> table
> >> and some sample data?
> >> Also, it would be very helpful if you could send the details of the job
> >> you would like to run.
> >>
> >> Regards,
> >> Szabolcs
> >>
> >> On Wed, Jan 4, 2017 at 2:54 AM, wenxing zheng <we...@gmail.com>
> >> wrote:
> >>
> >> > Can anyone help to advise?
> >> >
> >> > I also met a problem when I set checkColumn to updated_time: currently
> >> > all the updated_time values are NULL, and in this case Sqoop fails to
> >> > start the job. I think we need to support this case.
> >> >
> >> > On Thu, Dec 29, 2016 at 9:18 AM, wenxing zheng <
> wenxing.zheng@gmail.com
> >> >
> >> > wrote:
> >> >
> >> > > Dear all,
> >> > >
> >> > > Has anyone already tried to import more than 10 million records
> >> > > from MySQL to HDFS using Sqoop2?
> >> > >
> >> > > I have tried various throttling settings, but the job always fails
> >> > > at the very beginning.
> >> > >
> >> > > Any advice would be appreciated.
> >> > > Thanks, Wenxing
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Szabolcs Vasas
> >> Software Engineer
> >> <http://www.cloudera.com>
> >>
> >
> >
>
>
> --
> Szabolcs Vasas
> Software Engineer
> <http://www.cloudera.com>
>

Re: Import more than 10 million records from MySQL to HDFS

Posted by Szabolcs Vasas <va...@cloudera.com>.
Hi Wenxing,

I have created a table based on the column information you sent, but I won't
be able to do this testing in the next couple of days.
By the way, have you tried the import with smaller data sets? I mean, have
you tried to find the biggest data set you can import successfully?

Szabolcs
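
One way to bisect the largest working size, sketched here with hypothetical
names (source_table, id), is to import from a bounded copy of the table and
vary the LIMIT between runs:

    -- stage the first N rows so the Sqoop job sees a bounded data set
    CREATE TABLE sqoop_import_probe AS
      SELECT * FROM source_table ORDER BY id LIMIT 1000000;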

On Wed, Jan 4, 2017 at 10:55 AM, wenxing zheng <we...@gmail.com>
wrote:

> Hi Szabolcs,
>
> I am testing this scenario with our client's slave database. I am sorry
> that I cannot share the table definition and the sample data here, but
> attached is a sample table definition with the column types.
>
> It's quite complex.
>
> Thanks, Wenxing
>
> On Wed, Jan 4, 2017 at 4:24 PM, Szabolcs Vasas <va...@cloudera.com> wrote:
>
>> Hi Wenxing,
>>
>> I haven't tried this scenario yet but I would be happy to test it on my
>> side. Can you please send me the DDL statement for creating the MySQL
>> table
>> and some sample data?
>> Also, it would be very helpful if you could send the details of the job
>> you would like to run.
>>
>> Regards,
>> Szabolcs
>>
>> On Wed, Jan 4, 2017 at 2:54 AM, wenxing zheng <we...@gmail.com>
>> wrote:
>>
>> > Can anyone help to advise?
>> >
>> > I also met a problem when I set checkColumn to updated_time: currently
>> > all the updated_time values are NULL, and in this case Sqoop fails to
>> > start the job. I think we need to support this case.
>> >
>> > On Thu, Dec 29, 2016 at 9:18 AM, wenxing zheng <wenxing.zheng@gmail.com
>> >
>> > wrote:
>> >
>> > > Dear all,
>> > >
>> > > Has anyone already tried to import more than 10 million records from
>> > > MySQL to HDFS using Sqoop2?
>> > >
>> > > I have tried various throttling settings, but the job always fails at
>> > > the very beginning.
>> > >
>> > > Any advice would be appreciated.
>> > > Thanks, Wenxing
>> > >
>> >
>>
>>
>>
>> --
>> Szabolcs Vasas
>> Software Engineer
>> <http://www.cloudera.com>
>>
>
>


-- 
Szabolcs Vasas
Software Engineer
<http://www.cloudera.com>

Re: Import more than 10 million records from MySQL to HDFS

Posted by wenxing zheng <we...@gmail.com>.
Hi Szabolcs,

I am testing this scenario with our client's slave database. I am sorry that
I cannot share the table definition and the sample data here, but attached
is a sample table definition with the column types.

It's quite complex.

Thanks, Wenxing

On Wed, Jan 4, 2017 at 4:24 PM, Szabolcs Vasas <va...@cloudera.com> wrote:

> Hi Wenxing,
>
> I haven't tried this scenario yet but I would be happy to test it on my
> side. Can you please send me the DDL statement for creating the MySQL table
> and some sample data?
> Also, it would be very helpful if you could send the details of the job
> you would like to run.
>
> Regards,
> Szabolcs
>
> On Wed, Jan 4, 2017 at 2:54 AM, wenxing zheng <we...@gmail.com>
> wrote:
>
> > Can anyone help to advise?
> >
> > I also met a problem when I set checkColumn to updated_time: currently
> > all the updated_time values are NULL, and in this case Sqoop fails to
> > start the job. I think we need to support this case.
> >
> > On Thu, Dec 29, 2016 at 9:18 AM, wenxing zheng <we...@gmail.com>
> > wrote:
> >
> > > Dear all,
> > >
> > > Has anyone already tried to import more than 10 million records from
> > > MySQL to HDFS using Sqoop2?
> > >
> > > I have tried various throttling settings, but the job always fails at
> > > the very beginning.
> > >
> > > Any advice would be appreciated.
> > > Thanks, Wenxing
> > >
> >
>
>
>
> --
> Szabolcs Vasas
> Software Engineer
> <http://www.cloudera.com>
>

Re: Import more than 10 million records from MySQL to HDFS

Posted by Szabolcs Vasas <va...@cloudera.com>.
Hi Wenxing,

I haven't tried this scenario yet, but I would be happy to test it on my
side. Can you please send me the DDL statement for creating the MySQL table
and some sample data?
Also, it would be very helpful if you could send the details of the job you
would like to run.

Regards,
Szabolcs

On Wed, Jan 4, 2017 at 2:54 AM, wenxing zheng <we...@gmail.com>
wrote:

> Can anyone help to advise?
>
> I also met a problem when I set checkColumn to updated_time: currently all
> the updated_time values are NULL, and in this case Sqoop fails to start
> the job. I think we need to support this case.
>
> On Thu, Dec 29, 2016 at 9:18 AM, wenxing zheng <we...@gmail.com>
> wrote:
>
> > Dear all,
> >
> > Has anyone already tried to import more than 10 million records from
> > MySQL to HDFS using Sqoop2?
> >
> > I have tried various throttling settings, but the job always fails at
> > the very beginning.
> >
> > Any advice would be appreciated.
> > Thanks, Wenxing
> >
>



-- 
Szabolcs Vasas
Software Engineer
<http://www.cloudera.com>

Re: Import more than 10 million records from MySQL to HDFS

Posted by wenxing zheng <we...@gmail.com>.
Can anyone help to advise?

I also met a problem when I set checkColumn to updated_time: currently all
the updated_time values are NULL, and in this case Sqoop fails to start the
job. I think we need to support this case.
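
Until an all-NULL check column is handled, one possible workaround, sketched
here with hypothetical names (orders, created_time), is to backfill the NULLs
so the incremental job has a usable high-water mark:

    -- give every row a non-NULL check-column value before the first
    -- incremental run; created_time is an assumed fallback column
    UPDATE orders
       SET updated_time = created_time
     WHERE updated_time IS NULL;

Going forward, defining the column as TIMESTAMP NOT NULL DEFAULT
CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP keeps it populated
automatically.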

On Thu, Dec 29, 2016 at 9:18 AM, wenxing zheng <we...@gmail.com>
wrote:

> Dear all,
>
> Has anyone already tried to import more than 10 million records from
> MySQL to HDFS using Sqoop2?
>
> I have tried various throttling settings, but the job always fails at
> the very beginning.
>
> Any advice would be appreciated.
> Thanks, Wenxing
>