Posted to user@sqoop.apache.org by Christian Verkerk <ch...@tubularlabs.com> on 2014/09/15 16:14:43 UTC

mysqlimport terminated with error code 1

Hi,

I'm trying to run a sqoop export for a large dataset (at least 1B
rows) with the following sqoop export call:

sqoop export --direct \
--connect <host> \
--table <table> \
--export-dir /user/hive/warehouse/<table> \
--num-mappers 8 \
--username <username> \
--password <password> \
--input-fields-terminated-by ',' \
--verbose

Behind the scenes, I've found that sqoop export does what you'd expect
it to: it farms the work out to (num-mappers) nodes with a NodeManager
role, pulls about 32MB worth of HDFS data into a temp file on each of
those nodes, and hands it to mysqlimport, which issues a LOAD DATA
LOCAL INFILE for the temp file into the MySQL table.
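
As far as I can tell, each mapper's call is roughly equivalent to the
following (a sketch only: the temp file path and <database> are
placeholders, and the exact arguments are whatever MySQLExportMapper
builds):

# hypothetical reconstruction of one mapper's work on its ~32MB temp file
mysqlimport --local \
  --host=<host> --user=<username> --password=<password> \
  --fields-terminated-by=',' \
  <database> /tmp/sqoop-export-XXXX/<table>.txt

# which mysqlimport turns into, approximately:
#   LOAD DATA LOCAL INFILE '/tmp/sqoop-export-XXXX/<table>.txt'
#   INTO TABLE <table> FIELDS TERMINATED BY ',';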

The following error occurs depending on the level of parallelism used
(via num-mappers), that is, 2 mappers doesn't trigger it but 10
definitely will:

14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
Error: java.io.IOException: mysqlimport terminated with error code 1
    at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
    at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

I understand there is some limit to the level of parallelism that can
be achieved in the job -- mysqld can get tied up processing too many
things at once, etc. -- but I'd like to know how to turn on debug
logging for the org.apache.sqoop.mapreduce.MySQLExportMapper class so
that I can actually see the mysqlimport error.

Reading through the following code[0] (not sure if this is the
relevant version BTW), I see that a logger is set up that should be
giving a lot of information[1] about the mysqlimport calls, but I
don't seem to be getting any of this fun in my logs.

[0] https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
[1] `LOG.debug("Starting mysqlimport with arguments:");`
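
One thing I plan to try (untested sketch; I'm assuming the MRv2
mapreduce.map.log.level property is honoured here and reaches the
task-side log4j root) is forcing the map tasks to DEBUG straight from
the sqoop invocation:

# generic -D options have to come before the sqoop-specific arguments
sqoop export -D mapreduce.map.log.level=DEBUG --direct \
--connect <host> \
--table <table> \
--export-dir /user/hive/warehouse/<table> \
--num-mappers 8 \
--username <username> \
--password <password> \
--input-fields-terminated-by ',' \
--verbose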


Additional info:

I have log4j.properties setup in the following basic way:

log4j.rootLogger=${root.logger}
root.logger=INFO,console

log4j.logger.org.apache.hadoop.mapred=TRACE
log4j.logger.org.apache.sqoop.mapreduce=TRACE
log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

What I have found is that the `max_allowed_packet` setting in MySQL
seems to affect this behaviour somewhat but I'd rather get more
information about the actual error than attempt to tweak a setting
"blind".

Relevant versioning:

Cloudera Hadoop Distribution (5.1.2)
mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
sqoop version: 1.4.4

Kind regards,

Christian Verkerk

-- 
Christian Verkerk
Software Engineer, Tubular Labs
christian@tubularlabs.com

Re: mysqlimport terminated with error code 1

Posted by Christian Verkerk <ch...@tubularlabs.com>.
You were right, though the logs link in the ResourceManager would
disappear and I couldn't get to the logs once the container died.
Going through a NodeManager that was executing the actual task allowed
me to look at the logs, and it turns out the mysqlimport logs are
printed there:

2014-09-15 17:56:34,025 INFO [main]
org.apache.sqoop.mapreduce.MySQLExportMapper: Checkpointing current
export.
2014-09-15 17:56:34,025 INFO [main]
org.apache.sqoop.mapreduce.MySQLExportMapper: Waiting for mysqlimport
to complete
2014-09-15 17:56:34,032 INFO [Thread-34]
org.apache.sqoop.mapreduce.MySQLExportMapper: mysqlimport: Error:
1205, Lock wait timeout exceeded; try restarting transaction, when
using table: <table>
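
So this is MySQL-side lock contention rather than a malformed file. The
next things I'll look at (just a sketch of the obvious knobs) are
lowering --num-mappers and checking how long InnoDB is willing to wait
on row locks:

mysql -h <host> -u <username> -p -e "SHOW VARIABLES LIKE 'innodb_lock_wait_timeout'"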

I set the log4j config via /etc/hadoop/conf/log4j.properties.

On Mon, Sep 15, 2014 at 11:32 AM, pratik khadloya <ti...@gmail.com> wrote:
> Btw, Christian, can you tell me how you fed the custom log4j config to
> sqoop?
> I am interested in debugging the MySQLManager class.
>
> Thanks,
> ~Pratik

Re: mysqlimport terminated with error code 1

Posted by pratik khadloya <ti...@gmail.com>.
Btw, Christian, can you tell me how you fed the custom log4j config to
sqoop?
I am interested in debugging the MySQLManager class.

Thanks,
~Pratik


Re: mysqlimport terminated with error code 1

Posted by pratik khadloya <ti...@gmail.com>.
You're right, they are import-only arguments; I misread your original
question.
I'm surprised that there are no logs in the JT. You should be able to
see the logs for attempt attempt_1410271365435_0034_m_000000_0 and also
see which machine ran that map task. You can click on the machine name,
then on the bottom left there is a "Local logs" link which you can
click to finally reach the local mapper logs for that task tracker.

The general url to directly get to those logs is:
http://<task-tracker-machine-name>:50060/tasktracker.jsp
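
If the cluster is on YARN (CDH5 is), something like the following should
also pull the task logs from the command line once the job finishes,
assuming log aggregation is enabled (the application id here is just the
attempt id without the task/attempt suffix):

yarn logs -applicationId application_1410271365435_0034 | less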

I suspect the loading command may be failing due to a column mismatch
or a delimiter problem.

~Pratik


Re: mysqlimport terminated with error code 1

Posted by Christian Verkerk <ch...@tubularlabs.com>.
Hi,

The jobtracker logs are all empty. The --split-by and --boundary-query
options are sqoop import-only arguments AFAICT. The split size, as in
the size of the file that is loaded into MySQL, is about 32MB.

The sqoop export job I posted _does_ get data into MySQL; it just
stops after a while (due to load, presumably), so running a single
query against MySQL works just fine and does not reproduce the error.

The key is that I need some way to get more information on the exact
error mysqlimport hits.

Kind regards,

Christian


Re: mysqlimport terminated with error code 1

Posted by pratik khadloya <ti...@gmail.com>.
Is there any reason given for the termination in the jobtracker logs?
Also, I see that you have not specified any --split-by and/or
--boundary-query option. If sqoop takes time to determine the splits,
specifying these settings might help.

Also, check what the split sizes are; you may be running into data skew
depending on the splitting column used (generally the primary key of
the table).
The query is printed in the sqoop logs; try running the same query
directly against mysql and see how mysql responds.
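
For example (a sketch only: the file path is made up, and the client may
need local-infile enabled), replaying one mapper's load by hand would
look roughly like:

mysql --local-infile=1 -h <host> -u <username> -p <database> \
  -e "LOAD DATA LOCAL INFILE '/tmp/one_split.txt' INTO TABLE <table> FIELDS TERMINATED BY ','"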

~Pratik
