Posted to user@sqoop.apache.org by Michael Arena <ma...@paytronix.com> on 2015/05/08 22:21:54 UTC

Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Cloudera backported Parquet support to their version of Sqoop 1.4.5.

Originally, I was doing Sqoop incremental imports from SQL Server to text files (TSV).
This worked fine, but the size and query speed of the text files were a problem.

I then tried importing as Avro files, but Sqoop prohibits combining Avro with incremental mode.

I then tried importing as Parquet files.

The initial import worked fine and loaded 69,071 rows.
The next time Sqoop ran, it pulled in the 1 changed row, but the "merge" step then failed because it appears to treat the existing files as text rather than Parquet:

15/05/08 16:05:45 INFO mapreduce.Job: Job job_1431092783319_0252 completed successfully
15/05/08 16:05:45 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=144566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=34006
HDFS: Number of bytes written=11556
HDFS: Number of read operations=32
HDFS: Number of large read operations=0
HDFS: Number of write operations=7
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=16564
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8282
Total vcore-seconds taken by all map tasks=8282
Total megabyte-seconds taken by all map tasks=33923072
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=119
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=82
CPU time spent (ms)=4570
Physical memory (bytes) snapshot=591728640
Virtual memory (bytes) snapshot=3873914880
Total committed heap usage (bytes)=1853882368
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Transferred 11.2852 KB in 31.1579 seconds (370.8854 bytes/sec)
15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Retrieved 1 records.
15/05/08 16:05:45 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/05/08 16:05:46 INFO client.RMProxy: Connecting to ResourceManager at ue1b-labA02/10.74.50.172:8032
15/05/08 16:05:46 INFO input.FileInputFormat: Total input paths to process : 5
15/05/08 16:05:46 INFO mapreduce.JobSubmitter: number of splits:5
15/05/08 16:05:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1431092783319_0255
15/05/08 16:05:47 INFO impl.YarnClientImpl: Submitted application application_1431092783319_0255
15/05/08 16:05:47 INFO mapreduce.Job: The url to track the job: http://ue1b-labA02:8088/proxy/application_1431092783319_0255/
15/05/08 16:05:47 INFO mapreduce.Job: Running job: job_1431092783319_0255
15/05/08 16:06:01 INFO mapreduce.Job: Job job_1431092783319_0255 running in uber mode : false
15/05/08 16:06:01 INFO mapreduce.Job:  map 0% reduce 0%
15/05/08 16:06:07 INFO mapreduce.Job: Task Id : attempt_1431092783319_0255_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Can't parse input data: 'PAR1����'
at QueryResult.__loadFromFields(QueryResult.java:1413)
at QueryResult.parse(QueryResult.java:1221)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string: "PAR1����"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at QueryResult.__loadFromFields(QueryResult.java:1270)
... 11 more
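
For what it's worth, the 'PAR1' in the parse error is the 4-byte magic header that every Parquet file starts with, so the merge step's MergeTextMapper appears to be reading the existing Parquet part files as delimited text. A quick way to confirm that the files under the target directory really are Parquet (the paths below are placeholders standing in for my actual --target-dir and one of its part files):

   hadoop fs -ls /my/target-dir
   # a Parquet file begins with the ASCII magic "PAR1"
   hadoop fs -cat /my/target-dir/some-part-file.parquet | head -c 4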




Re: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Posted by Michael Arena <ma...@paytronix.com>.
Here is the (sanitized) command to create the saved Sqoop job in the Sqoop Metastore:

sqoop job \
   --create import__myscope__mydb__mytable \
   --meta-connect ... \
   -- import \
   --connect "jdbc:sqlserver://mydbserver:1433;databaseName=mydb;" \
   --username ... \
   --password-file ... \
   --num-mappers 4 \
   --target-dir ... \
   --as-parquetfile \
   --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
   --relaxed-isolation \
   --query "SELECT id, a, b, c, modified_datetime FROM mytable" \
   --split-by id \
   --merge-key id \
   --incremental lastmodified \
   --check-column modified_datetime \
   --last-value "1900-01-01 00:00:00.000"

Every night, Oozie would run a Sqoop action:
   sqoop job --exec import__myscope__mydb__mytable
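
For reference, the saved job's stored parameters, including the last-value that Sqoop advances after each incremental run, can be inspected with the job tool's --show option (using the same --meta-connect as in the create command):

sqoop job \
   --meta-connect ... \
   --show import__myscope__mydb__mytable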


From: "Xu, Qian A"
Reply-To: "user@sqoop.apache.org<ma...@sqoop.apache.org>"
Date: Wednesday, May 13, 2015 at 10:40 AM
To: "user@sqoop.apache.org<ma...@sqoop.apache.org>"
Subject: RE: Scoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Could you please share the command you are using for incremental import (as Parquet)?

From: Michael Arena [mailto:marena@paytronix.com]
Sent: Tuesday, May 12, 2015 6:56 AM
To: user@sqoop.apache.org<ma...@sqoop.apache.org>
Subject: Re: Scoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Yes, incremental lastmodified.

From: Abraham Elmahrek
Reply-To: "user@sqoop.apache.org<ma...@sqoop.apache.org>"
Date: Monday, May 11, 2015 at 4:47 PM
To: "user@sqoop.apache.org<ma...@sqoop.apache.org>"
Subject: Re: Scoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Hey man,

Is this Incremental lastmodified mode?

-Abe

On Fri, May 8, 2015 at 1:21 PM, Michael Arena <ma...@paytronix.com>> wrote:
Cloudera backported Parquet support to their version of Sqoop 1.4.5.

Originally, I was doing Sqoop incremental imports from SQL Server to text files (TSV).
This worked fine but the size and query speed of the textfiles are a problem.

I then tried importing as Avro files but Sqoop prohibits Avro and incremental mode.

I then tried importing as Parquet files.

The initial import worked fine and loaded 69,071 rows.
Then next time Sqoop ran, it pulled in the 1 changed row but then the "merge" step failed since it appears to think the files are text (not Parquet):

15/05/08 16:05:45 INFO mapreduce.Job: Job job_1431092783319_0252 completed successfully
15/05/08 16:05:45 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=144566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=34006
HDFS: Number of bytes written=11556
HDFS: Number of read operations=32
HDFS: Number of large read operations=0
HDFS: Number of write operations=7
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=16564
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8282
Total vcore-seconds taken by all map tasks=8282
Total megabyte-seconds taken by all map tasks=33923072
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=119
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=82
CPU time spent (ms)=4570
Physical memory (bytes) snapshot=591728640
Virtual memory (bytes) snapshot=3873914880
Total committed heap usage (bytes)=1853882368
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Transferred 11.2852 KB in 31.1579 seconds (370.8854 bytes/sec)
15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Retrieved 1 records.
15/05/08 16:05:45 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/05/08 16:05:46 INFO client.RMProxy: Connecting to ResourceManager at ue1b-labA02/10.74.50.172:8032<http://10.74.50.172:8032>
15/05/08 16:05:46 INFO input.FileInputFormat: Total input paths to process : 5
15/05/08 16:05:46 INFO mapreduce.JobSubmitter: number of splits:5
15/05/08 16:05:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1431092783319_0255
15/05/08 16:05:47 INFO impl.YarnClientImpl: Submitted application application_1431092783319_0255
15/05/08 16:05:47 INFO mapreduce.Job: The url to track the job: http://ue1b-labA02:8088/proxy/application_1431092783319_0255/
15/05/08 16:05:47 INFO mapreduce.Job: Running job: job_1431092783319_0255
15/05/08 16:06:01 INFO mapreduce.Job: Job job_1431092783319_0255 running in uber mode : false
15/05/08 16:06:01 INFO mapreduce.Job:  map 0% reduce 0%
15/05/08 16:06:07 INFO mapreduce.Job: Task Id : attempt_1431092783319_0255_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Can't parse input data: 'PAR1��
                                                                  ��
                                                                    '
at QueryResult.__loadFromFields(QueryResult.java:1413)
at QueryResult.parse(QueryResult.java:1221)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string: "PAR1��
                                                                     ��
                                                                       "
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at QueryResult.__loadFromFields(QueryResult.java:1270)
... 11 more



How are you engaging with millennials at your organization? Earn “Lifetime Loyalty with Effective Millennial Engagement” by signing up for our next webinar. Join us Tuesday, May 12 at 1:00 EDT to obtain the tools you need to earn brand loyalty from this important demographic. Click here <http://content.paytronix.com/Lifetime-Loyalty_0515_sig.html> to register!



How are you engaging with millennials at your organization? Earn “Lifetime Loyalty with Effective Millennial Engagement” by signing up for our next webinar. Join us Tuesday, May 12 at 1:00 EDT to obtain the tools you need to earn brand loyalty from this important demographic. Click here <http://content.paytronix.com/Lifetime-Loyalty_0515_sig.html> to register!

RE: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Posted by "Xu, Qian A" <qi...@intel.com>.
Could you please share the command you are using for incremental import (as Parquet)?


Re: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Posted by Michael Arena <ma...@paytronix.com>.
Yes, incremental lastmodified.


Re: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey man,

Is this Incremental lastmodified mode?

-Abe


Re: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Posted by asser dennis <de...@gmail.com>.
Please unsubscribe me
Thanks