Posted to user@sqoop.apache.org by Greg Lindholm <gr...@gmail.com> on 2018/02/14 21:49:43 UTC

Sqoop import with HCatalog on AWS EMR

Hi Sqoop Users,

I was attempting a Sqoop import with HCatalog on an AWS EMR cluster,
importing from a MySQL database and writing to an S3 location.

sudo sqoop import \
  --connect jdbc:mysql://xxx.us-east-2.compute.amazonaws.com:3306/test1 \
  --username xxx -P \
  --table sampledata1 \
  --hcatalog-database greg3 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'

The database (greg3) was created in Hive with its location set to an S3 bucket.
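For context, the database would have been created along these lines (the
bucket path below is a hypothetical placeholder, not from the original
setup):

```shell
# Create a Hive database whose default location is an S3 bucket, so that
# tables created under it (including by Sqoop via HCatalog) are stored
# on S3. The s3:// path is a placeholder; substitute your own bucket.
hive -e "CREATE DATABASE greg3 LOCATION 's3://my-bucket/greg3.db/';"
```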

The Sqoop job would run and report success, but no data files were ever
written. The table was created correctly in the Hive HCatalog and the
table folders appeared on S3, but they remained empty.

I found the solution buried in a page on using HCatalog with EMR.

You have to set these mapred config values to "Disable Direct Write When
Using HCatalog HCatStorer":

  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
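
Note that -D options are Hadoop generic arguments, so they must appear
immediately after the "import" subcommand, before any Sqoop-specific
arguments. Splicing them into the command from my original post (same
connection details; adjust for your environment) gives:

```shell
# Sqoop import to an S3-backed Hive database on EMR, with the direct
# S3 write disabled so HCatalog actually writes the data files.
# The -D generic options must come before the Sqoop-specific arguments.
sudo sqoop import \
  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
  --connect jdbc:mysql://xxx.us-east-2.compute.amazonaws.com:3306/test1 \
  --username xxx -P \
  --table sampledata1 \
  --hcatalog-database greg3 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'
```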

Here is the link:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hcatalog-using.html

Hopefully this will save someone else a lot of trouble.

/Greg