Posted to user@sqoop.apache.org by Douglas Spadotto <do...@gmail.com> on 2016/10/25 17:21:28 UTC

Sqoop export from Hive table stored as Parquet

Hello everyone,

Over the past few months I have seen quite a few messages about Parquet
support in Sqoop, all of them about importing, and some of those approaches
worked well.

But for exporting, I'm receiving this error when trying to export from a
Hive table stored as Parquet to PostgreSQL:

[cloudera@quickstart ~]$ sqoop export --connect
jdbc:postgresql://localhost/postgres --table test1  --export-dir
/user/hive/warehouse/teste1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will
fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM "test1" AS t LIMIT 1
16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/usr/lib/hadoop-mapreduce
Note:
/tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts
is deprecated. Instead, use mapreduce.map.maxattempts
16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM "test1" AS t LIMIT 1
16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
at
org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
at
org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
at
org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
at
org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at
org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at
org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at
org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
at
org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
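
My guess is that the export dir is just a plain Hive Parquet layout without
the Kite .metadata descriptor the exception is looking for. A quick way to
check (same path as in the command above):

[cloudera@quickstart ~]$ hdfs dfs -ls /user/hive/warehouse/teste1
# a Kite-managed dataset would show a .metadata/ subdirectory here
# (descriptor.properties, schema.avsc); a table created by Hive itself
# normally has only the Parquet data files, which would explain the
# DatasetNotFoundException above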

I saw a recent JIRA opened about this,
https://issues.apache.org/jira/browse/SQOOP-2907. Is there any workaround
for it?

Thanks in advance,

Douglas

-- 
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to
decide. All we have to decide is what to do with the time that is given to
us."
-- Lord of the Rings: The Fellowship of the Ring (2001)

RE: Sqoop export from Hive table stored as Parquet

Posted by "Ravi, Chandramouli" <Ch...@vantiv.com>.
Hi Markus,

This solution looks very exciting.
Thanks for your help. We are trying it at my company too.
Hopefully it will work on our CDH cluster.

Thanks,
Chandra

From: Markus Kemper [mailto:markus@cloudera.com]
Sent: Tuesday, October 25, 2016 2:00 PM
To: user@sqoop.apache.org
Subject: Re: Sqoop export from Hive table stored as Parquet

Glad to hear it.  Let us know how it goes


Markus Kemper
Customer Operations Engineer
[www.cloudera.com]<http://www.cloudera.com/>


On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <do...@gmail.com>> wrote:
Hi Markus,

Thank you!

I tried this (the hcatalog options) myself a few minutes after I hit "send" on the e-mail. It worked fine; Sqoop was able to read the Parquet structure. Only my MR job crashed, and that was due to my unstable environment.

It looks like I'm on the way to the solution.

Cheers,

Douglas

On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <ma...@cloudera.com>> wrote:
Hello Douglas,

The only workaround that I am aware of is to use the Sqoop --hcatalog options, for example:

sqoop export --connect <jdbc_connection_string> --table <rdbms_table> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
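
For the tables in this thread that would look roughly like this (assuming
teste1 lives in the default Hive database):

sqoop export --connect jdbc:postgresql://localhost/postgres --table test1 --hcatalog-database default --hcatalog-table teste1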



Markus Kemper
Customer Operations Engineer
[www.cloudera.com]<http://www.cloudera.com/>


On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <do...@gmail.com>> wrote:
Hello everyone,

I saw in the past few months quite a few messages about Parquet support on Sqoop, all about importing. Some of them worked well.

But for exporting I'm receiving this error when trying to export from a Hive table stored as Parquet to Postgresql:

[cloudera@quickstart ~]$ sqoop export --connect jdbc:postgresql://localhost/postgres --table test1  --export-dir /user/hive/warehouse/teste1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

I saw a recent JIRA opened about this, https://issues.apache.org/jira/browse/SQOOP-2907, and am wondering if there is any workaround for this?

Thanks in advance,

Douglas

--
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us."
-- Lord of the Rings: The Fellowship of the Ring (2001)




--
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us."
-- Lord of the Rings: The Fellowship of the Ring (2001)



Re: Sqoop export from Hive table stored as Parquet

Posted by Markus Kemper <ma...@cloudera.com>.
Unfortunately Venkat is correct about (import + --hcatalog with Parquet);
however, just in case others following this thread are not aware, it is
possible to import Parquet data into Hive, for example:

$ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD
--table t1 --target-dir /user/<hdfs_user>/t1 --delete-target-dir
--hive-import --hive-database default --hive-table t1_parquet --num-mappers
1 --as-parquetfile

$ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD
--table t1 --target-dir /user/hive/warehouse/t1_parquet --num-mappers 1
--as-parquetfile --append

$ impala-shell -i "<impala_daemon_host>:21000" -q "use default; select
c_int, from_unixtime(cast(c_date/1000 as bigint), 'yyyy-MM-dd') as c_date,
from_unixtime(cast(c_timestamp/1000 as bigint), 'yyyy-MM-dd HH:mm:ss.S') as
c_timestamp from t1_parquet"

+-------+------------+-----------------------+
| c_int | c_date     | c_timestamp           |
+-------+------------+-----------------------+
| 1     | 2016-10-26 | 2016-10-26 21:30:33.0 |
| 1     | 2016-10-26 | 2016-10-26 21:30:33.0 |
+-------+------------+-----------------------+
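
The from_unixtime(cast(... / 1000 ...)) wrapping is there because, as far as
I know, Sqoop's Parquet (Avro-backed) import stores DATE and TIMESTAMP values
as epoch milliseconds, so without the conversion the raw columns come back as
bigint:

$ impala-shell -i "<impala_daemon_host>:21000" -q "use default; select
c_date, c_timestamp from t1_parquet limit 1"
# without from_unixtime() this prints the raw epoch-millisecond bigint values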



Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Tue, Oct 25, 2016 at 4:35 PM, Venkat Ranganathan <
vranganathan@hortonworks.com> wrote:

> Thanks for testing and answering that the export works.   One thing to
> note is that imports are not allowed into parquet tables with HCatalog (See
> HIVE-7502).   It has been a long standing issue since we created the Sqoop
> HCatalog integration a few years ago ☹
>
>
>
> Thanks
>
>
> Venkat
>
>
>
> *From: *Markus Kemper <ma...@cloudera.com>
> *Reply-To: *"user@sqoop.apache.org" <us...@sqoop.apache.org>
> *Date: *Tuesday, October 25, 2016 at 12:01 PM
> *To: *"user@sqoop.apache.org" <us...@sqoop.apache.org>
> *Subject: *Re: Sqoop export from Hive table stored as Parquet
>
>
>
> Awesomeness and thank you for helping with the other forums.
>
>
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
>
>
>
> On Tue, Oct 25, 2016 at 2:56 PM, Douglas Spadotto <do...@gmail.com>
> wrote:
>
> Hi Markus,
>
>
>
> It worked fine end to end. Here is the edited output:
>
>
>
> *SOURCE:*
>
>
>
> hive> describe extended teste1;
>
> OK
>
> id                   int
>
> nome                 string
>
>
>
> Detailed Table Information Table(tableName:teste1, dbName:default,
> owner:cloudera, createTime:1477408342, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null),
> FieldSchema(name:nome, type:string, comment:null)],
> location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1, *inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,*
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
> skewedColValues:[], skewedColValueLocationMaps:{}),
> storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=2,
> transient_lastDdlTime=1477408508, COLUMN_STATS_ACCURATE=true,
> totalSize=645, numRows=3, rawDataSize=6}, viewOriginalText:null,
> viewExpandedText:null, tableType:MANAGED_TABLE)
>
> Time taken: 0.08 seconds, Fetched: 4 row(s)
>
>
>
> hive> select * from teste1;
>
> OK
>
> ...
>
> 1 Douglas
>
> 2 Spadotto
>
> 3 Doug
>
> Time taken: 0.323 seconds, Fetched: 3 row(s)
>
>
>
> *SQOOP:*
>
>
>
> cloudera@quickstart ~]$ sqoop export --connect
> jdbc:postgresql://localhost/postgres --table test1 --hcatalog-table
> teste1
>
> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will
> fail.
>
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>
> 16/10/25 11:42:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
>
> 16/10/25 11:42:19 INFO manager.SqlManager: Using default fetchSize of 1000
>
> 16/10/25 11:42:19 INFO tool.CodeGenTool: Beginning code generation
>
> 16/10/25 11:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT
> t.* FROM "test1" AS t LIMIT 1
>
> 16/10/25 11:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
> /usr/lib/hadoop-mapreduce
>
> Note: /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.java
> uses or overrides a deprecated API.
>
> Note: Recompile with -Xlint:deprecation for details.
>
> 16/10/25 11:42:20 INFO orm.CompilationManager: Writing jar file:
> /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.jar
>
> 16/10/25 11:42:20 INFO mapreduce.ExportJobBase: Beginning export of test1
>
> 16/10/25 11:42:21 INFO Configuration.deprecation: mapred.jar is
> deprecated. Instead, use mapreduce.job.jar
>
> 16/10/25 11:42:21 INFO Configuration.deprecation: mapred.map.max.attempts
> is deprecated. Instead, use mapreduce.map.maxattempts
>
> 16/10/25 11:42:21 INFO mapreduce.ExportJobBase: Configuring HCatalog for
> export job
>
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Configuring HCatalog
> specific details for job
>
> 16/10/25 11:42:21 INFO manager.SqlManager: Executing SQL statement: SELECT
> t.* FROM "test1" AS t LIMIT 1
>
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column names
> projected : [id, nome]
>
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column name -
> info map :
>
> id : [Type : 4,Precision : 10,Scale : 0]
>
> nome : [Type : 12,Precision : 50,Scale : 0]  *<-- Got the structure here!*
>
>
>
> 16/10/25 11:42:21 INFO hive.metastore: Trying to connect to metastore with
> URI thrift://quickstart.cloudera:9083
>
> 16/10/25 11:42:21 INFO hive.metastore: Opened a connection to metastore,
> current connections: 1
>
> 16/10/25 11:42:21 INFO hive.metastore: Connected to metastore.
>
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: HCatalog full table schema
> fields = [id, nome]
>
> ...
>
> 16/10/25 11:42:22 WARN hcat.SqoopHCatUtilities: No files under
> /usr/lib/hive-hcatalog/share/hcatalog/storage-handlers to add to
> distributed cache for hcatalog job
>
> 16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for
> export job
>
> 16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Ignoring configuration
> request for HCatalog info
>
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution
> is deprecated. Instead, use mapreduce.reduce.speculative
>
> 16/10/25 11:42:22 INFO Configuration.deprecation:
> mapred.map.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.map.speculative
>
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks is
> deprecated. Instead, use mapreduce.job.maps
>
> 16/10/25 11:42:22 INFO client.RMProxy: Connecting to ResourceManager at
> quickstart.cloudera/192.168.26.129:8032
>
> 16/10/25 11:42:49 INFO Configuration.deprecation: mapred.input.dir is
> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>
> 16/10/25 11:42:49 INFO mapred.FileInputFormat: Total input paths to
> process : 2
>
> 16/10/25 11:42:50 INFO mapreduce.JobSubmitter: number of splits:2
>
> ...
>
> 16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Transferred 15.9258 KB in
> 45.1568 seconds (361.1415 bytes/sec)
>
> 16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Exported 3 records.
>
> 16/10/25 11:43:07 INFO hive.metastore: Closed a connection to metastore,
> current connections: 0
>
>
>
> *DESTINATION:*
>
>
>
> [cloudera@quickstart ~]$ psql -d postgres -c "select * from test1"
>
>  id |   nome
>
> ----+----------
>
>   1 | Douglas
>
>   2 | Spadotto
>
>   3 | Doug
>
> (3 rows)
>
>
>
> Now I'll go off into the Internet answer to everyone that posted this same
> question on other forums. :)
>
>
>
> Regards,
>
>
>
> Douglas
>
>
>
>
>
> On Tue, Oct 25, 2016 at 4:00 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
> Glad to hear it.  Let us know how it goes
>
>
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
>
>
>
> On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <do...@gmail.com>
> wrote:
>
> Hi Markus,
>
>
>
> Thank you!
>
>
>
> I tried this (hcatalog options) myself a few minutes after I hit "send" on
> the e-mail. It worked fine, Sqoop was able to read the Parquet structure.
> Just my MR crashed but it was due to my unstable environment.
>
>
>
> It looks like I'm on the way to the solution.
>
>
>
> Cheers,
>
>
>
> Douglas
>
>
>
> On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
> Hello Douglas,
>
>
>
> The only workaround that I am aware of is to use the Sqoop --hcatalog
> options, for example:
>
> sqoop export --connect <jdbc_connection_string> --table <rdbms_table>
> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
>
>
>
>
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
>
>
>
> On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <do...@gmail.com>
> wrote:
>
> Hello everyone,
>
>
>
> I saw in the past few months quite a few messages about Parquet support on
> Sqoop, all about importing. Some of them worked well.
>
>
>
> But for exporting I'm receiving this error when trying to export from a
> Hive table stored as Parquet to Postgresql:
>
>
>
> [cloudera@quickstart ~]$ sqoop export --connect
> jdbc:postgresql://localhost/postgres --table test1  --export-dir
> /user/hive/warehouse/teste1
>
> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will
> fail.
>
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>
> 16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
>
> 16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
>
> 16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
>
> 16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement: SELECT
> t.* FROM "test1" AS t LIMIT 1
>
> 16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
> /usr/lib/hadoop-mapreduce
>
> Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java
> uses or overrides a deprecated API.
>
> Note: Recompile with -Xlint:deprecation for details.
>
> 16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file:
> /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
>
> 16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
>
> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is
> deprecated. Instead, use mapreduce.job.jar
>
> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts
> is deprecated. Instead, use mapreduce.map.maxattempts
>
> 16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement: SELECT
> t.* FROM "test1" AS t LIMIT 1
>
> 16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/
> teste1/.metadata
>
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/
> teste1/.metadata
>
> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(
> FileSystemMetadataProvider.java:562)
>
> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(
> FileSystemMetadataProvider.java:605)
>
> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(
> FileSystemMetadataProvider.java:114)
>
> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(
> FileSystemDatasetRepository.java:197)
>
> at org.kitesdk.data.Datasets.load(Datasets.java:108)
>
> at org.kitesdk.data.Datasets.load(Datasets.java:140)
>
> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$
> ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
>
> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$
> ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
>
> at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(
> JdbcExportJob.java:84)
>
> at org.apache.sqoop.mapreduce.ExportJobBase.runExport(
> ExportJobBase.java:432)
>
> at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
>
> at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>
> at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>
> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>
> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>
>
>
> I saw a recent JIRA opened about this, https://issues.apache.
> org/jira/browse/SQOOP-2907, and am wondering if there is any workaround
> for this?
>
>
>
> Thanks in advance,
>
>
>
> Douglas
>
>
>
> --
>
> Visite: http://canseidesercowboy.wordpress.com/
>
> Siga: @dougspadotto ou @excowboys
>
> -----
>
> Frodo: "I wish none of this had happened."
>
> Gandalf: "So do all who live to see such times, but that is not for them
> to decide. All we have to decide is what to do with the time that is given
> to us."
>
> -- Lord of the Rings: The Fellowship of the Ring (2001)
>
>
>
>
>
>
>
> --
>
> Visite: http://canseidesercowboy.wordpress.com/
>
> Siga: @dougspadotto ou @excowboys
>
> -----
>
> Frodo: "I wish none of this had happened."
>
> Gandalf: "So do all who live to see such times, but that is not for them
> to decide. All we have to decide is what to do with the time that is given
> to us."
>
> -- Lord of the Rings: The Fellowship of the Ring (2001)
>
>
>
>
>
>
>
> --
>
> Visite: http://canseidesercowboy.wordpress.com/
>
> Siga: @dougspadotto ou @excowboys
>
> -----
>
> Frodo: "I wish none of this had happened."
>
> Gandalf: "So do all who live to see such times, but that is not for them
> to decide. All we have to decide is what to do with the time that is given
> to us."
>
> -- Lord of the Rings: The Fellowship of the Ring (2001)
>
>
>

Re: Sqoop export from Hive table stored as Parquet

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
Thanks for testing and confirming that the export works.   One thing to note is that imports are not allowed into Parquet tables with HCatalog (see HIVE-7502).   It has been a long-standing issue since we created the Sqoop HCatalog integration a few years ago ☹

Thanks

Venkat

From: Markus Kemper <ma...@cloudera.com>
Reply-To: "user@sqoop.apache.org" <us...@sqoop.apache.org>
Date: Tuesday, October 25, 2016 at 12:01 PM
To: "user@sqoop.apache.org" <us...@sqoop.apache.org>
Subject: Re: Sqoop export from Hive table stored as Parquet

Awesomeness and thank you for helping with the other forums.


Markus Kemper
Customer Operations Engineer
[www.cloudera.com]<http://www.cloudera.com>


On Tue, Oct 25, 2016 at 2:56 PM, Douglas Spadotto <do...@gmail.com>> wrote:
Hi Markus,

It worked fine end to end. Here is the edited output:

SOURCE:

hive> describe extended teste1;
OK
id                   int
nome                 string

Detailed Table Information Table(tableName:teste1, dbName:default, owner:cloudera, createTime:1477408342, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:nome, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1, inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=2, transient_lastDdlTime=1477408508, COLUMN_STATS_ACCURATE=true, totalSize=645, numRows=3, rawDataSize=6}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.08 seconds, Fetched: 4 row(s)

hive> select * from teste1;
OK
...
1 Douglas
2 Spadotto
3 Doug
Time taken: 0.323 seconds, Fetched: 3 row(s)

SQOOP:

cloudera@quickstart ~]$ sqoop export --connect jdbc:postgresql://localhost/postgres --table test1 --hcatalog-table teste1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/25 11:42:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
16/10/25 11:42:19 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/25 11:42:19 INFO tool.CodeGenTool: Beginning code generation
16/10/25 11:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
16/10/25 11:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/25 11:42:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.jar
16/10/25 11:42:20 INFO mapreduce.ExportJobBase: Beginning export of test1
16/10/25 11:42:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/25 11:42:21 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
16/10/25 11:42:21 INFO mapreduce.ExportJobBase: Configuring HCatalog for export job
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
16/10/25 11:42:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, nome]
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column name - info map :
id : [Type : 4,Precision : 10,Scale : 0]
nome : [Type : 12,Precision : 50,Scale : 0]  <-- Got the structure here!

16/10/25 11:42:21 INFO hive.metastore: Trying to connect to metastore with URI thrift://quickstart.cloudera:9083
16/10/25 11:42:21 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/10/25 11:42:21 INFO hive.metastore: Connected to metastore.
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [id, nome]
...
16/10/25 11:42:22 WARN hcat.SqoopHCatUtilities: No files under /usr/lib/hive-hcatalog/share/hcatalog/storage-handlers to add to distributed cache for hcatalog job
16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for export job
16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Ignoring configuration request for HCatalog info
16/10/25 11:42:22 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/25 11:42:22 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/192.168.26.129:8032<http://192.168.26.129:8032>
16/10/25 11:42:49 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/10/25 11:42:49 INFO mapred.FileInputFormat: Total input paths to process : 2
16/10/25 11:42:50 INFO mapreduce.JobSubmitter: number of splits:2
...
16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Transferred 15.9258 KB in 45.1568 seconds (361.1415 bytes/sec)
16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Exported 3 records.
16/10/25 11:43:07 INFO hive.metastore: Closed a connection to metastore, current connections: 0

DESTINATION:

[cloudera@quickstart ~]$ psql -d postgres -c "select * from test1"
 id |   nome
----+----------
  1 | Douglas
  2 | Spadotto
  3 | Doug
(3 rows)

Now I'll go off onto the Internet and answer everyone who posted this same question on other forums. :)

Regards,

Douglas


On Tue, Oct 25, 2016 at 4:00 PM, Markus Kemper <ma...@cloudera.com>> wrote:
Glad to hear it.  Let us know how it goes


Markus Kemper
Customer Operations Engineer
[www.cloudera.com]<http://www.cloudera.com>


On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <do...@gmail.com>> wrote:
Hi Markus,

Thank you!

I tried this (the hcatalog options) myself a few minutes after I hit "send" on the e-mail. It worked fine; Sqoop was able to read the Parquet structure. Only my MR job crashed, and that was due to my unstable environment.

It looks like I'm on the way to the solution.

Cheers,

Douglas

On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <ma...@cloudera.com>> wrote:
Hello Douglas,

The only workaround that I am aware of is to use the Sqoop --hcatalog options, for example:

sqoop export --connect <jdbc_connection_string> --table <rdbms_table> --hcatalog-database <hive_database> --hcatalog-table <hive_table>



Markus Kemper
Customer Operations Engineer
[www.cloudera.com]<http://www.cloudera.com>


On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <do...@gmail.com>> wrote:
Hello everyone,

I saw in the past few months quite a few messages about Parquet support on Sqoop, all about importing. Some of them worked well.

But for exporting I'm receiving this error when trying to export from a Hive table stored as Parquet to Postgresql:

[cloudera@quickstart ~]$ sqoop export --connect jdbc:postgresql://localhost/postgres --table test1  --export-dir /user/hive/warehouse/teste1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

I saw a recent JIRA opened about this, https://issues.apache.org/jira/browse/SQOOP-2907, and am wondering if there is any workaround for this?

Thanks in advance,

Douglas

--
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us."
-- Lord of the Rings: The Fellowship of the Ring (2001)




--
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us."
-- Lord of the Rings: The Fellowship of the Ring (2001)




--
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us."
-- Lord of the Rings: The Fellowship of the Ring (2001)


Re: Sqoop export from Hive table stored as Parquet

Posted by Markus Kemper <ma...@cloudera.com>.
Awesomeness and thank you for helping with the other forums.


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Tue, Oct 25, 2016 at 2:56 PM, Douglas Spadotto <do...@gmail.com>
wrote:

> Hi Markus,
>
> It worked fine end to end. Here is the edited output:
>
> *SOURCE:*
>
> hive> describe extended teste1;
> OK
> id                   int
> nome                 string
>
> Detailed Table Information Table(tableName:teste1, dbName:default,
> owner:cloudera, createTime:1477408342, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null),
> FieldSchema(name:nome, type:string, comment:null)],
> location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1, *inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,*
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
> skewedColValues:[], skewedColValueLocationMaps:{}),
> storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=2,
> transient_lastDdlTime=1477408508, COLUMN_STATS_ACCURATE=true,
> totalSize=645, numRows=3, rawDataSize=6}, viewOriginalText:null,
> viewExpandedText:null, tableType:MANAGED_TABLE)
> Time taken: 0.08 seconds, Fetched: 4 row(s)
>
> hive> select * from teste1;
> OK
> ...
> 1 Douglas
> 2 Spadotto
> 3 Doug
> Time taken: 0.323 seconds, Fetched: 3 row(s)
>
> *SQOOP:*
>
> cloudera@quickstart ~]$ sqoop export --connect
> jdbc:postgresql://localhost/postgres --table test1 --hcatalog-table
> teste1
> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will
> fail.
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
> 16/10/25 11:42:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
> 16/10/25 11:42:19 INFO manager.SqlManager: Using default fetchSize of 1000
> 16/10/25 11:42:19 INFO tool.CodeGenTool: Beginning code generation
> 16/10/25 11:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT
> t.* FROM "test1" AS t LIMIT 1
> 16/10/25 11:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
> /usr/lib/hadoop-mapreduce
> Note: /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.java
> uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 16/10/25 11:42:20 INFO orm.CompilationManager: Writing jar file:
> /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.jar
> 16/10/25 11:42:20 INFO mapreduce.ExportJobBase: Beginning export of test1
> 16/10/25 11:42:21 INFO Configuration.deprecation: mapred.jar is
> deprecated. Instead, use mapreduce.job.jar
> 16/10/25 11:42:21 INFO Configuration.deprecation: mapred.map.max.attempts
> is deprecated. Instead, use mapreduce.map.maxattempts
> 16/10/25 11:42:21 INFO mapreduce.ExportJobBase: Configuring HCatalog for
> export job
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Configuring HCatalog
> specific details for job
> 16/10/25 11:42:21 INFO manager.SqlManager: Executing SQL statement: SELECT
> t.* FROM "test1" AS t LIMIT 1
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column names
> projected : [id, nome]
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column name -
> info map :
> id : [Type : 4,Precision : 10,Scale : 0]
> nome : [Type : 12,Precision : 50,Scale : 0]  *<-- Got the structure here!*
>
> 16/10/25 11:42:21 INFO hive.metastore: Trying to connect to metastore with
> URI thrift://quickstart.cloudera:9083
> 16/10/25 11:42:21 INFO hive.metastore: Opened a connection to metastore,
> current connections: 1
> 16/10/25 11:42:21 INFO hive.metastore: Connected to metastore.
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: HCatalog full table schema
> fields = [id, nome]
> ...
> 16/10/25 11:42:22 WARN hcat.SqoopHCatUtilities: No files under
> /usr/lib/hive-hcatalog/share/hcatalog/storage-handlers to add to
> distributed cache for hcatalog job
> 16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for
> export job
> 16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Ignoring configuration
> request for HCatalog info
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution
> is deprecated. Instead, use mapreduce.reduce.speculative
> 16/10/25 11:42:22 INFO Configuration.deprecation:
> mapred.map.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.map.speculative
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks is
> deprecated. Instead, use mapreduce.job.maps
> 16/10/25 11:42:22 INFO client.RMProxy: Connecting to ResourceManager at
> quickstart.cloudera/192.168.26.129:8032
> 16/10/25 11:42:49 INFO Configuration.deprecation: mapred.input.dir is
> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 16/10/25 11:42:49 INFO mapred.FileInputFormat: Total input paths to
> process : 2
> 16/10/25 11:42:50 INFO mapreduce.JobSubmitter: number of splits:2
> ...
> 16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Transferred 15.9258 KB in
> 45.1568 seconds (361.1415 bytes/sec)
> 16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Exported 3 records.
> 16/10/25 11:43:07 INFO hive.metastore: Closed a connection to metastore,
> current connections: 0
>
> *DESTINATION:*
>
> [cloudera@quickstart ~]$ psql -d postgres -c "select * from test1"
>  id |   nome
> ----+----------
>   1 | Douglas
>   2 | Spadotto
>   3 | Doug
> (3 rows)
>
> Now I'll go off into the Internet answer to everyone that posted this same
> question on other forums. :)
>
> Regards,
>
> Douglas
>
>
> On Tue, Oct 25, 2016 at 4:00 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
>> Glad to hear it.  Let us know how it goes
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>> On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <dougspadotto@gmail.com
>> > wrote:
>>
>>> Hi Markus,
>>>
>>> Thank you!
>>>
>>> I tried this (hcatalog options) myself a few minutes after I hit "send"
>>> on the e-mail. It worked fine, Sqoop was able to read the Parquet
>>> structure. Just my MR crashed but it was due to my unstable environment.
>>>
>>> It looks like I'm on the way to the solution.
>>>
>>> Cheers,
>>>
>>> Douglas
>>>
>>> On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <ma...@cloudera.com>
>>> wrote:
>>>
>>>> Hello Douglas,
>>>>
>>>> The only workaround that I am aware of is to use the Sqoop --hcatalog
>>>> options, for example:
>>>>
>>>> sqoop export --connect <jdbc_connection_string> --table <rdbms_table>
>>>> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
>>>>
>>>>
>>>>
>>>> Markus Kemper
>>>> Customer Operations Engineer
>>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>>
>>>>
>>>> On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <
>>>> dougspadotto@gmail.com> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I saw in the past few months quite a few messages about Parquet
>>>>> support on Sqoop, all about importing. Some of them worked well.
>>>>>
>>>>> But for exporting I'm receiving this error when trying to export from
>>>>> a Hive table stored as Parquet to Postgresql:
>>>>>
>>>>> [cloudera@quickstart ~]$ sqoop export --connect
>>>>> jdbc:postgresql://localhost/postgres --table test1  --export-dir
>>>>> /user/hive/warehouse/teste1
>>>>> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports
>>>>> will fail.
>>>>> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>>>>> 16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version:
>>>>> 1.4.6-cdh5.8.0
>>>>> 16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of
>>>>> 1000
>>>>> 16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
>>>>> 16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement:
>>>>> SELECT t.* FROM "test1" AS t LIMIT 1
>>>>> 16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>>>>> /usr/lib/hadoop-mapreduce
>>>>> Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java
>>>>> uses or overrides a deprecated API.
>>>>> Note: Recompile with -Xlint:deprecation for details.
>>>>> 16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file:
>>>>> /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
>>>>> 16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of
>>>>> test1
>>>>> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is
>>>>> deprecated. Instead, use mapreduce.job.jar
>>>>> 16/10/25 09:19:12 INFO Configuration.deprecation:
>>>>> mapred.map.max.attempts is deprecated. Instead, use
>>>>> mapreduce.map.maxattempts
>>>>> 16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement:
>>>>> SELECT t.* FROM "test1" AS t LIMIT 1
>>>>> 16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop:
>>>>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does
>>>>> not exist: hdfs://quickstart.cloudera:802
>>>>> 0/user/hive/warehouse/teste1/.metadata
>>>>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does
>>>>> not exist: hdfs://quickstart.cloudera:802
>>>>> 0/user/hive/warehouse/teste1/.metadata
>>>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.c
>>>>> heckExists(FileSystemMetadataProvider.java:562)
>>>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.f
>>>>> ind(FileSystemMetadataProvider.java:605)
>>>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.l
>>>>> oad(FileSystemMetadataProvider.java:114)
>>>>> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.
>>>>> load(FileSystemDatasetRepository.java:197)
>>>>> at org.kitesdk.data.Datasets.load(Datasets.java:108)
>>>>> at org.kitesdk.data.Datasets.load(Datasets.java:140)
>>>>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuild
>>>>> er.readFrom(DatasetKeyInputFormat.java:92)
>>>>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuild
>>>>> er.readFrom(DatasetKeyInputFormat.java:139)
>>>>> at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputForma
>>>>> t(JdbcExportJob.java:84)
>>>>> at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJob
>>>>> Base.java:432)
>>>>> at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.j
>>>>> ava:931)
>>>>> at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>>>>> at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>>>>> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>>>> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>>>
>>>>> I saw a recent JIRA opened about this, https://issues.apache.or
>>>>> g/jira/browse/SQOOP-2907, and am wondering if there is any workaround
>>>>> for this?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Douglas
>>>>>
>>>>> --
>>>>> Visite: http://canseidesercowboy.wordpress.com/
>>>>> Siga: @dougspadotto ou @excowboys
>>>>> -----
>>>>> Frodo: "I wish none of this had happened."
>>>>> Gandalf: "So do all who live to see such times, but that is not for
>>>>> them to decide. All we have to decide is what to do with the time that is
>>>>> given to us."
>>>>> -- Lord of the Rings: The Fellowship of the Ring (2001)
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Visite: http://canseidesercowboy.wordpress.com/
>>> Siga: @dougspadotto ou @excowboys
>>> -----
>>> Frodo: "I wish none of this had happened."
>>> Gandalf: "So do all who live to see such times, but that is not for them
>>> to decide. All we have to decide is what to do with the time that is given
>>> to us."
>>> -- Lord of the Rings: The Fellowship of the Ring (2001)
>>>
>>
>>
>
>
> --
> Visite: http://canseidesercowboy.wordpress.com/
> Siga: @dougspadotto ou @excowboys
> -----
> Frodo: "I wish none of this had happened."
> Gandalf: "So do all who live to see such times, but that is not for them
> to decide. All we have to decide is what to do with the time that is given
> to us."
> -- Lord of the Rings: The Fellowship of the Ring (2001)
>

Re: Sqoop export from Hive table stored as Parquet

Posted by Douglas Spadotto <do...@gmail.com>.
Hi Markus,

It worked fine end to end. Here is the edited output:

*SOURCE:*

hive> describe extended teste1;
OK
id                   int
nome                 string

Detailed Table Information Table(tableName:teste1, dbName:default,
owner:cloudera, createTime:1477408342, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null),
FieldSchema(name:nome, type:string, comment:null)],
location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1,
*inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,*
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,
parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
skewedColValueLocationMaps:{}), storedAsSubDirectories:false),
partitionKeys:[], parameters:{numFiles=2, transient_lastDdlTime=1477408508,
COLUMN_STATS_ACCURATE=true, totalSize=645, numRows=3, rawDataSize=6},
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.08 seconds, Fetched: 4 row(s)

hive> select * from teste1;
OK
...
1 Douglas
2 Spadotto
3 Doug
Time taken: 0.323 seconds, Fetched: 3 row(s)

*SQOOP:*

cloudera@quickstart ~]$ sqoop export --connect
jdbc:postgresql://localhost/postgres --table test1 --hcatalog-table teste1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will
fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/25 11:42:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
16/10/25 11:42:19 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/25 11:42:19 INFO tool.CodeGenTool: Beginning code generation
16/10/25 11:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM "test1" AS t LIMIT 1
16/10/25 11:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/usr/lib/hadoop-mapreduce
Note:
/tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/25 11:42:20 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.jar
16/10/25 11:42:20 INFO mapreduce.ExportJobBase: Beginning export of test1
16/10/25 11:42:21 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
16/10/25 11:42:21 INFO Configuration.deprecation: mapred.map.max.attempts
is deprecated. Instead, use mapreduce.map.maxattempts
16/10/25 11:42:21 INFO mapreduce.ExportJobBase: Configuring HCatalog for
export job
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Configuring HCatalog
specific details for job
16/10/25 11:42:21 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM "test1" AS t LIMIT 1
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column names
projected : [id, nome]
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column name - info
map :
id : [Type : 4,Precision : 10,Scale : 0]
nome : [Type : 12,Precision : 50,Scale : 0]  *<-- Got the structure here!*

16/10/25 11:42:21 INFO hive.metastore: Trying to connect to metastore with
URI thrift://quickstart.cloudera:9083
16/10/25 11:42:21 INFO hive.metastore: Opened a connection to metastore,
current connections: 1
16/10/25 11:42:21 INFO hive.metastore: Connected to metastore.
16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: HCatalog full table schema
fields = [id, nome]
...
16/10/25 11:42:22 WARN hcat.SqoopHCatUtilities: No files under
/usr/lib/hive-hcatalog/share/hcatalog/storage-handlers to add to
distributed cache for hcatalog job
16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for
export job
16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Ignoring configuration
request for HCatalog info
16/10/25 11:42:22 INFO Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
16/10/25 11:42:22 INFO Configuration.deprecation:
mapred.map.tasks.speculative.execution is deprecated. Instead, use
mapreduce.map.speculative
16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
16/10/25 11:42:22 INFO client.RMProxy: Connecting to ResourceManager at
quickstart.cloudera/192.168.26.129:8032
16/10/25 11:42:49 INFO Configuration.deprecation: mapred.input.dir is
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/10/25 11:42:49 INFO mapred.FileInputFormat: Total input paths to process
: 2
16/10/25 11:42:50 INFO mapreduce.JobSubmitter: number of splits:2
...
16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Transferred 15.9258 KB in
45.1568 seconds (361.1415 bytes/sec)
16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Exported 3 records.
16/10/25 11:43:07 INFO hive.metastore: Closed a connection to metastore,
current connections: 0

*DESTINATION:*

[cloudera@quickstart ~]$ psql -d postgres -c "select * from test1"
 id |   nome
----+----------
  1 | Douglas
  2 | Spadotto
  3 | Doug
(3 rows)
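
As an extra sanity check, the row counts line up on both sides (3 each, going
by the output above); something like this would confirm it:

[cloudera@quickstart ~]$ psql -d postgres -c "select count(*) from test1"
[cloudera@quickstart ~]$ hive -e "select count(*) from default.teste1"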

Now I'll go off onto the Internet and answer everyone who posted this same
question on other forums. :)

Regards,

Douglas


On Tue, Oct 25, 2016 at 4:00 PM, Markus Kemper <ma...@cloudera.com> wrote:

> Glad to hear it.  Let us know how it goes
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
> On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <do...@gmail.com>
> wrote:
>
>> Hi Markus,
>>
>> Thank you!
>>
>> I tried this (hcatalog options) myself a few minutes after I hit "send"
>> on the e-mail. It worked fine, Sqoop was able to read the Parquet
>> structure. Just my MR crashed but it was due to my unstable environment.
>>
>> It looks like I'm on the way to the solution.
>>
>> Cheers,
>>
>> Douglas
>>
>> On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <ma...@cloudera.com>
>> wrote:
>>
>>> Hello Douglas,
>>>
>>> The only workaround that I am aware of is to use the Sqoop --hcatalog
>>> options, for example:
>>>
>>> sqoop export --connect <jdbc_connection_string> --table <rdbms_table>
>>> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
>>>
>>>
>>>
>>> Markus Kemper
>>> Customer Operations Engineer
>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>
>>>
>>> On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <
>>> dougspadotto@gmail.com> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I saw in the past few months quite a few messages about Parquet support
>>>> on Sqoop, all about importing. Some of them worked well.
>>>>
>>>> But for exporting I'm receiving this error when trying to export from a
>>>> Hive table stored as Parquet to Postgresql:
>>>>
>>>> [cloudera@quickstart ~]$ sqoop export --connect
>>>> jdbc:postgresql://localhost/postgres --table test1  --export-dir
>>>> /user/hive/warehouse/teste1
>>>> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports
>>>> will fail.
>>>> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>>>> 16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version:
>>>> 1.4.6-cdh5.8.0
>>>> 16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of
>>>> 1000
>>>> 16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
>>>> 16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement:
>>>> SELECT t.* FROM "test1" AS t LIMIT 1
>>>> 16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>>>> /usr/lib/hadoop-mapreduce
>>>> Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java
>>>> uses or overrides a deprecated API.
>>>> Note: Recompile with -Xlint:deprecation for details.
>>>> 16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file:
>>>> /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
>>>> 16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of
>>>> test1
>>>> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is
>>>> deprecated. Instead, use mapreduce.job.jar
>>>> 16/10/25 09:19:12 INFO Configuration.deprecation:
>>>> mapred.map.max.attempts is deprecated. Instead, use
>>>> mapreduce.map.maxattempts
>>>> 16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement:
>>>> SELECT t.* FROM "test1" AS t LIMIT 1
>>>> 16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop:
>>>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does
>>>> not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.
>>>> metadata
>>>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does
>>>> not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.
>>>> metadata
>>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.c
>>>> heckExists(FileSystemMetadataProvider.java:562)
>>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.f
>>>> ind(FileSystemMetadataProvider.java:605)
>>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.l
>>>> oad(FileSystemMetadataProvider.java:114)
>>>> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.
>>>> load(FileSystemDatasetRepository.java:197)
>>>> at org.kitesdk.data.Datasets.load(Datasets.java:108)
>>>> at org.kitesdk.data.Datasets.load(Datasets.java:140)
>>>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuild
>>>> er.readFrom(DatasetKeyInputFormat.java:92)
>>>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuild
>>>> er.readFrom(DatasetKeyInputFormat.java:139)
>>>> at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputForma
>>>> t(JdbcExportJob.java:84)
>>>> at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJob
>>>> Base.java:432)
>>>> at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
>>>> at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>>>> at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>>>> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>>> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>>
>>>> I saw a recent JIRA opened about this, https://issues.apache.or
>>>> g/jira/browse/SQOOP-2907, and am wondering if there is any workaround
>>>> for this?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Douglas
>>>>
>>>> --
>>>> Visite: http://canseidesercowboy.wordpress.com/
>>>> Siga: @dougspadotto ou @excowboys
>>>> -----
>>>> Frodo: "I wish none of this had happened."
>>>> Gandalf: "So do all who live to see such times, but that is not for
>>>> them to decide. All we have to decide is what to do with the time that is
>>>> given to us."
>>>> -- Lord of the Rings: The Fellowship of the Ring (2001)
>>>>
>>>
>>>
>>
>>
>> --
>> Visite: http://canseidesercowboy.wordpress.com/
>> Siga: @dougspadotto ou @excowboys
>> -----
>> Frodo: "I wish none of this had happened."
>> Gandalf: "So do all who live to see such times, but that is not for them
>> to decide. All we have to decide is what to do with the time that is given
>> to us."
>> -- Lord of the Rings: The Fellowship of the Ring (2001)
>>
>
>


-- 
Visite: http://canseidesercowboy.wordpress.com/
Siga: @dougspadotto ou @excowboys
-----
Frodo: "I wish none of this had happened."
Gandalf: "So do all who live to see such times, but that is not for them to
decide. All we have to decide is what to do with the time that is given to
us."
-- Lord of the Rings: The Fellowship of the Ring (2001)
