Posted to dev@sqoop.apache.org by "Boglarka Egyed (Jira)" <ji...@apache.org> on 2019/11/04 12:10:00 UTC

[jira] [Commented] (SQOOP-3455) Sqoop job fails while importing to S3 as Parquet

    [ https://issues.apache.org/jira/browse/SQOOP-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966604#comment-16966604 ] 

Boglarka Egyed commented on SQOOP-3455:
---------------------------------------

[~kritijha] this is the same issue as SQOOP-3453.

Sqoop import into S3 is not supported in this version (1.4.7). S3 support was introduced by SQOOP-3345, which has not been included in any official release yet, although it works in the trunk version. (Some use cases already work with 1.4.7 too, but these are not tested at all.)

Furthermore, Parquet import into S3 will require some additional options to be set. Traditionally Sqoop used the Kite SDK to read and write Parquet, but because Kite has many limitations this was changed in SQOOP-3313. That change has also not been released yet, but it can be found in trunk.
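
For illustration only, here is a minimal Python sketch of the argument list such an import might use on a build from trunk. It rests on several assumptions: that the option added by SQOOP-3313 is --parquet-configurator-implementation with the value hadoop (please verify this against the trunk user guide), that the target directory uses the s3a:// scheme, and that S3 credentials are already configured on the Hadoop side. The helper name, JDBC URL and table name are hypothetical.

```python
import shlex


def build_parquet_s3_import(connect, table, target_dir):
    """Return an argv list for a Sqoop Parquet import (hypothetical helper).

    Assumptions, not confirmed by this thread:
    - the Sqoop build comes from trunk (includes SQOOP-3345 and SQOOP-3313);
    - --parquet-configurator-implementation accepts the value "hadoop";
    - S3 credentials are configured elsewhere (e.g. in the Hadoop config).
    """
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--as-parquetfile",
        # Assumed option: select the non-Kite Parquet implementation.
        "--parquet-configurator-implementation", "hadoop",
    ]


if __name__ == "__main__":
    argv = build_parquet_s3_import(
        connect="jdbc:mysql://db.example.com/trial",  # hypothetical JDBC URL
        table="trial",                                # hypothetical table name
        target_dir="s3a://sqoop-trial-bucket/sqoop-trial/trial",
    )
    # Print a shell-safe command line instead of running it.
    print(shlex.join(argv))
```

The sketch only prints the command; it does not run Sqoop, and it will not make the import work on 1.4.7.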

> Sqoop job fails while importing to S3 as Parquet
> ------------------------------------------------
>
>                 Key: SQOOP-3455
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3455
>             Project: Sqoop
>          Issue Type: Bug
>          Components: sqoop2-kite-connector
>    Affects Versions: 1.4.7
>            Reporter: Kriti Jha
>            Priority: Blocker
>
> A Sqoop job to import data from a MySQL database into S3 fails when --as-parquetfile is used, with the error shown below:
> ----
> ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:s3://sqoop-trial-bucket/sqoop-trial/trial
> Check that JARs for s3 datasets are on the classpath
> org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:s3://sqoop-trial-bucket/sqoop-trial/trial
> Check that JARs for s3 datasets are on the classpath
>     at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:128)
>     at org.kitesdk.data.Datasets.exists(Datasets.java:624)
>     at org.kitesdk.data.Datasets.exists(Datasets.java:646)
>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:118)
>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
>     at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
>     at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>     at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> ----
> All the JARs for S3 are present in the classpath. Further, the same works on simply removing the argument --as-parquetfile, i.e. with any other format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)