Posted to issues@drill.apache.org by "Krystal (JIRA)" <ji...@apache.org> on 2015/04/24 22:46:39 UTC
[jira] [Commented] (DRILL-1345) Drill can write to Amazon S3 storage buckets but not read from them
[ https://issues.apache.org/jira/browse/DRILL-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511704#comment-14511704 ]
Krystal commented on DRILL-1345:
--------------------------------
David - is this issue resolved? Can I close it?
> Drill can write to Amazon S3 storage buckets but not read from them
> -------------------------------------------------------------------
>
> Key: DRILL-1345
> URL: https://issues.apache.org/jira/browse/DRILL-1345
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet, Storage - Text & CSV
> Affects Versions: 0.4.0, 0.5.0
> Environment: CentOS 6.3 on Amazon Web Services virtual instance
> Reporter: David Tucker
> Assignee: David Tucker
> Priority: Critical
> Fix For: Future
>
>
> After configuring the storage plug-in for Amazon S3, Drill commands will correctly create Parquet or CSV files in the S3 bucket. However, attempting to read those files results in a software hang.
> To reproduce the issue:
> Confirm Hadoop access to the bucket from the shell with
> 'hadoop fs -ls s3://<bucket>/'
> The likely cause of a Hadoop access failure is incorrect user
> authentication settings in core-site.xml. You'll need appropriate
> AWS authentication keys for the following properties:
> fs.s3.awsAccessKeyId
> fs.s3.awsSecretAccessKey
> fs.s3n.awsAccessKeyId
> fs.s3n.awsSecretAccessKey
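> For reference, a sketch of how those properties might be set in core-site.xml (values shown are placeholders, not real credentials):

```xml
<!-- Hypothetical core-site.xml fragment; substitute your own AWS keys. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```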
> Configure the S3 storage plug-in as a clone of the default DFS
> plug-in with a single change: the connection string, which should be
> "s3://<bucket>". This CANNOT BE DONE until actual connectivity to
> the bucket is verified (a separate issue with storage plug-in
> configuration: it must be able to connect to the target connection
> string or it fails).
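> As a rough sketch, the cloned plug-in configuration might look like the following (everything except the connection string is assumed to mirror the default dfs plug-in; adjust workspaces and formats to taste):

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3://<bucket>",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true
    }
  },
  "formats": {
    "parquet": { "type": "parquet" },
    "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," }
  }
}
```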
> Simple queries to create tables in the S3 bucket will work.
> alter session set `store.format`='parquet' ;
> create table `my-s3`.`/employee1` as select * from cp.`employee.json` ;
>
> alter session set `store.format`='csv' ;
> create table `my-s3`.`/employee2` as select * from cp.`employee.json` ;
>
> Confirm the existence of the files in the S3 bucket, and the readability of their contents with "hadoop fs" commands.
> Attempts to read the same tables will hang:
> select * from `my-s3`.`/employee1` ;
> "jstack -F <drillbit_pid>" indicates there is a deadlock of some kind.
> NOTE: The jets3t class enabling S3 data access from MapR Hadoop 4.0.1 client was incompatible with Drill 0.4 and 0.5. I had to leave the jets3t library excluded (via hadoop-excludes.txt) and copy in older jets3t support from MapR 3.0.3.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)