Posted to dev@drill.apache.org by "David Tucker (JIRA)" <ji...@apache.org> on 2014/08/26 23:03:58 UTC
[jira] [Created] (DRILL-1345) Drill can write to Amazon S3 storage buckets but not read from them
David Tucker created DRILL-1345:
-----------------------------------
Summary: Drill can write to Amazon S3 storage buckets but not read from them
Key: DRILL-1345
URL: https://issues.apache.org/jira/browse/DRILL-1345
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet, Storage - Text & CSV
Affects Versions: 0.4.0, 0.5.0
Environment: CentOS 6.3 on Amazon Web Services virtual instance
Reporter: David Tucker
Priority: Critical
After configuring the storage plug-in for Amazon S3, Drill commands will correctly create Parquet or CSV files in the S3 bucket. However, attempting to read those files results in a software hang.
To reproduce the issue:
Confirm Hadoop access to the bucket from the shell with
'hadoop fs -ls s3://<bucket>/'
The most likely cause of a Hadoop access failure is incorrect
authentication settings in core-site.xml. You'll need valid
AWS credentials for the following properties:
fs.s3.awsAccessKeyId
fs.s3.awsSecretAccessKey
fs.s3n.awsAccessKeyId
fs.s3n.awsSecretAccessKey
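The corresponding core-site.xml entries can be sketched as follows (the key and secret values are placeholders, not real credentials):

```xml
<!-- core-site.xml: AWS credentials for the s3:// and s3n:// schemes.
     ACCESS_KEY_ID and SECRET_ACCESS_KEY below are placeholders. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>SECRET_ACCESS_KEY</value>
</property>
```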
Configure the S3 storage plug-in as a clone of the default DFS
plug-in, with a single change: the connection string should be
"s3://<bucket>". Note that this CANNOT BE DONE until actual
connectivity to the bucket is verified, because the storage
plug-in configuration is rejected unless Drill can connect to
the target connection string (a separate issue).
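A sketch of the resulting S3 storage plug-in JSON, assuming a bucket named "my-bucket" and a single writable root workspace (field names follow the default dfs plug-in shown in the Drill Web UI; adjust to match your version):

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3://my-bucket",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": { "type": "parquet" },
    "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," }
  }
}
```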
Simple queries to create tables in the S3 bucket will work.
alter session set `store.format`='parquet' ;
create table `my-s3`.`/employee1` as select * from cp.`employee.json` ;
alter session set `store.format`='csv' ;
create table `my-s3`.`/employee2` as select * from cp.`employee.json` ;
Confirm the existence of the files in the S3 bucket, and the readability of their contents with "hadoop fs" commands.
Attempts to read the same tables will hang:
select * from `my-s3`.`/employee1` ;
Running "jstack -F <drillbit_pid>" indicates a deadlock of some kind.
NOTE: The jets3t class enabling S3 data access from MapR Hadoop 4.0.1 client was incompatible with Drill 0.4 and 0.5. I had to leave the jets3t library excluded (via hadoop-excludes.txt) and copy in older jets3t support from MapR 3.0.3.
--
This message was sent by Atlassian JIRA
(v6.2#6252)