Posted to dev@drill.apache.org by "David Tucker (JIRA)" <ji...@apache.org> on 2014/08/26 23:03:58 UTC

[jira] [Created] (DRILL-1345) Drill can write to Amazon S3 storage buckets but not read from them

David Tucker created DRILL-1345:
-----------------------------------

             Summary: Drill can write to Amazon S3 storage buckets but not read from them
                 Key: DRILL-1345
                 URL: https://issues.apache.org/jira/browse/DRILL-1345
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet, Storage - Text & CSV
    Affects Versions: 0.4.0, 0.5.0
         Environment: CentOS 6.3 on Amazon Web Services virtual instance
            Reporter: David Tucker
            Priority: Critical


After configuring the storage plug-in for Amazon S3, Drill commands will correctly create Parquet or CSV files in the S3 bucket. However, attempting to read those files results in a hang.

To reproduce the issue:

1. Confirm Hadoop access to the bucket from the shell with
       hadoop fs -ls s3://<bucket>/
   The most likely cause of a failure at this step is incorrect
   authentication settings in core-site.xml. You'll need valid AWS
   authentication keys for the following properties (a sample
   core-site.xml fragment follows this list):
       fs.s3.awsAccessKeyId
       fs.s3.awsSecretAccessKey
       fs.s3n.awsAccessKeyId
       fs.s3n.awsSecretAccessKey
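
   For reference, a minimal core-site.xml fragment setting those four
   properties might look like the following (the values are
   placeholders, not real credentials):

       <!-- Excerpt from core-site.xml: AWS credentials for the
            s3:// and s3n:// schemes. Replace placeholder values
            with real keys. -->
       <property>
         <name>fs.s3.awsAccessKeyId</name>
         <value>YOUR_ACCESS_KEY_ID</value>
       </property>
       <property>
         <name>fs.s3.awsSecretAccessKey</name>
         <value>YOUR_SECRET_ACCESS_KEY</value>
       </property>
       <property>
         <name>fs.s3n.awsAccessKeyId</name>
         <value>YOUR_ACCESS_KEY_ID</value>
       </property>
       <property>
         <name>fs.s3n.awsSecretAccessKey</name>
         <value>YOUR_SECRET_ACCESS_KEY</value>
       </property>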
2. Configure the S3 storage plug-in as a clone of the default DFS
   plug-in, with a single change: the connection string, which should
   be "s3://<bucket>". Note that this step cannot be completed until
   connectivity to the bucket has actually been verified, because of a
   separate issue with storage plug-in configuration: the plug-in must
   be able to connect to the target connection string when it is
   registered, or registration fails. A sketch of the cloned plug-in
   definition follows.
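
   For illustration, here is a rough sketch of the cloned plug-in
   definition, assuming it is registered under the name my-s3
   (matching the queries below). The workspace and format sections
   are carried over from the default dfs plug-in and may vary
   slightly between Drill versions:

       {
         "type": "file",
         "enabled": true,
         "connection": "s3://<bucket>",
         "workspaces": {
           "root": {
             "location": "/",
             "writable": true,
             "defaultInputFormat": null
           }
         },
         "formats": {
           "parquet": { "type": "parquet" },
           "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," }
         }
       }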

Simple queries to create tables in the S3 bucket will work:
  alter session set `store.format`='parquet' ;
  create table `my-s3`.`/employee1` as select * from cp.`employee.json` ;
 
  alter session set `store.format`='csv' ;
  create table `my-s3`.`/employee2` as select * from cp.`employee.json` ;
 
Confirm the existence of the files in the S3 bucket, and the readability of their contents, with "hadoop fs" commands.
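
For example (the directory names match the CTAS statements above; the
file names inside each directory are assigned by Drill, so a glob is
used for the CSV contents):

     hadoop fs -ls s3://<bucket>/employee1 s3://<bucket>/employee2
     hadoop fs -cat 's3://<bucket>/employee2/*.csv'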

Attempts to read the same tables will hang:
     select * from `my-s3`.`/employee1` ;

"jstack -F <drillbit_pid>" indicates there is a deadlock of some kind.

NOTE: The jets3t class enabling S3 data access from the MapR Hadoop 4.0.1 client was incompatible with Drill 0.4 and 0.5. I had to leave the jets3t library excluded (via hadoop-excludes.txt) and copy in the older jets3t support from MapR 3.0.3.


