Posted to user@drill.apache.org by Paul Mogren <PM...@commercehub.com> on 2015/05/22 23:14:22 UTC

poor experience getting started with S3

As support for AWS S3 is advertised and seems to be a common way to try
Drill with existing data, it would be nice to have S3 support fully
built-in. Having to search for and follow multi-step documentation in a
blog post to enable Jets3t (including a protocol scheme error corrected
only by a third-party comment) was not a good experience. I have scripted
the setup; you can find it at
https://github.com/awslabs/emr-bootstrap-actions/pull/105

Additionally, Jets3t seems to have run its course; the prevailing advice
in the community is to switch to the AWS SDK at this point. One advantage
of doing so would be to support the EC2 instance profile credential
mechanism, to let people avoid specifying credentials in configuration.
Jets3t has stated that they are not interested in supporting this.
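
For readers unfamiliar with the instance profile mechanism, here is a
minimal Python sketch of what an SDK does on the caller's behalf: it reads
temporary credentials from the EC2 instance metadata endpoint, so nothing
has to be written into a configuration file. The function names are
hypothetical, and the sketch assumes it runs on an EC2 instance with a role
attached and that the instance allows plain (IMDSv1-style) metadata
requests. It illustrates the idea only; it is not how Drill or Jets3t
behave.

```python
import json
from urllib.request import urlopen

# Link-local metadata endpoint, reachable only from inside an EC2 instance.
METADATA_BASE = (
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
)


def parse_credentials(body):
    """Extract the temporary credential fields from the metadata JSON body."""
    doc = json.loads(body)
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "token": doc["Token"],
        "expiration": doc["Expiration"],
    }


def fetch_instance_profile_credentials(timeout=2.0):
    """Hypothetical helper: look up the attached role, then its credentials."""
    # The first request returns the name of the role attached to the instance.
    with urlopen(METADATA_BASE, timeout=timeout) as resp:
        role = resp.read().decode().strip().splitlines()[0]
    # The second request returns a JSON document with short-lived credentials.
    with urlopen(METADATA_BASE + role, timeout=timeout) as resp:
        return parse_credentials(resp.read().decode())
```

The credentials returned this way expire and are rotated automatically,
which is why a client has to support the mechanism itself rather than
having the values copied into its configuration.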

Frustratingly, Drill (or maybe it's Jets3t) seems to have an
interpretation of the various s3 protocol schemes that differs from
Hadoop's (http://wiki.apache.org/hadoop/AmazonS3), yet this Hadoop page
has been referenced in Drill configuration advice (I think it was on this
mailing list).
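
To make the scheme confusion concrete, here is a small Python sketch of
the Hadoop-side convention as I understand it. The mapping is my own
summary of Hadoop's meanings, not of Drill's or Jets3t's behavior, and the
bucket and path in the usage line are made up.

```python
from urllib.parse import urlparse

# Summary of how Hadoop interprets each scheme (an assumption-labeled
# paraphrase, not text copied from the wiki page).
HADOOP_S3_SCHEMES = {
    "s3": "block filesystem: Hadoop-specific blocks stored in a bucket",
    "s3n": "native filesystem: ordinary S3 objects, via Jets3t",
    "s3a": "native filesystem backed by the AWS SDK (Hadoop 2.6 and later)",
}


def describe_scheme(uri):
    """Return the URI scheme and Hadoop's meaning for it, if known."""
    scheme = urlparse(uri).scheme
    meaning = HADOOP_S3_SCHEMES.get(scheme, "not an S3 scheme known to Hadoop")
    return scheme, meaning


# Hypothetical bucket and path, for illustration only.
print(describe_scheme("s3n://example-bucket/logs/2015/05/22.json"))
```

Under Hadoop's convention, data written by other tools as plain S3 objects
would be addressed with s3n (or s3a), not s3, so advice that mixes the two
conventions is easy to get wrong.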


Paul