Posted to user@drill.apache.org by Kumiko Yada <Ku...@ds-iq.com> on 2015/12/15 00:20:15 UTC
Drill Azure Blob Storage Plugin
Is there an Azure Blob Storage Plugin for Apache Drill? I'm looking for a solution that can be done without configuring Hadoop to access Azure blob (http://hadoop.apache.org/docs/r2.7.0/hadoop-azure/index.html).
Thanks
Kumiko
Re: Drill Azure Blob Storage Plugin
Posted by Tomer Shiran <ts...@dremio.com>.
Follow these steps:
- Add hadoop-azure-2.7.1.jar and azure-storage-2.0.0.jar to the
classpath. For example, copy these JARs into <drill>/jars/3rdparty. Note
that these JARs are available in a Hadoop tarball (you don't actually need
Hadoop) or directly online:
1. http://central.maven.org/maven2/org/apache/hadoop/hadoop-azure/2.7.1/hadoop-azure-2.7.1.jar
2. http://central.maven.org/maven2/com/microsoft/azure/azure-storage/2.0.0/azure-storage-2.0.0.jar
- Add the following XML snippet to <drill>/conf/core-site.xml (inside the
<configuration> element):
<property>
  <name>fs.azure.account.key.YOUR_ACCOUNT.blob.core.windows.net</name>
  <value>YOUR AZURE ACCESS KEY</value>
</property>
- Create a new datastore (i.e., storage plugin) in Drill called "azure"
(or whatever you choose). For the configuration, copy the JSON
configuration from the default "dfs" plugin, but change the connection
string from file:/// to
wasb://YOUR_CONTAINER@YOUR_ACCOUNT.blob.core.windows.net/
The configuration would look something like this:
{
  "type": "file",
  "enabled": true,
  "connection": "wasb://drill@tshiran.blob.core.windows.net/",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}
- From the Drill CLI, you could run a query like this:
SELECT COUNT(*) FROM azure.root.`sfpd2014.csv`;
- Or:
USE azure;
SELECT COUNT(*) FROM `sfpd2014.csv`;
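If you would rather script the plugin setup than paste JSON into the web UI, the steps above can be sketched in Python roughly as follows. This is only a sketch under some assumptions: it assumes a Drillbit whose REST API is reachable at http://localhost:8047 and exposes /storage/{name}.json and /query.json, and the helper names (wasb_connection, azure_plugin_payload, query_payload, call_drill, main) are made up for illustration. It does not replace the JAR and core-site.xml steps.

```python
import copy
import json
import urllib.request

# Assumption: Drill's web/REST interface is on the default local port.
DRILL_URL = "http://localhost:8047"


def wasb_connection(container, account):
    """Build the wasb:// connection string for a container and storage account."""
    return "wasb://{}@{}.blob.core.windows.net/".format(container, account)


def azure_plugin_payload(dfs_config, container, account, name="azure"):
    """Copy the dfs plugin config and point its connection at Azure Blob Storage."""
    config = copy.deepcopy(dfs_config)  # leave the caller's dfs config untouched
    config["connection"] = wasb_connection(container, account)
    return {"name": name, "config": config}


def query_payload(sql):
    """Wrap a SQL string in the body expected by Drill's /query.json endpoint."""
    return {"queryType": "SQL", "query": sql}


def call_drill(path, payload=None):
    """GET (no payload) or POST (JSON payload) against the Drill REST API."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        DRILL_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main():
    """Register the 'azure' plugin from the dfs config, then run the count query."""
    dfs = call_drill("/storage/dfs.json")["config"]
    payload = azure_plugin_payload(dfs, "YOUR_CONTAINER", "YOUR_ACCOUNT")
    call_drill("/storage/azure.json", payload)
    result = call_drill(
        "/query.json",
        query_payload("SELECT COUNT(*) FROM azure.root.`sfpd2014.csv`"),
    )
    print(result)
```

Calling main() needs a running Drillbit with the JARs and access key already in place. The payload helpers are plain functions, so you can print their output and check the JSON before sending anything.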
On Mon, Dec 14, 2015 at 3:20 PM, Kumiko Yada <Ku...@ds-iq.com> wrote:
> Is there an Azure Blob Storage Plugin for Apache Drill? I'm looking for a
> solution that can be done without configuring Hadoop to access Azure blob (
> http://hadoop.apache.org/docs/r2.7.0/hadoop-azure/index.html).
>
> Thanks
> Kumiko
>