
Posted to user@drill.apache.org by Arthur Chan <ar...@gmail.com> on 2015/06/17 16:33:33 UTC

How to configure HDFS storage

Hi,

I am new to Drill. Could anyone advise how to configure HDFS storage and
how to run smoke tests for Drill on HDFS?

=============
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    }
  }
}
=============

Re: How to configure HDFS storage

Posted by Venki Korukanti <ve...@gmail.com>.

Hi Arthur,

Currently the schema of the FileSystem storage plugin doesn't allow you to
set any configuration property other than "fs.default.name" (basically the
connection string). It would be good to have a "config" section similar to
the Hive or HBase storage plugins. Please log a JIRA for this improvement.
One workaround is to copy core-site.xml and hdfs-site.xml from your Hadoop
installation into a directory that is on Drill's classpath. This way,
whenever a Configuration object is instantiated, it picks up all properties,
including the HDFS HA properties, from core-site.xml/hdfs-site.xml.
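
To make the workaround concrete, here is a minimal sketch of what the
storage plugin definition might then look like. It assumes that, once
hdfs-site.xml is on Drill's classpath, the logical nameservice can stand in
for <ip_address>:<port> in "connection", the same way fs.defaultFS does in
an HA Hadoop setup. The helper names (read_property, ha_plugin_config) and
the trimmed "formats" section are illustrative only, not part of Drill.

```python
import json
import xml.etree.ElementTree as ET

# Sample hdfs-site.xml content. Only dfs.nameservices=mycluster comes from
# the thread; a real file would carry the other HA properties as well.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <final>true</final>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the value of a Hadoop property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def ha_plugin_config(nameservice, location="/"):
    """Build a minimal file-type storage plugin config for an HA nameservice."""
    return {
        "type": "file",
        "enabled": True,
        # Assumption: the logical nameservice replaces <ip_address>:<port>.
        "connection": f"hdfs://{nameservice}/",
        "workspaces": {
            "root": {
                "location": location,
                "writable": True,
                "defaultInputFormat": None,
            }
        },
        # Trimmed for brevity; the full "formats" section can be kept as is.
        "formats": {"json": {"type": "json"}, "parquet": {"type": "parquet"}},
    }


nameservice = read_property(HDFS_SITE, "dfs.nameservices")
config = ha_plugin_config(nameservice)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the storage plugin editor. Whether the
nameservice resolves still depends on Drill actually loading the copied
hdfs-site.xml, so a Drillbit restart may be needed after copying the files.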

Thanks
Venki

On Thu, Jul 16, 2015 at 1:43 AM, Arthur Chan <ar...@gmail.com>
wrote:

> On Wed, Jun 17, 2015 at 10:47 PM, Arthur Chan <ar...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Thanks!
> >
> > There is one thing about "connection" that confuses me:
> >
> > In an HA Hadoop cluster, instead of using <ip_address> and <port>, we use
> > <zookeeper QuorumPeerMain> and <dfs.nameservices> to manage the
> > connection to HA HDFS.
> >
> > Below is the relevant part of our hdfs-site.xml:
> >
> > ==============
> >     <property>
> >       <name>dfs.nameservices</name>
> >       <value>mycluster</value>
> >       <final>true</final>
> >     </property>
> > ==============
> >
> > How should I define "connection" in my case?
> >
> > Regards
> >

Re: How to configure HDFS storage

Posted by Arthur Chan <ar...@gmail.com>.

On Wed, Jun 17, 2015 at 10:47 PM, Arthur Chan <ar...@gmail.com>
wrote:

> Hi,
>
> Thanks!
>
> There is one thing about "connection" that confuses me:
>
> In an HA Hadoop cluster, instead of using <ip_address> and <port>, we use
> <zookeeper QuorumPeerMain> and <dfs.nameservices> to manage the
> connection to HA HDFS.
>
> Below is the relevant part of our hdfs-site.xml:
>
> ==============
>     <property>
>       <name>dfs.nameservices</name>
>       <value>mycluster</value>
>       <final>true</final>
>     </property>
> ==============
>
> How should I define "connection" in my case?
>
> Regards

Re: How to configure HDFS storage

Posted by Arthur Chan <ar...@gmail.com>.

Hi,

Thanks!

There is one thing about "connection" that confuses me:

In an HA Hadoop cluster, instead of using <ip_address> and <port>, we use
<zookeeper QuorumPeerMain> and <dfs.nameservices> to manage the
connection to HA HDFS.

Below is the relevant part of our hdfs-site.xml:

==============
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
      <final>true</final>
    </property>
==============

How should I define "connection" in my case?

Regards





Re: How to configure HDFS storage

Posted by Akif Khan <ak...@innovaccer.com>.

This is the config for HDFS:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://<ip_address>:8020/",
  "workspaces": {
    "root": {
      "location": "/user/data",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    }
  }
}
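
For the smoke-test part of the question, one possible approach is to
register the config above through Drill's REST API and then run a trivial
query against the workspace. The sketch below only builds the two requests
and sends nothing by default. The plugin name "hdfs", the address
localhost:8047, and the trimmed "formats" section are assumptions for
illustration; adjust them to the actual Drillbit.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # assumed Drillbit web address


def post_json(path, payload):
    """Build (but do not send) a JSON POST request to the Drill REST API."""
    return urllib.request.Request(
        DRILL_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_plugin(name, config):
    """Request that creates or updates a storage plugin called `name`."""
    return post_json(f"/storage/{name}.json", {"name": name, "config": config})


def smoke_query(sql):
    """Request that submits a SQL statement for execution."""
    return post_json("/query.json", {"queryType": "SQL", "query": sql})


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same shape as the config above; "formats" is trimmed for brevity and the
# <ip_address> placeholder must be replaced before sending.
hdfs_config = {
    "type": "file",
    "enabled": True,
    "connection": "hdfs://<ip_address>:8020/",
    "workspaces": {
        "root": {
            "location": "/user/data",
            "writable": True,
            "defaultInputFormat": None,
        }
    },
    "formats": {"json": {"type": "json"}},
}

requests_to_send = [
    register_plugin("hdfs", hdfs_config),
    smoke_query("SHOW FILES IN hdfs.root"),
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # send(req)  # uncomment once a Drillbit is reachable
```

If SHOW FILES lists the contents of /user/data, the connection and the
workspace location are both working; a SELECT against one of the listed
files would be a reasonable next check.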




-- 
Regards

*Akif Khan*
*InnovAccer Inc.*
*www.innovaccer.com <http://www.innovaccer.com>*
*+91 8802290360*