Posted to user@drill.apache.org by Manjeet Singh <ma...@gmail.com> on 2016/12/26 05:56:12 UTC

Drill with Parquet

Hi All,

I have a question. I want to create a table/view in Drill over data stored in
Parquet files on HDFS.


I am using the command below, but I am not able to create the table:


CREATE TABLE 'ttttt'(
"AAA",  "Domain", "certValidity", "protocol", "LastActive", "GetCount",
"PostCount", "Data_Transfer", "Data_Receive", "Total_Communication",
"Last_Active") AS SELECT "AAA",  "Domain", "certValidity", "protocol",
"LastActive", "GetCount", "PostCount", "Data_Transfer", "Data_Receive",
"Total_Communication", "Last_Active" FROM
dfs."/user/Domain-1-_1481273732716/*";

below is the schema

 |-- AAA: string (nullable = true)
 |-- BTSID: string (nullable = true)
 |-- OIME: string (nullable = true)
 |-- OIMS: string (nullable = true)
 |-- application: string (nullable = true)
 |-- applicationCount: string (nullable = true)
 |-- dataRx: string (nullable = true)
 |-- dataTx: string (nullable = true)
 |-- day: string (nullable = true)
 |-- duration: string (nullable = true)
 |-- locationLatLong: string (nullable = true)
 |-- month: string (nullable = true)
 |-- protocol: string (nullable = true)
 |-- protocolCount: string (nullable = true)
 |-- serverIP: string (nullable = true)
 |-- startTime: string (nullable = true)
 |-- totalVolume: string (nullable = true)
 |-- year: string (nullable = true)

Can anyone help me out?

Thanks
Manjeet

-- 
luv all

Re: Drill with Parquet

Posted by ankit beohar <an...@gmail.com>.
Hi Manjeet,

Please find the link below; it should help you.

https://www.mapr.com/blog/how-convert-csv-file-apache-parquet-using-apache-drill

Best Regards,
ANKIT BEOHAR


On Mon, Dec 26, 2016 at 2:29 PM, Khurram Faraaz <kf...@maprtech.com>
wrote:

> Hello Manjeet,
>
> What error do you see on the prompt from where you submit the CTAS
> statement ?
> Can you please share the stacktrace/error information from the drillbit.log
> file ?
> is your CTAS (the one that failed), is it over CSV/JSON files ?
>
> And when you say your SELECT statement is not working, what do you see on
> the prompt, an error/Exception, no results ?
> Can you also share the schema details for your parquet file (*
> 02a566d.snappy.parquet) in your SELECT statement ?
>
> What version of Drill are you using ?

Re: Drill with Parquet

Posted by Khurram Faraaz <kf...@maprtech.com>.
Hello Manjeet,

What error do you see at the prompt where you submit the CTAS statement?
Can you please share the stack trace/error information from the drillbit.log
file?
Is the CTAS that failed running over CSV/JSON files?

And when you say your SELECT statement is not working, what do you see at
the prompt: an error/exception, or no results?
Can you also share the schema details for the Parquet file
(*02a566d.snappy.parquet) in your SELECT statement?

What version of Drill are you using?
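
A minimal sketch of how the information asked for above could be gathered
from a Drill prompt. The path is the one from the SELECT quoted below;
sys.version is a Drill system table, and the LIMIT value is arbitrary.

```sql
-- Report the version and build details of the running Drill instance.
SELECT * FROM sys.version;

-- Read one row to see which column names Drill finds in the Parquet file.
-- The path is quoted with backticks: in Drill's default configuration,
-- single quotes mark string literals and are not identifier quotes.
SELECT *
FROM dfs.`/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716/part-r-00000-7c4b50c5-0318-4243-9f76-9822c02a566d.snappy.parquet`
LIMIT 1;
```

If the second statement fails, the exact error text from the prompt and the
matching entry in drillbit.log are the details worth posting.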

On Mon, Dec 26, 2016 at 12:16 PM, Manjeet Singh <ma...@gmail.com>
wrote:

> Hi all
>
> I have checked that even my simplest query is not working
>
> SELECT AAA FROM
> dfs.'/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716/part-r-00000-7c4b50c5-0318-4243-9f76-9822c02a566d.snappy.parquet';
>
>
> Thanks
> Manjeet

Re: Drill with Parquet

Posted by Manjeet Singh <ma...@gmail.com>.
Hi all

I have checked, and even my simplest query is not working:

SELECT AAA FROM
dfs.'/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716/part-r-00000-7c4b50c5-0318-4243-9f76-9822c02a566d.snappy.parquet';


Thanks
Manjeet


Re: Drill with Parquet

Posted by Khurram Faraaz <kf...@maprtech.com>.
Here is an example of CTAS over a Parquet file (the Parquet file is under
the /drill/testdata/join/typeall_l directory):

CREATE TABLE l_tblprtnby_intcl
PARTITION BY( col_int )
AS SELECT * FROM dfs.`/drill/testdata/join/typeall_l`;

On Thu, Dec 29, 2016 at 5:32 PM, Khurram Faraaz <kf...@maprtech.com>
wrote:

> Please look at these examples on the documentation links below
>
> here is the link to supported datatypes in Drill -
> https://drill.apache.org/docs/supported-data-types/
>
> and link to CTAS in Drill -
> http://drill.apache.org/docs/create-table-as-ctas-command/

Re: Drill with Parquet

Posted by Khurram Faraaz <kf...@maprtech.com>.
Please look at the examples in the documentation linked below.

Here is the link to the supported data types in Drill:
https://drill.apache.org/docs/supported-data-types/

And here is the link to CTAS in Drill:
http://drill.apache.org/docs/create-table-as-ctas-command/
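
As a rough sketch of what those two pages describe: the optional column list
in a CTAS takes names only, and any type conversion is written as a CAST in
the SELECT. The table name domain_summary, the new column names data_rx and
data_tx, and the use of the dfs.tmp workspace are assumptions made for
illustration; the source columns come from the schema posted at the start of
the thread, and the input path is the directory from the first CTAS attempt.

```sql
-- Write CTAS output as Parquet (this is also Drill's default format).
ALTER SESSION SET `store.format` = 'parquet';

-- Column list: names only, one per SELECT expression. No types here.
CREATE TABLE dfs.tmp.`domain_summary` (AAA, protocol, data_rx, data_tx) AS
SELECT AAA,
       protocol,
       CAST(dataRx AS DOUBLE),   -- stored as string in the source files
       CAST(dataTx AS DOUBLE)
FROM dfs.`/user/Domain-1-_1481273732716`;
```

The casts will fail at run time if dataRx or dataTx hold values that are not
numeric, so it may be worth checking a sample of the data first.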

On Thu, Dec 29, 2016 at 4:40 PM, Manjeet Singh <ma...@gmail.com>
wrote:

> Hi
>
> I am trying below query
>
> USE dfs;
> CREATE table dfs.`view_name`(AAA String,
>   Domain String,
>   certValidity String,
>   protocol String,
>   LastActive String,
>   GetCount DOUBLE,
>   PostCount DOUBLE,
>   Data_Transfer DOUBLE,
>   Data_Receive DOUBLE,
>   Total_Communication DOUBLE,
>   Last_Active String)AS SELECT AAA String,
>   Domain String,
>   certValidity String,
>   protocol String,
>   LastActive String,
>   GetCount DOUBLE,
>   PostCount DOUBLE,
>   Data_Transfer DOUBLE,
>   Data_Receive DOUBLE,
>   Total_Communication DOUBLE,
>   Last_Active String FROM
> dfs.`/Users/drilluser/apache-drill-1.0/sample-sata/nation.parquet`;
>
>
>
> still getting error
>
> can anyone suggest me what I am doing wrong
> second can anyone share how to create table over parquet file which is on
> hdfs?

Re: Drill with Parquet

Posted by Manjeet Singh <ma...@gmail.com>.
Hi

I am trying the query below:

USE dfs;
CREATE table dfs.`view_name`(AAA String,
  Domain String,
  certValidity String,
  protocol String,
  LastActive String,
  GetCount DOUBLE,
  PostCount DOUBLE,
  Data_Transfer DOUBLE,
  Data_Receive DOUBLE,
  Total_Communication DOUBLE,
  Last_Active String)AS SELECT AAA String,
  Domain String,
  certValidity String,
  protocol String,
  LastActive String,
  GetCount DOUBLE,
  PostCount DOUBLE,
  Data_Transfer DOUBLE,
  Data_Receive DOUBLE,
  Total_Communication DOUBLE,
  Last_Active String FROM
dfs.`/Users/drilluser/apache-drill-1.0/sample-sata/nation.parquet`;



I am still getting an error.

Can anyone suggest what I am doing wrong?
Also, can anyone share how to create a table over a Parquet file that is on
HDFS?

On Tue, Dec 27, 2016 at 2:52 AM, Ted Dunning <te...@gmail.com> wrote:

> You aren't specifying where your new table will be created. This is
> probably in dfs. You may also need to provide a workspace.  For instance,
> when I was creating a file this morning I used this:
>
> create table maprfs.ted.fooble as ...
>
> The parts of this table name are:
>
> maprfs - the storage
>
> ted - the workspace (aka my home directory)
>
> fooble - the table (no need for quotes for some kinds of names ... commonly
> necessary, however)
>
> You have to make sure that your workspace allows writes.

Re: Drill with Parquet

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Dec 25, 2016 at 9:56 PM, Manjeet Singh <ma...@gmail.com>
wrote:

> CREATE TABLE 'ttttt'(
> "AAA",  "Domain", "certValidity", "protocol", "LastActive", "GetCount",
> "PostCount", "Data_Transfer", "Data_Receive", "Total_Communication",
> "Last_Active") AS SELECT "AAA",  "Domain", "certValidity", "protocol",
> "LastActive", "GetCount", "PostCount", "Data_Transfer", "Data_Receive",
> "Total_Communication", "Last_Active" FROM
> dfs."/user/Domain-1-_1481273732716/*";
>

You aren't specifying where your new table will be created. This is
probably in dfs. You may also need to provide a workspace.  For instance,
when I was creating a file this morning I used this:

create table maprfs.ted.fooble as ...

The parts of this table name are:

maprfs - the storage

ted - the workspace (aka my home directory)

fooble - the table (no need for quotes for some kinds of names ... commonly
necessary, however)

You have to make sure that your workspace allows writes.
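
To make the naming above concrete, here is a hedged sketch against a stock
dfs storage plugin. It assumes a workspace called tmp exists and is marked
"writable": true in the plugin configuration, which is the case in Drill's
default dfs setup but may not hold on a given cluster. The table name and
input path are taken from the first CTAS attempt.

```sql
-- List the available schemas; each is a storage plugin plus a workspace,
-- for example dfs.tmp.
SHOW SCHEMAS;

-- Make the writable workspace the default for unqualified table names.
USE dfs.tmp;

-- Backticks quote identifiers. The new table is written under the
-- directory that the dfs.tmp workspace points to.
CREATE TABLE `ttttt` AS
SELECT * FROM dfs.`/user/Domain-1-_1481273732716`;
```

The same table could be named in full as dfs.tmp.`ttttt` without the USE
statement.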