Posted to user@drill.apache.org by Manjeet Singh <ma...@gmail.com> on 2016/12/26 05:56:12 UTC
Drill with Parquet
Hi All,
I have a question. I want to create a table/view in Drill. My data is stored
in Parquet files on HDFS.
I am using the command below but am not able to create the table:
CREATE TABLE 'ttttt'(
"AAA", "Domain", "certValidity", "protocol", "LastActive", "GetCount",
"PostCount", "Data_Transfer", "Data_Receive", "Total_Communication",
"Last_Active") AS SELECT "AAA", "Domain", "certValidity", "protocol",
"LastActive", "GetCount", "PostCount", "Data_Transfer", "Data_Receive",
"Total_Communication", "Last_Active" FROM
dfs."/user/Domain-1-_1481273732716/*";
Below is the schema:
|-- AAA: string (nullable = true)
|-- BTSID: string (nullable = true)
|-- OIME: string (nullable = true)
|-- OIMS: string (nullable = true)
|-- application: string (nullable = true)
|-- applicationCount: string (nullable = true)
|-- dataRx: string (nullable = true)
|-- dataTx: string (nullable = true)
|-- day: string (nullable = true)
|-- duration: string (nullable = true)
|-- locationLatLong: string (nullable = true)
|-- month: string (nullable = true)
|-- protocol: string (nullable = true)
|-- protocolCount: string (nullable = true)
|-- serverIP: string (nullable = true)
|-- startTime: string (nullable = true)
|-- totalVolume: string (nullable = true)
|-- year: string (nullable = true)
Can anyone help me out?
Thanks
Manjeet
--
luv all
Re: Drill with Parquet
Posted by ankit beohar <an...@gmail.com>.
Hi Manjeet,
Please find the link below; it will help you.
https://www.mapr.com/blog/how-convert-csv-file-apache-parquet-using-apache-drill
Best Regards,
ANKIT BEOHAR
Re: Drill with Parquet
Posted by Khurram Faraaz <kf...@maprtech.com>.
Hello Manjeet,
What error do you see at the prompt where you submit the CTAS statement?
Can you please share the stack trace/error information from the drillbit.log
file?
Is your CTAS (the one that failed) over CSV/JSON files?
And when you say your SELECT statement is not working, what do you see at
the prompt: an error/exception, or no results?
Can you also share the schema details for the Parquet file
(*02a566d.snappy.parquet) in your SELECT statement?
What version of Drill are you using?
Re: Drill with Parquet
Posted by Manjeet Singh <ma...@gmail.com>.
Hi all,
I have checked, and even my simplest query is not working:
SELECT AAA FROM
dfs.'/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716/part-r-00000-7c4b50c5-0318-4243-9f76-9822c02a566d.snappy.parquet';
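For comparison, here is a minimal sketch of the same query with the path wrapped in backticks. It assumes Drill's default identifier quoting, where backticks delimit identifiers such as file paths and single quotes delimit string literals, and it assumes the dfs storage plugin points at the HDFS cluster that holds the file. The LIMIT is added only to keep the check cheap.

```sql
-- Path copied from the query above; backticks replace the single quotes.
SELECT AAA
FROM dfs.`/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716/part-r-00000-7c4b50c5-0318-4243-9f76-9822c02a566d.snappy.parquet`
LIMIT 10;
```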
Thanks
Manjeet
--
luv all
Re: Drill with Parquet
Posted by Khurram Faraaz <kf...@maprtech.com>.
Here is an example of CTAS over a Parquet file (the Parquet file is under
the /drill/testdata/join/typeall_l directory):
CREATE TABLE l_tblprtnby_intcl
PARTITION BY( col_int )
AS SELECT * FROM dfs.`/drill/testdata/join/typeall_l`;
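The same pattern could be adapted to the data discussed in this thread. The following is only a sketch under assumptions: dfs.tmp is assumed to be a writable workspace (as in Drill's default dfs configuration), domain_by_protocol is a made-up table name, and the column names are taken from the schema posted at the start of the thread. PARTITION BY in CTAS applies when the output format is Parquet, and the partition column has to appear in the SELECT list.

```sql
-- Write the result as Parquet (Drill's default CTAS output format).
ALTER SESSION SET `store.format` = 'parquet';

-- Hypothetical table name; dfs.tmp is assumed to be writable.
CREATE TABLE dfs.tmp.`domain_by_protocol`
PARTITION BY (`protocol`)
AS SELECT `AAA`, `protocol`, `serverIP`, `dataRx`, `dataTx`
FROM dfs.`/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716`;
```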
Re: Drill with Parquet
Posted by Khurram Faraaz <kf...@maprtech.com>.
Please look at the examples in the documentation linked below.
Supported data types in Drill:
https://drill.apache.org/docs/supported-data-types/
CTAS in Drill:
http://drill.apache.org/docs/create-table-as-ctas-command/
Re: Drill with Parquet
Posted by Manjeet Singh <ma...@gmail.com>.
Hi
I am trying the query below:
USE dfs;
CREATE table dfs.`view_name`(AAA String,
Domain String,
certValidity String,
protocol String,
LastActive String,
GetCount DOUBLE,
PostCount DOUBLE,
Data_Transfer DOUBLE,
Data_Receive DOUBLE,
Total_Communication DOUBLE,
Last_Active String)AS SELECT AAA String,
Domain String,
certValidity String,
protocol String,
LastActive String,
GetCount DOUBLE,
PostCount DOUBLE,
Data_Transfer DOUBLE,
Data_Receive DOUBLE,
Total_Communication DOUBLE,
Last_Active String FROM
dfs.`/Users/drilluser/apache-drill-1.0/sample-sata/nation.parquet`;
I am still getting an error.
Can anyone tell me what I am doing wrong?
Also, can anyone share how to create a table over a Parquet file that is on
HDFS?
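As far as the CTAS syntax goes, the optional column list after the table name takes column names only, and the SELECT list takes expressions rather than name/type pairs, so types are usually set with CAST. A minimal sketch, assuming a writable dfs.tmp workspace, a hypothetical table name domain_typed, hypothetical output column names, and source columns from the schema posted at the start of the thread (the casts would fail on values that are not numeric strings):

```sql
-- Column list: names only, one per SELECT expression.
CREATE TABLE dfs.tmp.`domain_typed` (AAA, protocol_name, data_rx, data_tx) AS
SELECT `AAA`,
       `protocol`,
       CAST(`dataRx` AS DOUBLE),   -- stored as string in the source schema
       CAST(`dataTx` AS DOUBLE)
FROM dfs.`/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716`;
```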
--
luv all
Re: Drill with Parquet
Posted by Ted Dunning <te...@gmail.com>.
On Sun, Dec 25, 2016 at 9:56 PM, Manjeet Singh <ma...@gmail.com>
wrote:
> CREATE TABLE 'ttttt'(
> "AAA", "Domain", "certValidity", "protocol", "LastActive", "GetCount",
> "PostCount", "Data_Transfer", "Data_Receive", "Total_Communication",
> "Last_Active") AS SELECT "AAA", "Domain", "certValidity", "protocol",
> "LastActive", "GetCount", "PostCount", "Data_Transfer", "Data_Receive",
> "Total_Communication", "Last_Active" FROM
> dfs."/user/Domain-1-_1481273732716/*";
>
You aren't specifying where your new table will be created. This is
probably in dfs. You may also need to provide a workspace. For instance,
when I was creating a file this morning I used this:
create table maprfs.ted.fooble as ...
The parts of this table name are:
maprfs - the storage
ted - the workspace (aka my home directory)
fooble - the table (no need for quotes for some kinds of names ... commonly
necessary, however)
You have to make sure that your workspace allows writes.
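A short sketch of how the three-part name might be used in practice. dfs.tmp is an assumption here (it is the writable workspace in Drill's default dfs plugin configuration; a workspace accepts CTAS only when its definition sets "writable": true), and domain_copy is a hypothetical table name.

```sql
-- List the storage plugins and workspaces Drill knows about.
SHOW SCHEMAS;

-- storage = dfs, workspace = tmp, table = domain_copy
CREATE TABLE dfs.tmp.`domain_copy` AS
SELECT *
FROM dfs.`/user/olap/EntityProfiling/DomainParquet/Domain-1-_1481273732716`;

-- Query the new table by the same three-part name.
SELECT COUNT(*) FROM dfs.tmp.`domain_copy`;
```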