You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by jose antonio <jo...@gmail.com> on 2016/09/28 15:43:40 UTC

Malfunction trying to query csv file using file headers

Hello all,

Yesterday a asked the same on
http://stackoverflow.com/questions/39720337/how-can-i-do-a-query-on-a-csv-file-using-the-header-file-with-apache-drill.
I found out that if a do a query of type
select * from dfs.*`path_to_file*/clientes.csv`


with this csv

ID, NOMBRE, APELLIDOS, PROVINCIA
12121212, MARIA, LIONZA, MADRID
12312312, JAIMITO, PEREZ, ALMERÍA
13131313, BRUNO, DIAZ, MALAGA
23423423, HARLEY, QUINN, BARCELONA


I could see the result as a table where every column has their own name
according to the header values present on the csv file, but if a do a query
specifying the column name then I just get the field empty except the ID,
p.e.

select ID, NOMBRE from dfs.*`path_to_file*/clientes.csv`
then I get

| ID       | NOMBRE |
| 12121212 |        |
| 12312312 |        |
| 13131313 |        |
| 23423423 |        |


why does it work for the ID value but it doesn't for the rest of
column like NOMBRE, APELLIDOS, PROVINCIA?

Also I updated the dfs plugin and add extractHeader to true, for the csv format.

Thanks in advance

-- 
José Antonio Rodríguez
Ing. de Sistemas ULA - Mérida

Re: Malfunction trying to query csv file using file headers

Posted by Charles Givre <cg...@gmail.com>.

De nada!

> On Sep 28, 2016, at 16:57, jose antonio rodriguez diaz <jo...@gmail.com> wrote:
> 
> Hello All,
> 
> Charles you were right, the problem was the space immediately after the comma, so it just worked when I tried this:
> 
> SELECT ID, ` NOMBRE` FROM dfs.`path_to_file/clientes.csv`
> 
> Thanks again.
> 
>> El 28 sept 2016, a las 19:06, Charles Givre <cg...@gmail.com> escribió:
>> 
>> I believe the issue you are encountering is due to the spaces in the header fields.
>> 
>> To verify this hypothesis, try this query (Note the spaces and back ticks in the field names):
>> 
>> SELECT `ID `, `NOMBRE ` FROM <your file>
>> The easy fix is to remove the spaces in the header.
>> 
>> 
>>> On Sep 28, 2016, at 11:43, jose antonio <jo...@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> Yesterday a asked the same on
>>> http://stackoverflow.com/questions/39720337/how-can-i-do-a-query-on-a-csv-file-using-the-header-file-with-apache-drill.
>>> I found out that if a do a query of type
>>> select * from dfs.*`path_to_file*/clientes.csv`
>>> 
>>> 
>>> with this csv
>>> 
>>> ID, NOMBRE, APELLIDOS, PROVINCIA
>>> 12121212, MARIA, LIONZA, MADRID
>>> 12312312, JAIMITO, PEREZ, ALMERÍA
>>> 13131313, BRUNO, DIAZ, MALAGA
>>> 23423423, HARLEY, QUINN, BARCELONA
>>> 
>>> 
>>> I could see the result as a table where every column has their own name
>>> according to the header values present on the csv file, but if a do a query
>>> specifying the column name then I just get the field empty except the ID,
>>> p.e.
>>> 
>>> select ID, NOMBRE from dfs.*`path_to_file*/clientes.csv`
>>> then I get
>>> 
>>> | ID       | NOMBRE |
>>> | 12121212 |        |
>>> | 12312312 |        |
>>> | 13131313 |        |
>>> | 23423423 |        |
>>> 
>>> 
>>> why does it work for the ID value but it doesn't for the rest of
>>> column like NOMBRE, APELLIDOS, PROVINCIA?
>>> 
>>> Also I updated the dfs plugin and add extractHeader to true, for the csv format.
>>> 
>>> Thanks in advance
>>> 
>>> -- 
>>> José Antonio Rodríguez
>>> Ing. de Sistemas ULA - Mérida
>> 
>

Re: Malfunction trying to query csv file using file headers

Posted by jose antonio rodriguez diaz <jo...@gmail.com>.

Hello All,

Charles you were right, the problem was the space immediately after the comma, so it just worked when I tried this:

SELECT ID, ` NOMBRE` FROM dfs.`path_to_file/clientes.csv`

Thanks again.

> El 28 sept 2016, a las 19:06, Charles Givre <cg...@gmail.com> escribió:
> 
> I believe the issue you are encountering is due to the spaces in the header fields.
> 
> To verify this hypothesis, try this query (Note the spaces and back ticks in the field names):
> 
> SELECT `ID `, `NOMBRE ` FROM <your file>
> The easy fix is to remove the spaces in the header.
> 
> 
>> On Sep 28, 2016, at 11:43, jose antonio <jo...@gmail.com> wrote:
>> 
>> Hello all,
>> 
>> Yesterday a asked the same on
>> http://stackoverflow.com/questions/39720337/how-can-i-do-a-query-on-a-csv-file-using-the-header-file-with-apache-drill.
>> I found out that if a do a query of type
>> select * from dfs.*`path_to_file*/clientes.csv`
>> 
>> 
>> with this csv
>> 
>> ID, NOMBRE, APELLIDOS, PROVINCIA
>> 12121212, MARIA, LIONZA, MADRID
>> 12312312, JAIMITO, PEREZ, ALMERÍA
>> 13131313, BRUNO, DIAZ, MALAGA
>> 23423423, HARLEY, QUINN, BARCELONA
>> 
>> 
>> I could see the result as a table where every column has their own name
>> according to the header values present on the csv file, but if a do a query
>> specifying the column name then I just get the field empty except the ID,
>> p.e.
>> 
>> select ID, NOMBRE from dfs.*`path_to_file*/clientes.csv`
>> then I get
>> 
>> | ID       | NOMBRE |
>> | 12121212 |        |
>> | 12312312 |        |
>> | 13131313 |        |
>> | 23423423 |        |
>> 
>> 
>> why does it work for the ID value but it doesn't for the rest of
>> column like NOMBRE, APELLIDOS, PROVINCIA?
>> 
>> Also I updated the dfs plugin and add extractHeader to true, for the csv format.
>> 
>> Thanks in advance
>> 
>> -- 
>> José Antonio Rodríguez
>> Ing. de Sistemas ULA - Mérida
>

Re: Malfunction trying to query csv file using file headers

Posted by Charles Givre <cg...@gmail.com>.

I believe the issue you are encountering is due to the spaces in the header fields.

To verify this hypothesis, try this query (Note the spaces and back ticks in the field names):

SELECT `ID `, `NOMBRE ` FROM <your file>
The easy fix is to remove the spaces in the header.


> On Sep 28, 2016, at 11:43, jose antonio <jo...@gmail.com> wrote:
> 
> Hello all,
> 
> Yesterday a asked the same on
> http://stackoverflow.com/questions/39720337/how-can-i-do-a-query-on-a-csv-file-using-the-header-file-with-apache-drill.
> I found out that if a do a query of type
> select * from dfs.*`path_to_file*/clientes.csv`
> 
> 
> with this csv
> 
> ID, NOMBRE, APELLIDOS, PROVINCIA
> 12121212, MARIA, LIONZA, MADRID
> 12312312, JAIMITO, PEREZ, ALMERÍA
> 13131313, BRUNO, DIAZ, MALAGA
> 23423423, HARLEY, QUINN, BARCELONA
> 
> 
> I could see the result as a table where every column has their own name
> according to the header values present on the csv file, but if a do a query
> specifying the column name then I just get the field empty except the ID,
> p.e.
> 
> select ID, NOMBRE from dfs.*`path_to_file*/clientes.csv`
> then I get
> 
> | ID       | NOMBRE |
> | 12121212 |        |
> | 12312312 |        |
> | 13131313 |        |
> | 23423423 |        |
> 
> 
> why does it work for the ID value but it doesn't for the rest of
> column like NOMBRE, APELLIDOS, PROVINCIA?
> 
> Also I updated the dfs plugin and add extractHeader to true, for the csv format.
> 
> Thanks in advance
> 
> -- 
> José Antonio Rodríguez
> Ing. de Sistemas ULA - Mérida