Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/04/13 23:05:12 UTC

[jira] [Updated] (DRILL-2760) Quoted strings from CSV file appear in query output in different forms

     [ https://issues.apache.org/jira/browse/DRILL-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Phillips updated DRILL-2760:
-----------------------------------
    Fix Version/s: 1.0.0

> Quoted strings from CSV file appear in query output in different forms
> ----------------------------------------------------------------------
>
>                 Key: DRILL-2760
>                 URL: https://issues.apache.org/jira/browse/DRILL-2760
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 0.9.0
>         Environment: | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 EDT
> 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>             Fix For: 1.0.0
>
>
> Quoted strings appear in query output in different forms, as shown in the section below.
> Quotes should NOT appear in query output. Strings must be stripped of their leading and trailing quotes. (I am referring to this character: " )
> {code}
> Snippet of data from the airports.csv file: the first three lines, where the first line holds header information.
> [root@centos-01 airport_CSV_data]# head -3 airports.csv
> "id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
> 6523,"00A","heliport","Total Rf Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,
> 6524,"00AK","small_airport","Lowell Field",59.94919968,-151.695999146,450,"NA","US","US-AK","Anchor Point","no","00AK",,"00AK",,,
> case 1) Quotes are not escaped; they appear in the output as is.
> 0: jdbc:drill:> select columns[0] id,columns[1] ident,columns[2] type,columns[3] name,columns[4] latitude_deg,columns[5] longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8] iso_country,columns[9] iso_region,columns[10] municipality,columns[11] scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14] local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17] keywords from `airports.csv` limit 3;
> +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
> |     id     |   ident    |    type    |    name    | latitude_deg | longitude_deg | elevation_ft | continent  | iso_country | iso_region | municipality | scheduled_service |  gps_code  | iata_code  | local_code | home_link  | wikipedia_link |  keywords  |
> +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
> | "id"       | "ident"    | "type"     | "name"     | "latitude_deg" | "longitude_deg" | "elevation_ft" | "continent" | "iso_country" | "iso_region" | "municipality" | "scheduled_service" | "gps_code" | "iata_code" | "local_code" | "home_link" | "wikipedia_link" | "keywords" |
> | 6523       | "00A"      | "heliport" | "Total Rf Heliport" | 40.07080078125 | -74.9336013793945 | 11           | "NA"       | "US"        | "US-PA"    | "Bensalem"   | "no"              | "00A"      |            | "00A"      |            |                | null       |
> | 6524       | "00AK"     | "small_airport" | "Lowell Field" | 59.94919968  | -151.695999146 | 450          | "NA"       | "US"        | "US-AK"    | "Anchor Point" | "no"              | "00AK"     |            | "00AK"     |            |                | null       |
> +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
> 3 rows selected (0.155 seconds)
> case 2) Quotes appear in the query output, but they are escaped with a backslash character.
> 0: jdbc:drill:> select * from `airports.csv` limit 3;
> +------------+
> |  columns   |
> +------------+
> | ["\"id\"","\"ident\"","\"type\"","\"name\"","\"latitude_deg\"","\"longitude_deg\"","\"elevation_ft\"","\"continent\"","\"iso_country\"","\"iso_region\"","\"municipality\"","\"scheduled_service\"","\"gps_code\"","\"iata_code\"","\"local_code\"","\"home_link\"","\"wikipedia_link\"","\"keywords\""] |
> | ["6523","\"00A\"","\"heliport\"","\"Total Rf Heliport\"","40.07080078125","-74.9336013793945","11","\"NA\"","\"US\"","\"US-PA\"","\"Bensalem\"","\"no\"","\"00A\"","","\"00A\"","",""] |
> | ["6524","\"00AK\"","\"small_airport\"","\"Lowell Field\"","59.94919968","-151.695999146","450","\"NA\"","\"US\"","\"US-AK\"","\"Anchor Point\"","\"no\"","\"00AK\"","","\"00AK\"","",""] |
> +------------+
> 3 rows selected (0.097 seconds)
> {code}
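
For illustration only, the expected behavior can be sketched outside Drill with Python's standard csv module. This is not Drill's text reader; the helper names naive_split and quote_aware_split are hypothetical. The sketch assumes that a comma-only split reproduces the quoted values seen in case 1, and that rendering those values as a JSON array reproduces the backslash-escaped form seen in case 2.

```python
import csv
import io
import json

# Second line of the airports.csv sample quoted in the report.
LINE = ('6523,"00A","heliport","Total Rf Heliport",40.07080078125,'
        '-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,')


def naive_split(line):
    """Split on commas only; enclosing quote characters stay in the values."""
    return line.split(",")


def quote_aware_split(line):
    """Parse one CSV line with csv.reader, which removes enclosing quotes."""
    return next(csv.reader(io.StringIO(line)))


naive = naive_split(LINE)
parsed = quote_aware_split(LINE)

# Case 1 analogue: the raw value keeps its quotes, the parsed value does not.
print(naive[3])
print(parsed[3])

# Case 2 analogue: serializing quoted values as JSON escapes each quote.
print(json.dumps(naive[1:3]))
print(json.dumps(parsed[1:3]))
```

With a quote-aware parse, both the per-column output and the array output are free of quote characters, which is the result the report asks for.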



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)