Posted to dev@drill.apache.org by "Peter McTaggart (JIRA)" <ji...@apache.org> on 2016/05/02 05:13:12 UTC
[jira] [Created] (DRILL-4648) select count(*) on csv file fails with UNSUPPORTED_OPERATION

Peter McTaggart created DRILL-4648:
--------------------------------------

             Summary: select count(*) on csv file fails with UNSUPPORTED_OPERATION
                 Key: DRILL-4648
                 URL: https://issues.apache.org/jira/browse/DRILL-4648
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types, Functions - Drill
    Affects Versions: 1.6.0
            Reporter: Peter McTaggart


Running select count(*) on a CSV file fails with the following error:
0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `views/db/test.csv`;
Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

column name columns
column index
Fragment 0:0

[Error Id: b38a1e44-c2f5-44a3-9960-6062debc6b50 on xxxxxx.compute.internal:31010] (state=,code=0)

If we refer to a column in the file by name, it works, e.g.:


0: jdbc:drill:drillbit=10.1.101.10> select count(COLUMN_ONE) from `views/db/test.csv`;
+---------+
| EXPR$0  |
+---------+
| 1       |
+---------+
1 row selected (0.144 seconds)
0: jdbc:drill:drillbit=10.1.101.10>

The test.csv file contents:
~/D❯❯❯ cat test.csv
"COLUMN_ONE","COLUMN_TWO"
"Hello","World"
~/D❯❯❯


Drill is talking to a file mounted on Alluxio.

More info:
Mounting S3 directly (without Alluxio) gives the following results:
With extractHeader NOT turned on:
0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
+---------+
| EXPR$0  |
+---------+
| 2       |
+---------+
1 row selected (0.951 seconds)
0: jdbc:drill:drillbit=10.1.101.10>
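
The count of 2 above is consistent with the header line being counted as a data row when header extraction is off. As a rough illustration only, here is a minimal Python sketch, independent of Drill, that counts the rows of the same two-line file both ways. The names CSV_TEXT and count_rows are made up for this example and are not part of Drill.

```python
import csv
import io

# Contents of test.csv exactly as shown in this report.
CSV_TEXT = '"COLUMN_ONE","COLUMN_TWO"\n"Hello","World"\n'


def count_rows(text, extract_header):
    """Count rows, optionally treating the first line as a header."""
    rows = list(csv.reader(io.StringIO(text)))
    if extract_header and rows:
        rows = rows[1:]  # drop the header line so only data rows remain
    return len(rows)


# Header treated as data: both lines are counted.
print(count_rows(CSV_TEXT, extract_header=False))
# Header extracted: only the "Hello","World" line is counted.
print(count_rows(CSV_TEXT, extract_header=True))
```

With header extraction on, the expected answer for this file would therefore be 1, the same value returned by count(COLUMN_ONE).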

With extractHeader = true:

0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

column name columns
column index
Fragment 0:0

[Error Id: 5609cf0d-7553-44b5-bd90-40bce1c020a9 on ixxxxxx.compute.internal:31010] (state=,code=0)
0: jdbc:drill:drillbit=10.1.101.10>


Storage plugin configuration (including workspaces):

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://<my-bucket>",
  "config": {
    "fs.s3a.access.key": "xxx",
    "fs.s3a.secret.key": "xxx"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}


