Posted to issues@drill.apache.org by "Zbigniew Tomanek (Jira)" <ji...@apache.org> on 2023/10/11 08:01:00 UTC

[jira] [Updated] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration

     [ https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zbigniew Tomanek updated DRILL-8457:
------------------------------------
    Description: 
Currently there is no way to configure the CSV parser when the HTTP storage plugin is used. As a result, some files cannot be parsed (e.g. when any column has more than 4096 characters or the file uses a delimiter other than `,`).

At DataWalk we use the HTTP plugin quite often, so we changed our internal fork of Drill so that the following parser/format properties can be configured through an additional `csvOptions` field:

 
{code:json}
{
  "csvOptions": {
    "delimiter": "\t",
    "quote": "\"",
    "quote_escape": "\"",
    "line_separator": "\n",
    "header_extraction_enabled": null,
    "number_of_rows_to_skip": 0,
    "number_of_records_to_read": -1,
    "line_separator_detection_enabled": true,
    "max_columns": 512,
    "max_chars_per_column": 4096,
    "skip_empty_lines": true,
    "ignore_leading_whitespaces": true,
    "ignore_trailing_whitespaces": true,
    "null_value": null
  }
}{code}
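To make the intent of a few of these options more concrete, here is a minimal Python sketch. It is only an illustration and not the code from our fork or from Drill's text reader: the function name `parse_csv` is hypothetical, the option names are copied from the block above, and the semantics (for example, `-1` meaning "no limit", and length being checked before trimming) are assumptions.

```python
import csv
import io

# Option names are copied from the proposed `csvOptions` block. The behavior
# implemented below is an assumption made for illustration, not Drill's.
DEFAULTS = {
    "delimiter": ",",
    "quote": '"',
    "number_of_rows_to_skip": 0,
    "number_of_records_to_read": -1,  # assumed: -1 means "no limit"
    "max_columns": 512,
    "max_chars_per_column": 4096,
    "skip_empty_lines": True,
    "ignore_leading_whitespaces": True,
    "ignore_trailing_whitespaces": True,
}


def parse_csv(text, csv_options=None):
    """Parse `text` into a list of rows, honoring a subset of the options."""
    opts = {**DEFAULTS, **(csv_options or {})}
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=opts["delimiter"],
        quotechar=opts["quote"],
        doublequote=True,  # matches quote_escape == quote in the example
    )
    limit = opts["number_of_records_to_read"]
    rows = []
    for index, row in enumerate(reader):
        if index < opts["number_of_rows_to_skip"]:
            continue
        if opts["skip_empty_lines"] and not row:
            continue
        if len(row) > opts["max_columns"]:
            raise ValueError(f"row {index} exceeds max_columns")
        cleaned = []
        for value in row:
            # Assumed: the length limit applies to the raw, untrimmed value.
            if len(value) > opts["max_chars_per_column"]:
                raise ValueError(f"row {index} exceeds max_chars_per_column")
            if opts["ignore_leading_whitespaces"]:
                value = value.lstrip()
            if opts["ignore_trailing_whitespaces"]:
                value = value.rstrip()
            cleaned.append(value)
        rows.append(cleaned)
        if limit >= 0 and len(rows) >= limit:
            break
    return rows
```

With the defaults, a tab-separated file containing a 5000-character value is rejected in this sketch; passing `{"delimiter": "\t", "max_chars_per_column": 8192}` lets it through, which is the kind of per-plugin override the `csvOptions` field is meant to allow.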
I'd be glad to get feedback on whether a PR with these changes would be of value to Drill.

  was:
Currently there is no way to configure the CSV parser when the HTTP storage plugin is used. As a result, some files cannot be parsed (e.g. when any column has more than 4096 characters or the file uses a delimiter other than `,`).

At DataWalk we use the HTTP plugin quite often, so we changed our internal fork of Drill so that the following parser/format properties can be configured through an additional `csvOptions` field:

 ```json
{
 "csvOptions": {
          "delimiter": "\t",
          "quote": "\"",
          "quote_escape": "\"",
          "line_separator": "\n",
          "header_extraction_enabled": null,
          "number_of_rows_to_skip": 0,
          "number_of_records_to_read": -1,
          "line_separator_detection_enabled": true,
          "max_columns": 512,
          "max_chars_per_column": 4096,
          "skip_empty_lines": true,
          "ignore_leading_whitespaces": true,
          "ignore_trailing_whitespaces": true,
          "null_value": null
        }
}
```

I'd be glad to get feedback on whether a PR with these changes would be of value to Drill.


> Allow configuring csv parser in http storage plugin configuration
> -----------------------------------------------------------------
>
>                 Key: DRILL-8457
>                 URL: https://issues.apache.org/jira/browse/DRILL-8457
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - HTTP
>    Affects Versions: Future
>            Reporter: Zbigniew Tomanek
>            Priority: Minor
>             Fix For: Future
>
>
> Currently there is no way to configure the CSV parser when the HTTP storage plugin is used. As a result, some files cannot be parsed (e.g. when any column has more than 4096 characters or the file uses a delimiter other than `,`).
> At DataWalk we use the HTTP plugin quite often, so we changed our internal fork of Drill so that the following parser/format properties can be configured through an additional `csvOptions` field:
>  
> {code:json}
> {
>   "csvOptions": {
>     "delimiter": "\t",
>     "quote": "\"",
>     "quote_escape": "\"",
>     "line_separator": "\n",
>     "header_extraction_enabled": null,
>     "number_of_rows_to_skip": 0,
>     "number_of_records_to_read": -1,
>     "line_separator_detection_enabled": true,
>     "max_columns": 512,
>     "max_chars_per_column": 4096,
>     "skip_empty_lines": true,
>     "ignore_leading_whitespaces": true,
>     "ignore_trailing_whitespaces": true,
>     "null_value": null
>   }
> }{code}
> I'd be glad to get feedback on whether a PR with these changes would be of value to Drill.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)