Posted to user@drill.apache.org by "benj.dev" <be...@laposte.net.INVALID> on 2019/01/23 20:29:31 UTC

Big varchar are ok when extractHeader=false but not when extractHeader=true

Hi,

With a CSV file test.csv
col1,col2
w,x
...y...,z

where ...y... is a string longer than 65536 characters (say 66000, for example)
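For anyone who wants to reproduce this, a file like the one above can be generated with a short script (a sketch; the 66000-character payload and the literal "y" filler are assumptions matching the example):

```python
# Generate a test.csv whose last row contains a field longer than the
# 65536-character limit mentioned in the error below.
payload = "y" * 66000  # > 65536 characters, as in the example

with open("test.csv", "w") as f:
    f.write("col1,col2\n")
    f.write("w,x\n")
    f.write(payload + ",z\n")
```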

Error with this extract of the storage config:
"csv": { "type": "text", "extensions": [ "csv" ],
"extractHeader": true, "delimiter": "," },
SELECT * FROM tmp.`test.csv`
Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a
column
column name columns
column index
Fragment 0:0

Same error with
SELECT col1, col2 FROM TABLE(tmp.`test.csv`(type => 'text',
fieldDelimiter => ',', extractHeader => true))
Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a
column
columnIndex 0
Limit 65536
Fragment 0:0

But it's OK with
SELECT columns[0], columns[1] FROM TABLE(tmp.`test.csv`(type => 'text',
fieldDelimiter => ',', extractHeader => false))
    col1  | col2
+---------+------+
| w       | x    |
| ...y... | z    |
Why is it not possible to work with big varchars when extractHeader=true,
even though it works with extractHeader=false?


And on a related point about the last case: in the printed output, ...y...
is truncated and the following fields (here only col2) are not displayed
for this row, without any mark or warning, even though the actual stored
value is correct.
It may be a bug that the other columns are not displayed; perhaps the
printed length should be controlled by an option, and a marker should
indicate when truncation occurs.

Thanks for any explanation

Re: Big varchar are ok when extractHeader=false but not when extractHeader=true

Posted by "benj.dev" <be...@laposte.net.INVALID>.
Hi,

Thanks for your interest.
It works in your case because you tested on a csv (without a header).

Try renaming your csv to csvh,
or change the csv storage config with "extractHeader": true (as in the example),
or simply use the TABLE syntax, like:
- with header: TABLE(tmp.`thetable`(type => 'text', fieldDelimiter =>
',', extractHeader => true))
- without header: TABLE(tmp.`thetable`(type => 'text', fieldDelimiter =>
',', extractHeader => false))

The problem is that long strings do not work when extractHeader=true is
used (i.e. for csv files with a header (csvh)).

I have just created a JIRA ticket on this subject :
https://issues.apache.org/jira/browse/DRILL-7020

On 24/01/2019 at 04:31, Boaz Ben-Zvi wrote:
>  Hi Benj,
> 
>    Testing with the latest code, this error does not show:
> 
>      0: jdbc:drill:zk=local> select columns[0], columns[1] from
> dfs.`/data/bar.csv`;
> +----------------------------------------------------------------------------------+---------+
>      | EXPR$0                                      | EXPR$1  |
> +----------------------------------------------------------------------------------+---------+
>      | hello | world   |
>      |
> a01234567890a01234567890a01234567890a01234567890a01234567890a01234567890a01234567890a01234567890a
> ...............890
> 
> The error you saw is raised in the append() method in
> FieldVarCharOutput.java , but somehow when I ran the example above that
> code path was not taken.
> 
> However indeed the printout truncates (at about 10,000 chars), including
> truncating the following columns. The actual varchar retains its length:
> 
>      0: jdbc:drill:zk=local> select char_length(columns[0]) from
> dfs.`/data/bar.csv`;
>      +---------+
>      | EXPR$0  |
>      +---------+
>      | 5       |
>      | 72061   |
>      +---------+
> 
>  -- Boaz
> 
> On 1/23/19 12:29 PM, benj.dev wrote:
>> Hi,
>>
>> With a CSV file test.csv
>> col1,col2
>> w,x
>> ...y...,z
>>
>> where ...y... is a string longer than 65536 characters (say 66000, for example)
>>
>> Error with
>> extract of storage : "csv": { "type": "text", "extensions": [ "csv" ],
>> "extractHeader": true, "delimiter": "," },
>> SELECT * FROM tmp.`test.csv`
>> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a
>> column
>> column name columns
>> column index
>> Fragment 0:0
>>
>> Same error with
>> SELECT col1, col2 FROM TABLE(tmp.`test.csv`(type => 'text',
>> fieldDelimiter => ',', extractHeader => true))
>> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a
>> column
>> columnIndex 0
>> Limit 65536
>> Fragment 0:0
>>
>> But it's OK with
>> SELECT columns[0], columns[1] FROM TABLE(tmp.`test.csv`(type => 'text',
>> fieldDelimiter => ',', extractHeader => false))
>>      col1  | col2
>> +---------+------
>> | w       | x
>> | ...y... | z
>>
>> Why is it not possible to work with big varchars when extractHeader=true,
>> even though it works with extractHeader=false?
>>
>>
>> And on a related point about the last case: in the printed output, ...y...
>> is truncated and the following fields (here only col2) are not displayed
>> for this row, without any mark or warning, even though the actual stored
>> value is correct.
>> It may be a bug that the other columns are not displayed; perhaps the
>> printed length should be controlled by an option, and a marker should
>> indicate when truncation occurs.
>>
>> Thanks for any explanation
> 

Re: Big varchar are ok when extractHeader=false but not when extractHeader=true

Posted by Boaz Ben-Zvi <bo...@apache.org>.
  Hi Benj,

    Testing with the latest code, this error does not show:

      0: jdbc:drill:zk=local> select columns[0], columns[1] from 
dfs.`/data/bar.csv`;
+----------------------------------------------------------------------------------+---------+
      | EXPR$0                                      | EXPR$1  |
+----------------------------------------------------------------------------------+---------+
      | hello | world   |
      | 
a01234567890a01234567890a01234567890a01234567890a01234567890a01234567890a01234567890a01234567890a 
...............890

The error you saw is raised in the append() method in 
FieldVarCharOutput.java , but somehow when I ran the example above that 
code path was not taken.

However indeed the printout truncates (at about 10,000 chars), including 
truncating the following columns. The actual varchar retains its length:

      0: jdbc:drill:zk=local> select char_length(columns[0]) from 
dfs.`/data/bar.csv`;
      +---------+
      | EXPR$0  |
      +---------+
      | 5       |
      | 72061   |
      +---------+

  -- Boaz

On 1/23/19 12:29 PM, benj.dev wrote:
> Hi,
>
> With a CSV file test.csv
> col1,col2
> w,x
> ...y...,z
>
> where ...y... is a string longer than 65536 characters (say 66000, for example)
>
> Error with
> extract of storage : "csv": { "type": "text", "extensions": [ "csv" ],
> "extractHeader": true, "delimiter": "," },
> SELECT * FROM tmp.`test.csv`
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a
> column
> column name columns
> column index
> Fragment 0:0
>
> Same error with
> SELECT col1, col2 FROM TABLE(tmp.`test.csv`(type => 'text',
> fieldDelimiter => ',', extractHeader => true))
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a
> column
> columnIndex 0
> Limit 65536
> Fragment 0:0
>
> But it's OK with
> SELECT columns[0], columns[1] FROM TABLE(tmp.`test.csv`(type => 'text',
> fieldDelimiter => ',', extractHeader => false))
>      col1  | col2
> +---------+------
> | w       | x
> | ...y... | z
>
> Why is it not possible to work with big varchars when extractHeader=true,
> even though it works with extractHeader=false?
>
>
> And on a related point about the last case: in the printed output, ...y...
> is truncated and the following fields (here only col2) are not displayed
> for this row, without any mark or warning, even though the actual stored
> value is correct.
> It may be a bug that the other columns are not displayed; perhaps the
> printed length should be controlled by an option, and a marker should
> indicate when truncation occurs.
>
> Thanks for any explanation
