Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/04/15 01:20:58 UTC

[jira] [Commented] (DRILL-2554) Data missing in output of select * on JSON data file, with json.all_text_mode set to true

    [ https://issues.apache.org/jira/browse/DRILL-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495118#comment-14495118 ] 

Steven Phillips commented on DRILL-2554:
----------------------------------------

This is actually not related to the JSON reader at all; it's a bug in the JDBC driver and/or the ValueVector interface.

In the ValueVector.Accessor interface, there is a method getValueCount(). For non-repeated types, its meaning is unambiguous. Repeated vectors, however, have two different counts: getGroupCount() and getValueCount().

In the JDBC class BoundCheckingAccessor, the method getObject() calls getValueCount(), which for repeated vectors actually corresponds to the childValueCount, when what we really want is getGroupCount().

I actually think that the JDBC driver is doing the correct thing, and that the ValueVector interface is wrong. Since getValueCount() is defined in the top-level interface (ValueVector.Accessor), it should mean the same thing for all types of vectors, and for repeated vectors this means returning the "group" count.
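
To illustrate the mismatch, here is a minimal sketch in plain Java. It is not Drill's code: RepeatedVector, boundCheckedGet() and buildKey2LikeColumn() are hypothetical stand-ins, and the "return null when the row index is past the count" behavior is an assumption about what the bounds check does. The column is shaped like key2 in the report below: 22 rows, with 14 elements in row 18 and empty lists elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatedCountSketch {

    /** Hypothetical stand-in for a repeated vector; not Drill's real class. */
    public static final class RepeatedVector {
        // Flattened child values of every row, stored back to back.
        private final List<String> values = new ArrayList<>();
        // Row i spans values[offsets[i]] .. values[offsets[i + 1] - 1].
        private final List<Integer> offsets = new ArrayList<>();

        public RepeatedVector() {
            offsets.add(0);
        }

        public void addRow(String... elements) {
            for (String e : elements) {
                values.add(e);
            }
            offsets.add(values.size());
        }

        /** Number of rows (lists): the count a row-oriented caller wants. */
        public int getGroupCount() {
            return offsets.size() - 1;
        }

        /** Number of child values across all rows. */
        public int getValueCount() {
            return values.size();
        }

        public List<String> getObject(int row) {
            return new ArrayList<>(values.subList(offsets.get(row), offsets.get(row + 1)));
        }
    }

    /** Assumed shape of the bounds check: null when the row is past the count. */
    public static List<String> boundCheckedGet(RepeatedVector v, int row, int count) {
        return row < count ? v.getObject(row) : null;
    }

    /** 22 rows; only row 18 carries data (14 elements), as with key2 in the report. */
    public static RepeatedVector buildKey2LikeColumn() {
        RepeatedVector v = new RepeatedVector();
        for (int row = 0; row < 22; row++) {
            if (row == 18) {
                v.addRow("1", "2", "3", "4", "-1", "0", "135.987", "99999",
                         "-9999.876", "2147483647", "test string", null, "true", "false");
            } else {
                v.addRow();
            }
        }
        return v;
    }

    public static void main(String[] args) {
        RepeatedVector v = buildKey2LikeColumn();
        System.out.println("group count = " + v.getGroupCount()); // 22
        System.out.println("value count = " + v.getValueCount()); // 14
        // Bounding by the child value count hides row 18, since 18 >= 14.
        System.out.println(boundCheckedGet(v, 18, v.getValueCount())); // null
        // Bounding by the group count returns the row's 14 elements.
        System.out.println(boundCheckedGet(v, 18, v.getGroupCount()).size()); // 14
    }
}
```

Under these assumptions, bounding by the child value count yields an empty list for rows 0 through 13 and null from row 14 onward, which is consistent with the key2 column in the output quoted below.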

> Data missing in output of select * on JSON data file, with json.all_text_mode set to true
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-2554
>                 URL: https://issues.apache.org/jira/browse/DRILL-2554
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 0.8.0
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>             Fix For: 0.9.0
>
>
> Data is missing from the output of select * from JSON data file statement. Data pertaining to key2 and key3 and key4 is missing from the output of the below select statement. I had enabled `store.json.all_text_mode`=true for that session.
> {code}
> 0: jdbc:drill:> alter session set `store.json.all_text_mode`=true;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | store.json.all_text_mode updated. |
> +------------+------------+
> 1 row selected (0.022 seconds)
> 0: jdbc:drill:> select * from `testJsnData02.json`;
> +------------+------------+------------+------------+------------+
> |    key     |    key1    |    key2    |    key3    |    key4    |
> +------------+------------+------------+------------+------------+
> | 12345      | {}         | []         | {}         | []         |
> | -123456    | {}         | []         | {}         | null       |
> | 0          | {}         | []         | {}         | null       |
> | -99999.999 | {}         | []         | {}         | null       |
> | 99999999.9876 | {}         | []         | {}         | null       |
> | Hello World! | {}         | []         | {}         | null       |
> | this is a long string, not very long though! | {}         | []         | {}         | null       |
> | true       | {}         | []         | {}         | null       |
> | false      | {}         | []         | {}         | null       |
> | null       | {}         | []         | {}         | null       |
> | 2147483647 | {}         | []         | {}         | null       |
> | 1100110010101010100101010101010101 | {}         | []         | {}         | null       |
> | 2008-1-23 14:24:23 | {}         | []         | {}         | null       |
> | 2008-2-23  | {}         | []         | {}         | null       |
> | 10:20:30.123 | {}         | null       | {}         | null       |
> | -1         | {}         | null       | {}         | null       |
> | 3.147      | {}         | null       | {}         | null       |
> | null       | {"id":"1000.997"} | null       | {}         | null       |
> | null       | {}         | null       | {}         | null       |
> | null       | {}         | null       | {}         | null       |
> | null       | {}         | null       | {}         | null       |
> | abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ    12345 aeiou | {}         | null       | {}         | null       |
> +------------+------------+------------+------------+------------+
> 22 rows selected (0.069 seconds)
> 0: jdbc:drill:> select * from sys.version;
> +------------+----------------+-------------+-------------+------------+
> | commit_id  | commit_message | commit_time | build_email | build_time |
> +------------+----------------+-------------+-------------+------------+
> | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown     | 09.03.2015 @ 04:52:49 EDT |
> +------------+----------------+-------------+-------------+------------+
> 1 row selected (0.041 seconds)
> {code}
> The data that I used in my test was
> {code}
> {"key":12345}
> {"key":-123456}
> {"key":0}
> {"key":-99999.999}
> {"key":99999999.9876}
> {"key":"Hello World!"}
> {"key":"this is a long string, not very long though!"}
> {"key":true}
> {"key":false}
> {"key":null}
> {"key":2147483647}
> {"key":1100110010101010100101010101010101}
> {"key":"2008-1-23 14:24:23"}
> {"key":"2008-2-23"}
> {"key":"10:20:30.123"}
> {"key":-1}
> {"key":3.147}
> {"key1":{"id":1000.997}}
> {"key2":[1,2,3,4,-1,0,135.987,99999,-9999.876,2147483647,"test string",null,true,false]}
> {"key3":{"id":null}}
> {"key4":[null]}
> {"key":"abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ    12345 aeiou"}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)