You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Khurram Faraaz <kf...@maprtech.com> on 2015/09/10 20:46:48 UTC

[DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors

Hi,


Querying nested data from a JSON file with and with out setting
store.json.all_text_mode
results in errors.


Data that was used in the test is available here -
https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5

There is an option (on the top right) to Export as JSON.

Drill commit ID : 0686bc23 (on master)


[root@centos-01 ~]# hadoop fs -ls /tmp/rows.json

-rwxr-xr-x   3 root root  156417523 2015-09-10 17:18 /tmp/rows.json


[root@centos-01 food_inspection_data]# hadoop fs -put rows.json /tmp

[root@centos-01 food_inspection_data]# cd /opt/mapr/drill/drill-1.2.0/bin

[root@centos-01 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp"

apache drill 1.0.0

"a little sql for your nosql"


Query 1)


0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;

Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Line  1023

Column  29

Field  smallest

Fragment 0:0


[Error Id: 0844933a-ed61-4137-9970-203576d5770d on centos-02.qa.lab:31010]
(state=,code=0)


*-----*


Query 2) Drill seems to be checking for correctness of data much deeper in
the nesting. Note that I do a count over field* meta, *and field*
"smallest" *reported in error is much deeper inside* meta, *
meta.view.columns.cachedContents.smallest


0: jdbc:drill:schema=dfs.tmp> select count(meta) from `rows.json`;

Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Line  1023

Column  29

Field  smallest

Fragment 0:0


[Error Id: 66e7f461-6671-4862-bde5-685c8c2304c9 on centos-04.qa.lab:31010]
(state=,code=0)


Explain plan for Query (2)


0: jdbc:drill:schema=dfs.tmp> explain plan for select count(meta) from
`rows.json`;

*+------+------+*

*| **text** | **json** |*

*+------+------+*

*| *00-00    Screen

00-01      Project(EXPR$0=[$0])

00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])

00-03          StreamAgg(group=[{}], EXPR$0=[COUNT($0)])

00-04            Scan(groupscan=[EasyGroupScan
[selectionRoot=maprfs:/tmp/rows.json, numFiles=1, columns=[`meta`],
files=[maprfs:///tmp/rows.json]]])


Query 3)


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t;

Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Line  1023

Column  29

Field  smallest

Fragment 0:0


[Error Id: 55ea650c-9587-42d8-aec9-0f5a4b77ee4c on centos-04.qa.lab:31010]
(state=,code=0)


Query 4) - The count returned is wrong!


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from `rows.json`
t;

*+---------+*

*| **EXPR$0 ** |*

*+---------+*

*| *1      * |*

*+---------+*

1 row selected (1.57 seconds)

-------------------

The below queries were run after json all text mode configuration was
enabled.

0: jdbc:drill:schema=dfs.tmp> alter session set
`store.json.all_text_mode`=true;

*+-------+------------------------------------+*

*| ** ok  ** | **             summary              ** |*

*+-------+------------------------------------+*

*| *true * | *store.json.all_text_mode updated. * |*

*+-------+------------------------------------+*

1 row selected (0.218 seconds)


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: 5eeee0f3-69fd-445c-a149-a3aa2f18d1cb on centos-04.qa.lab:31010]
(state=,code=0)


COUNT returned in below query is wrong.


0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from `rows.json`
t;

+---------+

| EXPR$0  |

+---------+

| 1       |

+---------+

1 row selected (0.948 seconds)


Query to perform count over nested data.


0: jdbc:drill:schema=dfs.tmp> select
count(t.meta.view.columns.cachedContents.smallest) from `rows.json` t;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: 4ce0d5fe-abd7-47d9-8106-269fe37409e6 on centos-04.qa.lab:31010]
(state=,code=0)


0: jdbc:drill:schema=dfs.tmp> select
t.meta.view.columns.cachedContents.smallest from `rows.json` t;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: b2514414-af51-46c5-938d-ff792e6a8a8c on centos-04.qa.lab:31010]
(state=,code=0)

0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;

Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
are using a ValueWriter of type NullableVarCharWriterImpl.


File  /tmp/rows.json

Record  1

Fragment 0:0


[Error Id: 3615e088-64cc-4cba-8cfe-602563f4e16c on centos-04.qa.lab:31010]
(state=,code=0)

These are the occorrences of field named *smallest* in the JSON file

Administrators-MacBook-Pro:Downloads kfaraaz$ grep "smallest" rows.json

          "smallest" : "44247",

          "smallest" : "'C' A CATERING AND EVENT COMPANY",

          "smallest" : "'C' A CATERING AND EVENT COMPANY",

          "smallest" : "0",

          "smallest" : "(convenience store)",

          "smallest" : "All",

          "smallest" : " ",

          "smallest" : "alsip",

          "smallest" : "IL",

          "smallest" : "60007",

          "smallest" : "2010-01-04T00:00:00",

          "smallest" : "1315 license reinspection",

          "smallest" : "Business Not Located",

          "smallest" : "1. SOURCE SOUND CONDITION, NO SPOILAGE, FOODS
PROPERLY LABELED, SHELLFISH TAGS IN PLACE - Comments: ",

          "smallest" : "41.64467013219805",

          "smallest" : "-87.91442843927047",

          "smallest" : {

, [ 101045, "180563EF-F00C-41F9-A0C2-7B2100AD2DED", 101045, 1441883778,
"386464", 1441883778, "386464", null, "343208", "SUBWAY # 26817-309 W.
MONROE", "SUBWAY", "1519940", "Restaurant", "Risk 1 (High)", "309 W MONROE
ST ", "CHICAGO", "IL", "60606", "2010-07-13T00:00:00", "Canvass", "Pass",
"32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND
MAINTAINED - Comments: All food and non-food contact equipment and utensils
shall be smooth, easily cleanable, and durable, and shall be in good
repair. A few of the smallest cutting boards, with deep/dark grooves must
be sanded/bleached- to make smooth/easily cleanable. | 33. FOOD AND
NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS -
Comments: All food and non-food contact surfaces of equipment and all food
storage utensils shall be thoroughly cleaned and sanitized daily. Interior
of ice chute at soda dispensers need cleaning. | 34. FLOORS: CONSTRUCTED
PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING
METHODS USED - Comments: The floors shall be constructed per code, be
smooth and easily cleaned, and be kept clean and in good repair. Floors of
walk-in cooler (around corners) need cleaning; baseboard at the outside
base of the walk-in cooler door is in poor repair, must fix.",
"41.88052826494154", "-87.63556938091433", [ null, "41.88052826494154",
"-87.63556938091433", null, false ] ]

Thanks,
Khurram

Re: [DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
It's interesting that the following query also fails:

SELECT COUNT(meta) FROM `rows.json`

the schema change is nested inside meta.view, so even though we are
counting how many meta "groups" we have, Drill still reads and materializes
all the content of the meta field.



On Thu, Sep 10, 2015 at 2:16 PM, Jason Altekruse <al...@gmail.com>
wrote:

> This is a known issue, we have been describing this schema change scenario
> as a change in "data shape". Here is a very simple dataset that will
> produce the same error
>
> { "a" : 1 }
> { "a" : { "b" : 3} }
>
> We currently only use all_text_mode to change the type of the data at a
> leaf in the schema.
>
> There is an argument that we could have implemented all_text_mode to read
> this file and transform this data into this representation, where complex
> data is stored in text blobs as JSON:
>
> { "a" : "1" }
> { "a" : "{\"b\" : 3}" }
>
> Unfortunately, if this came out of the read, we would need to force users
> to use the covert_from(col, 'JSON') function to access the fields that are
> nested below the root level of the schema.
>
> We decided not to do this at the time all_text_mode was implemented. I do
> not believe we want to provide a workaround this right now as the work on
> embedded type, which will enable native support for all kinds of schema
> changes is ongoing.
>
> https://issues.apache.org/jira/browse/DRILL-3228
>
>
>
>
> On Thu, Sep 10, 2015 at 11:46 AM, Khurram Faraaz <kf...@maprtech.com>
> wrote:
>
> > Hi,
> >
> >
> > Querying nested data from a JSON file with and with out setting
> > store.json.all_text_mode
> > results in errors.
> >
> >
> > Data that was used in the test is available here -
> >
> >
> https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5
> >
> > There is an option (on the top right) to Export as JSON.
> >
> > Drill commit ID : 0686bc23 (on master)
> >
> >
> > [root@centos-01 ~]# hadoop fs -ls /tmp/rows.json
> >
> > -rwxr-xr-x   3 root root  156417523 2015-09-10 17:18 /tmp/rows.json
> >
> >
> > [root@centos-01 food_inspection_data]# hadoop fs -put rows.json /tmp
> >
> > [root@centos-01 food_inspection_data]# cd
> /opt/mapr/drill/drill-1.2.0/bin
> >
> > [root@centos-01 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> >
> > apache drill 1.0.0
> >
> > "a little sql for your nosql"
> >
> >
> > Query 1)
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;
> >
> > Error: DATA_READ ERROR: You tried to start when you are using a
> ValueWriter
> > of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Line  1023
> >
> > Column  29
> >
> > Field  smallest
> >
> > Fragment 0:0
> >
> >
> > [Error Id: 0844933a-ed61-4137-9970-203576d5770d on
> centos-02.qa.lab:31010]
> > (state=,code=0)
> >
> >
> > *-----*
> >
> >
> > Query 2) Drill seems to be checking for correctness of data much deeper
> in
> > the nesting. Note that I do a count over field* meta, *and field*
> > "smallest" *reported in error is much deeper inside* meta, *
> > meta.view.columns.cachedContents.smallest
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select count(meta) from `rows.json`;
> >
> > Error: DATA_READ ERROR: You tried to start when you are using a
> ValueWriter
> > of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Line  1023
> >
> > Column  29
> >
> > Field  smallest
> >
> > Fragment 0:0
> >
> >
> > [Error Id: 66e7f461-6671-4862-bde5-685c8c2304c9 on
> centos-04.qa.lab:31010]
> > (state=,code=0)
> >
> >
> > Explain plan for Query (2)
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(meta) from
> > `rows.json`;
> >
> > *+------+------+*
> >
> > *| **text** | **json** |*
> >
> > *+------+------+*
> >
> > *| *00-00    Screen
> >
> > 00-01      Project(EXPR$0=[$0])
> >
> > 00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> >
> > 00-03          StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
> >
> > 00-04            Scan(groupscan=[EasyGroupScan
> > [selectionRoot=maprfs:/tmp/rows.json, numFiles=1, columns=[`meta`],
> > files=[maprfs:///tmp/rows.json]]])
> >
> >
> > Query 3)
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json`
> t;
> >
> > Error: DATA_READ ERROR: You tried to start when you are using a
> ValueWriter
> > of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Line  1023
> >
> > Column  29
> >
> > Field  smallest
> >
> > Fragment 0:0
> >
> >
> > [Error Id: 55ea650c-9587-42d8-aec9-0f5a4b77ee4c on
> centos-04.qa.lab:31010]
> > (state=,code=0)
> >
> >
> > Query 4) - The count returned is wrong!
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from
> > `rows.json`
> > t;
> >
> > *+---------+*
> >
> > *| **EXPR$0 ** |*
> >
> > *+---------+*
> >
> > *| *1      * |*
> >
> > *+---------+*
> >
> > 1 row selected (1.57 seconds)
> >
> > -------------------
> >
> > The below queries were run after json all text mode configuration was
> > enabled.
> >
> > 0: jdbc:drill:schema=dfs.tmp> alter session set
> > `store.json.all_text_mode`=true;
> >
> > *+-------+------------------------------------+*
> >
> > *| ** ok  ** | **             summary              ** |*
> >
> > *+-------+------------------------------------+*
> >
> > *| *true * | *store.json.all_text_mode updated. * |*
> >
> > *+-------+------------------------------------+*
> >
> > 1 row selected (0.218 seconds)
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json`
> t;
> >
> > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> > are using a ValueWriter of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Fragment 0:0
> >
> >
> > [Error Id: 5eeee0f3-69fd-445c-a149-a3aa2f18d1cb on
> centos-04.qa.lab:31010]
> > (state=,code=0)
> >
> >
> > COUNT returned in below query is wrong.
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from
> > `rows.json`
> > t;
> >
> > +---------+
> >
> > | EXPR$0  |
> >
> > +---------+
> >
> > | 1       |
> >
> > +---------+
> >
> > 1 row selected (0.948 seconds)
> >
> >
> > Query to perform count over nested data.
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select
> > count(t.meta.view.columns.cachedContents.smallest) from `rows.json` t;
> >
> > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> > are using a ValueWriter of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Fragment 0:0
> >
> >
> > [Error Id: 4ce0d5fe-abd7-47d9-8106-269fe37409e6 on
> centos-04.qa.lab:31010]
> > (state=,code=0)
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select
> > t.meta.view.columns.cachedContents.smallest from `rows.json` t;
> >
> > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> > are using a ValueWriter of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Fragment 0:0
> >
> >
> > [Error Id: b2514414-af51-46c5-938d-ff792e6a8a8c on
> centos-04.qa.lab:31010]
> > (state=,code=0)
> >
> > 0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;
> >
> > Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> > are using a ValueWriter of type NullableVarCharWriterImpl.
> >
> >
> > File  /tmp/rows.json
> >
> > Record  1
> >
> > Fragment 0:0
> >
> >
> > [Error Id: 3615e088-64cc-4cba-8cfe-602563f4e16c on
> centos-04.qa.lab:31010]
> > (state=,code=0)
> >
> > These are the occorrences of field named *smallest* in the JSON file
> >
> > Administrators-MacBook-Pro:Downloads kfaraaz$ grep "smallest" rows.json
> >
> >           "smallest" : "44247",
> >
> >           "smallest" : "'C' A CATERING AND EVENT COMPANY",
> >
> >           "smallest" : "'C' A CATERING AND EVENT COMPANY",
> >
> >           "smallest" : "0",
> >
> >           "smallest" : "(convenience store)",
> >
> >           "smallest" : "All",
> >
> >           "smallest" : " ",
> >
> >           "smallest" : "alsip",
> >
> >           "smallest" : "IL",
> >
> >           "smallest" : "60007",
> >
> >           "smallest" : "2010-01-04T00:00:00",
> >
> >           "smallest" : "1315 license reinspection",
> >
> >           "smallest" : "Business Not Located",
> >
> >           "smallest" : "1. SOURCE SOUND CONDITION, NO SPOILAGE, FOODS
> > PROPERLY LABELED, SHELLFISH TAGS IN PLACE - Comments: ",
> >
> >           "smallest" : "41.64467013219805",
> >
> >           "smallest" : "-87.91442843927047",
> >
> >           "smallest" : {
> >
> > , [ 101045, "180563EF-F00C-41F9-A0C2-7B2100AD2DED", 101045, 1441883778,
> > "386464", 1441883778, "386464", null, "343208", "SUBWAY # 26817-309 W.
> > MONROE", "SUBWAY", "1519940", "Restaurant", "Risk 1 (High)", "309 W
> MONROE
> > ST ", "CHICAGO", "IL", "60606", "2010-07-13T00:00:00", "Canvass", "Pass",
> > "32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED
> AND
> > MAINTAINED - Comments: All food and non-food contact equipment and
> utensils
> > shall be smooth, easily cleanable, and durable, and shall be in good
> > repair. A few of the smallest cutting boards, with deep/dark grooves must
> > be sanded/bleached- to make smooth/easily cleanable. | 33. FOOD AND
> > NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS -
> > Comments: All food and non-food contact surfaces of equipment and all
> food
> > storage utensils shall be thoroughly cleaned and sanitized daily.
> Interior
> > of ice chute at soda dispensers need cleaning. | 34. FLOORS: CONSTRUCTED
> > PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING
> > METHODS USED - Comments: The floors shall be constructed per code, be
> > smooth and easily cleaned, and be kept clean and in good repair. Floors
> of
> > walk-in cooler (around corners) need cleaning; baseboard at the outside
> > base of the walk-in cooler door is in poor repair, must fix.",
> > "41.88052826494154", "-87.63556938091433", [ null, "41.88052826494154",
> > "-87.63556938091433", null, false ] ]
> >
> > Thanks,
> > Khurram
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: [DISCUSS] Querying nested data from JSON with and without json_all_text_mode results in errors

Posted by Jason Altekruse <al...@gmail.com>.
This is a known issue, we have been describing this schema change scenario
as a change in "data shape". Here is a very simple dataset that will
produce the same error

{ "a" : 1 }
{ "a" : { "b" : 3} }

We currently only use all_text_mode to change the type of the data at a
leaf in the schema.

There is an argument that we could have implemented all_text_mode to read
this file and transform this data into this representation, where complex
data is stored in text blobs as JSON:

{ "a" : "1" }
{ "a" : "{\"b\" : 3}" }

Unfortunately, if this came out of the read, we would need to force users
to use the covert_from(col, 'JSON') function to access the fields that are
nested below the root level of the schema.

We decided not to do this at the time all_text_mode was implemented. I do
not believe we want to provide a workaround this right now as the work on
embedded type, which will enable native support for all kinds of schema
changes is ongoing.

https://issues.apache.org/jira/browse/DRILL-3228




On Thu, Sep 10, 2015 at 11:46 AM, Khurram Faraaz <kf...@maprtech.com>
wrote:

> Hi,
>
>
> Querying nested data from a JSON file with and with out setting
> store.json.all_text_mode
> results in errors.
>
>
> Data that was used in the test is available here -
>
> https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5
>
> There is an option (on the top right) to Export as JSON.
>
> Drill commit ID : 0686bc23 (on master)
>
>
> [root@centos-01 ~]# hadoop fs -ls /tmp/rows.json
>
> -rwxr-xr-x   3 root root  156417523 2015-09-10 17:18 /tmp/rows.json
>
>
> [root@centos-01 food_inspection_data]# hadoop fs -put rows.json /tmp
>
> [root@centos-01 food_inspection_data]# cd /opt/mapr/drill/drill-1.2.0/bin
>
> [root@centos-01 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp"
>
> apache drill 1.0.0
>
> "a little sql for your nosql"
>
>
> Query 1)
>
>
> 0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;
>
> Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
> of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Line  1023
>
> Column  29
>
> Field  smallest
>
> Fragment 0:0
>
>
> [Error Id: 0844933a-ed61-4137-9970-203576d5770d on centos-02.qa.lab:31010]
> (state=,code=0)
>
>
> *-----*
>
>
> Query 2) Drill seems to be checking for correctness of data much deeper in
> the nesting. Note that I do a count over field* meta, *and field*
> "smallest" *reported in error is much deeper inside* meta, *
> meta.view.columns.cachedContents.smallest
>
>
> 0: jdbc:drill:schema=dfs.tmp> select count(meta) from `rows.json`;
>
> Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
> of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Line  1023
>
> Column  29
>
> Field  smallest
>
> Fragment 0:0
>
>
> [Error Id: 66e7f461-6671-4862-bde5-685c8c2304c9 on centos-04.qa.lab:31010]
> (state=,code=0)
>
>
> Explain plan for Query (2)
>
>
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(meta) from
> `rows.json`;
>
> *+------+------+*
>
> *| **text** | **json** |*
>
> *+------+------+*
>
> *| *00-00    Screen
>
> 00-01      Project(EXPR$0=[$0])
>
> 00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
>
> 00-03          StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
>
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=maprfs:/tmp/rows.json, numFiles=1, columns=[`meta`],
> files=[maprfs:///tmp/rows.json]]])
>
>
> Query 3)
>
>
> 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t;
>
> Error: DATA_READ ERROR: You tried to start when you are using a ValueWriter
> of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Line  1023
>
> Column  29
>
> Field  smallest
>
> Fragment 0:0
>
>
> [Error Id: 55ea650c-9587-42d8-aec9-0f5a4b77ee4c on centos-04.qa.lab:31010]
> (state=,code=0)
>
>
> Query 4) - The count returned is wrong!
>
>
> 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from
> `rows.json`
> t;
>
> *+---------+*
>
> *| **EXPR$0 ** |*
>
> *+---------+*
>
> *| *1      * |*
>
> *+---------+*
>
> 1 row selected (1.57 seconds)
>
> -------------------
>
> The below queries were run after json all text mode configuration was
> enabled.
>
> 0: jdbc:drill:schema=dfs.tmp> alter session set
> `store.json.all_text_mode`=true;
>
> *+-------+------------------------------------+*
>
> *| ** ok  ** | **             summary              ** |*
>
> *+-------+------------------------------------+*
>
> *| *true * | *store.json.all_text_mode updated. * |*
>
> *+-------+------------------------------------+*
>
> 1 row selected (0.218 seconds)
>
>
> 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view) from `rows.json` t;
>
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> are using a ValueWriter of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Fragment 0:0
>
>
> [Error Id: 5eeee0f3-69fd-445c-a149-a3aa2f18d1cb on centos-04.qa.lab:31010]
> (state=,code=0)
>
>
> COUNT returned in below query is wrong.
>
>
> 0: jdbc:drill:schema=dfs.tmp> select count(t.meta.view.id) from
> `rows.json`
> t;
>
> +---------+
>
> | EXPR$0  |
>
> +---------+
>
> | 1       |
>
> +---------+
>
> 1 row selected (0.948 seconds)
>
>
> Query to perform count over nested data.
>
>
> 0: jdbc:drill:schema=dfs.tmp> select
> count(t.meta.view.columns.cachedContents.smallest) from `rows.json` t;
>
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> are using a ValueWriter of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Fragment 0:0
>
>
> [Error Id: 4ce0d5fe-abd7-47d9-8106-269fe37409e6 on centos-04.qa.lab:31010]
> (state=,code=0)
>
>
> 0: jdbc:drill:schema=dfs.tmp> select
> t.meta.view.columns.cachedContents.smallest from `rows.json` t;
>
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> are using a ValueWriter of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Fragment 0:0
>
>
> [Error Id: b2514414-af51-46c5-938d-ff792e6a8a8c on centos-04.qa.lab:31010]
> (state=,code=0)
>
> 0: jdbc:drill:schema=dfs.tmp> select * from `rows.json`;
>
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you
> are using a ValueWriter of type NullableVarCharWriterImpl.
>
>
> File  /tmp/rows.json
>
> Record  1
>
> Fragment 0:0
>
>
> [Error Id: 3615e088-64cc-4cba-8cfe-602563f4e16c on centos-04.qa.lab:31010]
> (state=,code=0)
>
> These are the occorrences of field named *smallest* in the JSON file
>
> Administrators-MacBook-Pro:Downloads kfaraaz$ grep "smallest" rows.json
>
>           "smallest" : "44247",
>
>           "smallest" : "'C' A CATERING AND EVENT COMPANY",
>
>           "smallest" : "'C' A CATERING AND EVENT COMPANY",
>
>           "smallest" : "0",
>
>           "smallest" : "(convenience store)",
>
>           "smallest" : "All",
>
>           "smallest" : " ",
>
>           "smallest" : "alsip",
>
>           "smallest" : "IL",
>
>           "smallest" : "60007",
>
>           "smallest" : "2010-01-04T00:00:00",
>
>           "smallest" : "1315 license reinspection",
>
>           "smallest" : "Business Not Located",
>
>           "smallest" : "1. SOURCE SOUND CONDITION, NO SPOILAGE, FOODS
> PROPERLY LABELED, SHELLFISH TAGS IN PLACE - Comments: ",
>
>           "smallest" : "41.64467013219805",
>
>           "smallest" : "-87.91442843927047",
>
>           "smallest" : {
>
> , [ 101045, "180563EF-F00C-41F9-A0C2-7B2100AD2DED", 101045, 1441883778,
> "386464", 1441883778, "386464", null, "343208", "SUBWAY # 26817-309 W.
> MONROE", "SUBWAY", "1519940", "Restaurant", "Risk 1 (High)", "309 W MONROE
> ST ", "CHICAGO", "IL", "60606", "2010-07-13T00:00:00", "Canvass", "Pass",
> "32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND
> MAINTAINED - Comments: All food and non-food contact equipment and utensils
> shall be smooth, easily cleanable, and durable, and shall be in good
> repair. A few of the smallest cutting boards, with deep/dark grooves must
> be sanded/bleached- to make smooth/easily cleanable. | 33. FOOD AND
> NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS -
> Comments: All food and non-food contact surfaces of equipment and all food
> storage utensils shall be thoroughly cleaned and sanitized daily. Interior
> of ice chute at soda dispensers need cleaning. | 34. FLOORS: CONSTRUCTED
> PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING
> METHODS USED - Comments: The floors shall be constructed per code, be
> smooth and easily cleaned, and be kept clean and in good repair. Floors of
> walk-in cooler (around corners) need cleaning; baseboard at the outside
> base of the walk-in cooler door is in poor repair, must fix.",
> "41.88052826494154", "-87.63556938091433", [ null, "41.88052826494154",
> "-87.63556938091433", null, false ] ]
>
> Thanks,
> Khurram
>