Posted to user@drill.apache.org by Tugdual Grall <tu...@gmail.com> on 2015/06/22 15:59:58 UTC

JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Hello,

In my use case I have several JSON documents that I need to query using a
join.
The structure of each document can vary a lot (some fields may be present in
some documents and absent from others)

Sometimes the following exception is raised:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
java.lang.IllegalStateException: Failure while reading vector. Expected
vector class of org.apache.drill.exec.vector.NullableIntVector but was
holding vector class org.apache.drill.exec.vector.NullableVarCharVector.
Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
192.168.99.13:31010]

The queries:

The following query works:
-----
SELECT customers.id, orders.demo
FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
      dfs.`/Users/tgrall/working/orders/*.json` orders
WHERE customers.id = orders.cust_id
AND customers.country = 'FRANCE'
-----

The following query FAILS:
-----
SELECT customers.id, orders.cool
FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
      dfs.`/Users/tgrall/working/orders/*.json` orders
WHERE customers.id = orders.cust_id
AND customers.country = 'FRANCE'
-----


The documents:

Here are the files:

./customers/333.json
{
  "id" : 333,
  "name" : "Dave Smith",
  "country" : "FRANCE"
}


./orders/111.json
{
  "tax" : 10,
  "id" : 111,
  "cust_id" : 333,
  "total" : 12,
  "demo" : 10
}

./orders/222.json
{
  "cool" : 20,
  "id" : 222,
  "cust_id" : 111,
  "total" : 12
}


To reproduce the bug, you may have to edit the documents (add or remove the
cool and tax fields).
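A minimal Python sketch for recreating the sample files, assuming one JSON object per file as above. write_repro and the base directory you pass it are hypothetical. It writes either the layout listed here ("cool" in orders/222.json) or the swapped layout described later in the thread, with "cool" moved into orders/111.json. Note that the reports differ on which layout fails: this message sees the failure with the files as listed, while the later repro sees it only after the swap.

```python
from __future__ import annotations

import json
from pathlib import Path

# Documents copied from this message.
CUSTOMER_333 = {"id": 333, "name": "Dave Smith", "country": "FRANCE"}
ORDER_111 = {"tax": 10, "id": 111, "cust_id": 333, "total": 12, "demo": 10}
ORDER_222 = {"id": 222, "cust_id": 111, "total": 12}


def write_repro(base: Path, cool_in: str) -> None:
    """Write customers/333.json, orders/111.json and orders/222.json under base.

    cool_in selects which order document carries "cool": 20:
      "222" -> the layout listed in the original message
      "111" -> the layout after moving the field, as in the later repro
    """
    if cool_in not in ("111", "222"):
        raise ValueError("cool_in must be '111' or '222'")
    orders = {"111": dict(ORDER_111), "222": dict(ORDER_222)}
    # Put "cool" first, matching how the field appears in the thread's files.
    orders[cool_in] = {"cool": 20, **orders[cool_in]}
    for sub in ("customers", "orders"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    (base / "customers" / "333.json").write_text(json.dumps(CUSTOMER_333, indent=2))
    for name, doc in orders.items():
        (base / "orders" / f"{name}.json").write_text(json.dumps(doc, indent=2))
```

For example, call write_repro(Path("/tmp/drill-repro"), "222") (a hypothetical path), point the query at that directory and run it, then call write_repro with "111" and run the same query again.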

It looks like the schema is not "updated" on the fly in some cases.
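To see which fields are likely to change the schema from one file to the next, a small Python check can list the fields missing from some documents and the fields whose JSON value types differ. This is a hypothetical pre-flight helper, not part of Drill; it assumes one JSON object per file, and it cannot tell you the order in which Drill will actually read the files.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_dir(directory: Path) -> dict[str, dict]:
    """Load every *.json file in a directory (one JSON object per file)."""
    return {p.name: json.loads(p.read_text()) for p in sorted(directory.glob("*.json"))}


def schema_report(docs: dict[str, dict]) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Return (missing, mixed).

    missing: field -> names of the documents that lack it
    mixed:   field -> distinct Python type names seen for its values
    """
    missing: dict[str, list[str]] = {}
    mixed: dict[str, list[str]] = {}
    # Union of the keys of every document.
    for field in sorted(set().union(*docs.values())):
        absent = sorted(name for name, doc in docs.items() if field not in doc)
        if absent:
            missing[field] = absent
        kinds = sorted({type(doc[field]).__name__ for doc in docs.values() if field in doc})
        if len(kinds) > 1:
            mixed[field] = kinds
    return missing, mixed
```

On the two order files above, schema_report(load_dir(Path("orders"))) reports cool as missing from 111.json, demo and tax as missing from 222.json, and no mixed-type fields.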

Any idea how to work around this? Is it a bug?

Regards
Tug

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Hanifi Gunes <hg...@maprtech.com>.
Great.

On Mon, Jun 22, 2015 at 11:45 AM, Tugdual Grall <tu...@gmail.com> wrote:

> Thanks for your help.
>
> If I use *, I get another exception:
>
> --
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> org.apache.drill.exec.exception.SchemaChangeException: Hash join does not
> support schema changes Fragment 0:0 [Error Id:
> 0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010]
> --
>
> Creating the JIRA as we speak...
>
> Tug

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Tugdual Grall <tu...@gmail.com>.
Thanks for your help.

If I use *, I get another exception:

--
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
org.apache.drill.exec.exception.SchemaChangeException: Hash join does not
support schema changes Fragment 0:0 [Error Id:
0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010]
--

Creating the JIRA as we speak...

Tug


On Mon, Jun 22, 2015 at 8:41 PM, Hanifi Gunes <hg...@maprtech.com> wrote:

> This is a bug in schema-change handling. I would not expect it to happen
> if your second query used select * rather than an explicit projection such
> as select t.cool. Can you file a JIRA?

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Hanifi Gunes <hg...@maprtech.com>.
This is a bug in schema-change handling. I would not expect it to happen
if your second query used select * rather than an explicit projection such
as select t.cool. Can you file a JIRA?
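A minimal Python sketch for running the join with both projections side by side, assuming a Drillbit whose REST interface is reachable on Drill's default web port (http://localhost:8047) and its POST /query.json endpoint, which takes a JSON body with queryType and query. The dfs paths are the ones from the original message and are placeholders for your own; build_request, run_query and compare_projections are hypothetical helpers. The responses are returned raw rather than parsed, since the error format may differ between Drill versions.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

# Assumption: Drill's web/REST server on its default port 8047.
DRILL_QUERY_URL = "http://localhost:8047/query.json"

# The join from this thread; the dfs paths must be adapted to your setup.
JOIN_CLAUSE = (
    "FROM dfs.`/Users/tgrall/working/customers/*.json` customers, "
    "dfs.`/Users/tgrall/working/orders/*.json` orders "
    "WHERE customers.id = orders.cust_id AND customers.country = 'FRANCE'"
)


def build_request(sql: str, url: str = DRILL_QUERY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one SQL statement."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_query(sql: str, url: str = DRILL_QUERY_URL) -> tuple[int, str]:
    """Send one query and return (HTTP status, raw response body)."""
    try:
        with urllib.request.urlopen(build_request(sql, url), timeout=60) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Failures may come back as HTTP errors; keep the body for inspection.
        return err.code, err.read().decode("utf-8")


def compare_projections(url: str = DRILL_QUERY_URL) -> dict[str, tuple[int, str]]:
    """Run the join once with the explicit projection and once with SELECT *."""
    results = {}
    for projection in ("customers.id, orders.cool", "*"):
        results[projection] = run_query(f"SELECT {projection} {JOIN_CLAUSE}", url)
    return results
```

Against a running Drillbit, compare_projections() returns one (status, body) pair per projection. Judging from the reply above, the * variant may fail with a different error (hash join not supporting schema changes) rather than succeed.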

On Mon, Jun 22, 2015 at 9:20 AM, Andries Engelbrecht <aengelbrecht@maprtech.com> wrote:

> Ted,
>
> Fully agree; it definitely looks like a reproducible bug that should be
> filed and resolved.
>
>
> —Andries

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Andries Engelbrecht <ae...@maprtech.com>.
Ted,

Fully agree; it definitely looks like a reproducible bug that should be filed and resolved.


—Andries
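The predicate suggested in the quoted message below can be illustrated with plain relational semantics in a minimal Python sketch. It is a toy in-memory hash join, not Drill's planner or operators, and hash_join and cool_by_customer are hypothetical names. Both filters are applied before the join, so the join only sees the rows that survive them.

```python
from __future__ import annotations

# Sample rows from the thread (original layout: "cool" only in order 222).
CUSTOMERS = [{"id": 333, "name": "Dave Smith", "country": "FRANCE"}]
ORDERS = [
    {"tax": 10, "id": 111, "cust_id": 333, "total": 12, "demo": 10},
    {"cool": 20, "id": 222, "cust_id": 111, "total": 12},
]


def hash_join(build: list[dict], probe: list[dict],
              build_key: str, probe_key: str) -> list[tuple[dict, dict]]:
    """Toy inner equi-join: hash the build side, then probe it row by row."""
    table: dict[object, list[dict]] = {}
    for row in build:
        key = row.get(build_key)
        if key is not None:  # like SQL, a NULL key never matches anything
            table.setdefault(key, []).append(row)
    return [(match, row) for row in probe
            for match in table.get(row.get(probe_key), [])]


def cool_by_customer(customers: list[dict], orders: list[dict],
                     require_cool: bool) -> list[tuple[int, object]]:
    """SELECT customers.id, orders.cool ... WHERE customers.country = 'FRANCE'
    [AND orders.cool IS NOT NULL], applying both filters before the join."""
    french = [c for c in customers if c.get("country") == "FRANCE"]
    if require_cool:
        orders = [o for o in orders if o.get("cool") is not None]
    # A missing "cool" field is reported as None, i.e. SQL NULL.
    return [(c["id"], o.get("cool"))
            for c, o in hash_join(french, orders, "id", "cust_id")]
```

On the original sample data the extra predicate changes the answer: without it the result is [(333, None)], matching the "333 | null" row in the quoted sqlline output, and with it the result is empty, because the only order carrying cool (222) belongs to customer 111. This is why the choice between inner and outer semantics matters before adding such a filter.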



On Jun 22, 2015, at 9:14 AM, Ted Dunning <te...@gmail.com> wrote:

> Andries,
> 
> That sounds like a reasonable suggestion, but the real problem is that having
> the field present first and then missing appears to be OK, whereas if it is
> missing first and then present, Drill blows a gasket.
> 
> I think it looks like a bug. Very good and simple demo.
> 
> 
> 
> On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <aengelbrecht@maprtech.com> wrote:
> 
>> A couple of things to try that I have found useful in the past.
>> 
>> Depending on whether you want inner or outer joins, you may want to use
>> predicates to eliminate records that are not relevant to the join and that
>> can complicate the work Drill has to do.
>> 
>> For example, add the predicate "orders.cool is not null".
>> 
>> Not only does this filter out records that are not of interest (and that can
>> cause other challenges), but applying predicates to reduce the working set
>> that Drill has to join can also substantially improve performance on large
>> data sets. Joins tend to be among the most expensive operators in any
>> execution engine, whereas predicates are much cheaper to execute at large
>> scale.
>> 
>> —Andries
>> 
>> 
>> On Jun 22, 2015, at 7:19 AM, Christopher Matta <cm...@mapr.com> wrote:
>> 
>>> I can confirm that this is reproducible:
>>> 
>>> orders/111.json:
>>> 
>>> {
>>>  "tax" : 10,
>>>  "id" : 111,
>>>  "cust_id" : 333,
>>>  "total" : 12,
>>>  "demo" :10
>>> }
>>> 
>>> orders/222.json:
>>> 
>>> {
>>>  "cool": 20,
>>>  "id" : 222,
>>>  "cust_id" : 111,
>>>  "total" : 12
>>> }
>>> 
>>> 1st query:
>>> 
>>> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
>>> . . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
>>> . . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
>>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
>>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
>>> +------+-------+
>>> |  id  | cool  |
>>> +------+-------+
>>> | 333  | null  |
>>> +------+-------+
>>> 1 row selected (0.258 seconds)
>>> 
>>> Now change orders/111.json by moving the cool field from 222.json to
>>> 111.json:
>>> 
>>> {
>>>  "cool": 20,
>>>  "tax" : 10,
>>>  "id" : 111,
>>>  "cust_id" : 333,
>>>  "total" : 12,
>>>  "demo" :10
>>> }
>>> 
>>> And removing cool from orders/222.json:
>>> 
>>> {
>>>  "id" : 222,
>>>  "cust_id" : 111,
>>>  "total" : 12
>>> }
>>> 
>>> Re-run the query:
>>> 
>>> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
>>> . . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
>>> . . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
>>> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
>>> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
>>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
>>> java.lang.IllegalStateException: Failure while reading vector.
>>> Expected vector class of
>>> org.apache.drill.exec.vector.NullableIntVector but was holding vector
>>> class org.apache.drill.exec.vector.NullableVarCharVector.
>>> 
>>> Fragment 0:0
>>> 
>>> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on se-node11.se.lab:31010]
>>>       at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>>>       at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>>>       at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>>>       at sqlline.SqlLine.print(SqlLine.java:1583)
>>>       at sqlline.Commands.execute(Commands.java:852)
>>>       at sqlline.Commands.sql(Commands.java:751)
>>>       at sqlline.SqlLine.dispatch(SqlLine.java:738)
>>>       at sqlline.SqlLine.begin(SqlLine.java:612)
>>>       at sqlline.SqlLine.start(SqlLine.java:366)
>>>       at sqlline.SqlLine.main(SqlLine.java:259)
>>> 
>>> Chris Matta
>>> cmatta@mapr.com
>>> 215-701-3146
>>> 


Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Ted Dunning <te...@gmail.com>.
Andries,

That sounds like a reasonable suggestion, but the real problem is that
having the field initially and then having it be missing appears to be OK,
whereas if it is missing first and then present, Drill blows a gasket.

I think it looks like a bug. Very good and simple demo.



On Mon, Jun 22, 2015 at 8:53 AM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:

> A couple of things to try that I have found useful in the past.
>
> Depending on whether you want inner or outer joins, you may want to use
> predicates to eliminate records that are not relevant to the join and that
> can complicate the work Drill has to do.
>
> For example, add the predicate "orders.cool is not null".
>
> Not only does this filter out the records that are not of interest (and
> that can cause other challenges), but applying predicates to reduce the
> working set that Drill has to join can also substantially improve
> performance on large data sets. Joins tend to be one of the more expensive
> operators in any execution engine, whereas predicates tend to be much
> easier to execute at large scale.
>
> —Andries
>

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Andries Engelbrecht <ae...@maprtech.com>.
A couple of things to try that I have found useful in the past.

Depending on whether you want inner or outer joins, you may want to use predicates to eliminate records that are not relevant to the join and that can complicate the work Drill has to do.

For example, add the predicate "orders.cool is not null".

Not only does this filter out the records that are not of interest (and that can cause other challenges), but applying predicates to reduce the working set that Drill has to join can also substantially improve performance on large data sets. Joins tend to be one of the more expensive operators in any execution engine, whereas predicates tend to be much easier to execute at large scale.
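
Applied to the failing query from the original report (same dfs paths), the suggested predicate would look roughly like this. It has not been confirmed in this thread that the extra filter avoids the IllegalStateException, and it changes the result: orders without a cool value are dropped. With the original files the query would return no rows, since the only order that has cool (222.json) has cust_id 111, which matches no customer.

-----
SELECT customers.id, orders.cool
FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
      dfs.`/Users/tgrall/working/orders/*.json` orders
WHERE customers.id = orders.cust_id
AND customers.country = 'FRANCE'
-- added filter: keep only orders that actually contain a cool value
AND orders.cool IS NOT NULL
-----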

—Andries


On Jun 22, 2015, at 7:19 AM, Christopher Matta <cm...@mapr.com> wrote:

> I can confirm that this is reproducible:
> 
> orders/111.json:
> 
> {
>   "tax" : 10,
>   "id" : 111,
>   "cust_id" : 333,
>   "total" : 12,
>   "demo" :10
> }
> 
> orders/222.json:
> 
> {
>   "cool": 20,
>   "id" : 222,
>   "cust_id" : 111,
>   "total" : 12
> }
> 
> 1st query:
> 
> 0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
> . . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
> . . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
> +------+-------+
> |  id  | cool  |
> +------+-------+
> | 333  | null  |
> +------+-------+
> 1 row selected (0.258 seconds)
> 
> Now change orders/111.json by moving the cool field from 222.json to
> 111.json:
> 
> {
>   "cool": 20,
>   "tax" : 10,
>   "id" : 111,
>   "cust_id" : 333,
>   "total" : 12,
>   "demo" :10
> }
> 
> And removing cool from orders/222.json:
> 
> {
>   "id" : 222,
>   "cust_id" : 111,
>   "total" : 12
> }
> 
> Re-run the query:
> 
> : jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
> . . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
> . . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
> . . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
> . . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> java.lang.IllegalStateException: Failure while reading vector.
> Expected vector class of
> org.apache.drill.exec.vector.NullableIntVector but was holding vector
> class org.apache.drill.exec.vector.NullableVarCharVector.
> 
> Fragment 0:0
> 
> [Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on se-node11.se.lab:31010]
>        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>        at sqlline.SqlLine.print(SqlLine.java:1583)
>        at sqlline.Commands.execute(Commands.java:852)
>        at sqlline.Commands.sql(Commands.java:751)
>        at sqlline.SqlLine.dispatch(SqlLine.java:738)
>        at sqlline.SqlLine.begin(SqlLine.java:612)
>        at sqlline.SqlLine.start(SqlLine.java:366)
>        at sqlline.SqlLine.main(SqlLine.java:259)
> 
> Chris Matta
> cmatta@mapr.com
> 215-701-3146


Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Christopher Matta <cm...@mapr.com>.
I can confirm that this is reproducible:

orders/111.json:

{
   "tax" : 10,
   "id" : 111,
   "cust_id" : 333,
   "total" : 12,
   "demo" :10
}

orders/222.json:

{
   "cool": 20,
   "id" : 222,
   "cust_id" : 111,
   "total" : 12
}

1st query:

0: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
. . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
. . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
. . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
. . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
+------+-------+
|  id  | cool  |
+------+-------+
| 333  | null  |
+------+-------+
1 row selected (0.258 seconds)

Now change orders/111.json by moving the cool field from 222.json to
111.json:

{
   "cool": 20,
   "tax" : 10,
   "id" : 111,
   "cust_id" : 333,
   "total" : 12,
   "demo" :10
}

And removing cool from orders/222.json:

{
   "id" : 222,
   "cust_id" : 111,
   "total" : 12
}

Re-run the query:

: jdbc:drill:zk=sen11:5181,sen12:5181> SELECT customers.id, orders.cool
. . . . . . . . . . . . . . . . . . . > FROM `maprfs.cmatta`.`test/customers/*.json` customers,
. . . . . . . . . . . . . . . . . . . > `maprfs.cmatta`.`test/orders/*.json` orders
. . . . . . . . . . . . . . . . . . . > WHERE customers.id = orders.cust_id
. . . . . . . . . . . . . . . . . . . > AND customers.country = 'FRANCE';
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
java.lang.IllegalStateException: Failure while reading vector.
Expected vector class of
org.apache.drill.exec.vector.NullableIntVector but was holding vector
class org.apache.drill.exec.vector.NullableVarCharVector.

Fragment 0:0

[Error Id: 04e231ee-8bad-4ad2-aff3-6c0273befd2f on se-node11.se.lab:31010]
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
        at sqlline.SqlLine.print(SqlLine.java:1583)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:738)
        at sqlline.SqlLine.begin(SqlLine.java:612)
        at sqlline.SqlLine.start(SqlLine.java:366)
        at sqlline.SqlLine.main(SqlLine.java:259)

Chris Matta
cmatta@mapr.com
215-701-3146

On Mon, Jun 22, 2015 at 10:13 AM, Tugdual Grall <tu...@gmail.com> wrote:

> Yes.

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Tugdual Grall <tu...@gmail.com>.
Yes.

On Mon, Jun 22, 2015 at 4:12 PM, Christopher Matta <cm...@mapr.com> wrote:

> Just to clarify: you run the *exact same query* once and it works, then
> you remove, say, the “cool” field from orders/222.json and put it in
> orders/111.json, and the next time the same query returns that error?
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

Posted by Christopher Matta <cm...@mapr.com>.
Just to clarify: you run the *exact same query* once and it works, then you
remove, say, the “cool” field from orders/222.json and put it in
orders/111.json, and the next time the same query returns that error?

Chris Matta
cmatta@mapr.com
215-701-3146

On Mon, Jun 22, 2015 at 9:59 AM, Tugdual Grall <tu...@gmail.com> wrote:

> Hello,
>
> In my use case I have several JSON documents that I need to query using a
> join.
> The structure of each document can vary a lot (some fields are present in
> some documents but not in others).
>
> Sometimes the following exception is raised:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> java.lang.IllegalStateException: Failure while reading vector. Expected
> vector class of org.apache.drill.exec.vector.NullableIntVector but was
> holding vector class org.apache.drill.exec.vector.NullableVarCharVector.
> Fragment 0:0 [Error Id: 35c751bd-3ca0-4e4a-bbac-ad5823ce582f on
> 192.168.99.13:31010]
>
> The queries:
>
> The following query works:
> -----
> SELECT customers.id, orders.demo
> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
>       dfs.`/Users/tgrall/working/orders/*.json` orders
> WHERE customers.id = orders.cust_id
> AND customers.country = 'FRANCE'
> -----
>
> The following query FAILS:
> -----
> SELECT customers.id, orders.cool
> FROM  dfs.`/Users/tgrall/working/customers/*.json` customers,
>       dfs.`/Users/tgrall/working/orders/*.json` orders
> WHERE customers.id = orders.cust_id
> AND customers.country = 'FRANCE'
> -----
>
>
> The documents:
>
> Here are the files:
>
> ./customers/333.json
> {
>   "id" : 333,
>   "name" : "Dave Smith",
>   "country" : "FRANCE"
> }
>
>
> ./orders/111.json
> {
> "tax" : 10,
> "id" : 111,
> "cust_id" : 333,
> "total" : 12,
> "demo" :10
> }
>
> ./orders/222.json
> {
> "cool":20,
> "id" : 222,
> "cust_id" : 111,
> "total" : 12
> }
>
>
> To reproduce the bug, you may have to change the documents (add/remove the
> cool and tax fields).
>
> It looks like the schema is not "updated" on the fly in some cases.
>
> Any idea how to work around this? Is it a bug?
>
> Regards
> Tug
>