You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/11/11 00:08:00 UTC

[jira] [Commented] (DRILL-4842) SELECT * on JSON data results in NumberFormatException

    [ https://issues.apache.org/jira/browse/DRILL-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248214#comment-16248214 ] 

Paul Rogers commented on DRILL-4842:
------------------------------------

The JSON reader has been updated in the "Batch Size Control" project. A new feature was added to handle runs of nulls.

If the first value seen for a field is null, then we enter "deferred null mode."

While in that mode, we do not create a value vector; instead we bide our time waiting for an actual value.

When we see an actual value, we create a vector of that type. The vector automatically performs "fill empties" logic for the missing null values.

If we reach the end of a batch, and we've seen no value, then we assume "text mode" for this field. That is, this field (only) acts as if "all text mode" were set. The field becomes a nullable Varchar.

Once we see a scalar value, that value is saved as text. So, if we have 10K nulls, followed by the value 10, the value will be saved as the string "10".

If the value ends up being an array or object, then the parse fails with a type mismatch error.

The above is only a work-around. The proper solution is to allow the user to specify a schema so that Drill need not try to predict the future.

> SELECT * on JSON data results in NumberFormatException
> ------------------------------------------------------
>
>                 Key: DRILL-4842
>                 URL: https://issues.apache.org/jira/browse/DRILL-4842
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Khurram Faraaz
>            Assignee: Serhii Harnyk
>         Attachments: tooManyNulls.json
>
>
> Note that doing SELECT c1 returns correct results, the failure is seen when we do SELECT star. json.all_text_mode was set to true.
> JSON file tooManyNulls.json has one key c1 with 4096 nulls as its value and the 4097th key c1 has the value "Hello World"
> git commit ID : aaf220ff
> MapR Drill 1.8.0 RPM
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true;
> +-------+------------------------------------+
> |  ok   |              summary               |
> +-------+------------------------------------+
> | true  | store.json.all_text_mode updated.  |
> +-------+------------------------------------+
> 1 row selected (0.27 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT c1 FROM `tooManyNulls.json` WHERE c1 IN ('Hello World');
> +--------------+
> |      c1      |
> +--------------+
> | Hello World  |
> +--------------+
> 1 row selected (0.243 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * FROM `tooManyNulls.json` WHERE c1 IN ('Hello World');
> Error: SYSTEM ERROR: NumberFormatException: Hello World
> Fragment 0:0
> [Error Id: 9cafb3f9-3d5c-478a-b55c-900602b8765e on centos-01.qa.lab:31010]
>  (java.lang.NumberFormatException) Hello World
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt():120
>     org.apache.drill.exec.test.generated.FiltererGen1169.doSetup():45
>     org.apache.drill.exec.test.generated.FiltererGen1169.setup():54
>     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():195
>     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.NumberFormatException: Hello World
>         at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:95) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt(StringFunctionHelpers.java:120) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.FiltererGen1169.doSetup(FilterTemplate2.java:45) ~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen1169.setup(FilterTemplate2.java:54) ~[na:na]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:195) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:107) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:251) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)