Posted to issues@drill.apache.org by "Alexander Reshetov (JIRA)" <ji...@apache.org> on 2015/04/03 22:35:52 UTC

[jira] [Created] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

Alexander Reshetov created DRILL-2677:
-----------------------------------------

             Summary: Query does not go beyond 4096 lines in small JSON files
                 Key: DRILL-2677
                 URL: https://issues.apache.org/jira/browse/DRILL-2677
             Project: Apache Drill
          Issue Type: Bug
         Environment: drill 0.8 official build
            Reporter: Alexander Reshetov


Hello,

I'm trying to execute the following query:
{code}
select * from (select source.pck, source.`timestamp`, flatten(source.HostUpdateTypeNW.Transfers) as entry from dfs.`/mnt/data/dataset_4095_and_1.json` as source) as parsed;
{code}

It works as expected and returns this result:
{code}
+------------+------------+------------+
|    pck     | timestamp  |   entry    |
+------------+------------+------------+
| 3547       | 1419807470286356 | {"TransferingPurpose":"8","TransferingImpact":"88","TransferingKind":"8","TransferingTime":"888888888","PackageOrigSenderID":"8","TransferingID":"88888","TransitCN":"888","PackageChkPnt":"8888","PackageFullSize":"8","TransferingSessionID":"8","SubpackagesCounter":"8"} |
+------------+------------+------------+
1 row selected (0.188 seconds)
{code}

This file contains 4095 identical lines of one JSON string, followed by one different JSON line at the end (see the attached file dataset_4095_and_1.json).

The problem is that when the first string repeats more than 4095 times, the query throws an exception. Here is the query for a file with 4096 strings of the first type plus 1 string of another type (see the attached file dataset_4096_and_1.json).
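
For anyone without the attachments, files with the same shape could be generated with a small script. This is only a sketch: the two record layouts below (FILLER and TARGET) are hypothetical stand-ins, assuming the repeated line lacks the HostUpdateTypeNW.Transfers field while the final line has it. The real attached files may differ.

```python
import json

# Hypothetical record shapes; the real attachments may differ.
# Assumption: the repeated line has no HostUpdateTypeNW.Transfers field.
FILLER = {"pck": 1, "timestamp": 1419807470286355}
TARGET = {
    "pck": 3547,
    "timestamp": 1419807470286356,
    "HostUpdateTypeNW": {
        "Transfers": [{"TransferingID": "88888", "TransferingKind": "8"}]
    },
}


def build_lines(filler_count):
    """Return filler_count identical JSON lines followed by one different line."""
    filler = json.dumps(FILLER)
    return [filler] * filler_count + [json.dumps(TARGET)]


def write_dataset(path, filler_count):
    """Write the lines to path, one JSON document per line."""
    with open(path, "w") as handle:
        for line in build_lines(filler_count):
            handle.write(line + "\n")
```

Calling write_dataset("dataset_4095_and_1.json", 4095) and write_dataset("dataset_4096_and_1.json", 4096) would then produce a working file and a failing candidate to compare.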

{code}
select * from (select source.pck, source.`timestamp`, flatten(source.HostUpdateTypeNW.Transfers) as entry from dfs.`/mnt/data/dataset_4096_and_1.json` as source) as parsed;
Exception in thread "2ae108ff-b7ea-8f07-054e-84875815d856:frag:0:0" java.lang.RuntimeException: Error closing fragment context.
	at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:224)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:187)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.NullableIntVector cannot be cast to org.apache.drill.exec.vector.RepeatedVector
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.getFlattenFieldTransferPair(FlattenRecordBatch.java:274)
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema(FlattenRecordBatch.java:296)
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
	at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext(FlattenRecordBatch.java:122)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:68)
	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:96)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:58)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:163)
	... 4 more
Query failed: RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.NullableIntVector cannot be cast to org.apache.drill.exec.vector.RepeatedVector [ cb6c7914-438f-440a-9c74-fe39130feca9 on testlab-broker:31010 ]
[ cb6c7914-438f-440a-9c74-fe39130feca9 on testlab-broker:31010 ]

Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

This suggests that Drill stops analyzing the schema after exactly 4096 lines, which is why my query fails.
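
To illustrate the hypothesis, here is a toy model in Python. It is not Drill's code: BATCH_SIZE is an assumption taken from the threshold observed above, and the record shapes are hypothetical. It only shows how a field that is absent from an entire first batch would be unknown when that batch's schema is fixed.

```python
BATCH_SIZE = 4096  # assumption: taken from the threshold observed in this report


def fields_seen_per_batch(records, batch_size=BATCH_SIZE):
    """Return, for each batch, the set of top-level keys seen in that batch."""
    batches = []
    for start in range(0, len(records), batch_size):
        seen = set()
        for record in records[start:start + batch_size]:
            seen.update(record.keys())
        batches.append(seen)
    return batches


# Hypothetical records: the filler line has no HostUpdateTypeNW field.
filler = {"pck": 1, "timestamp": 1}
target = {"pck": 3547, "timestamp": 1419807470286356,
          "HostUpdateTypeNW": {"Transfers": [{}]}}

# 4095 + 1 records fit in one batch, so the field is seen in the first batch.
ok = fields_seen_per_batch([filler] * 4095 + [target])

# 4096 + 1 records spill into a second batch; the first batch never sees the field.
bad = fields_seen_per_batch([filler] * 4096 + [target])
```

Under this model, the first batch of the failing file contains no HostUpdateTypeNW field at all, which would be consistent with the field being given a placeholder type (the NullableIntVector in the trace) before the real repeated type appears.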

I assume this behavior also causes another issue, the one that led me to investigate this. It shows up on large files: perhaps Drill splits the file into smaller chunks, and one of them contains a similar sequence of lines (4096 of the same type from Drill's point of view), so the query stops and a different exception is raised. The large file is attached as dataset_sample.json.gz.

Here is the view (dataset_sample.view.drill) that I use to query the large file:
{code}
{
  "name" : "dataset_sample",
  "sql" : "SELECT `Message`.`timestamp`, `flatten`(`Message`.`HostUpdateTypeCR`['Transfers']) AS `entries`\nFROM `dfs`.`/mnt/data/dataset_sample.json.gz` AS `Message`",
  "fields" : [ {
    "name" : "timestamp",
    "type" : "ANY"
  }, {
    "name" : "transfers",
    "type" : "ANY"
  } ],
  "workspaceSchemaPath" : [ "dfs", "mnt" ]
}
{code}

And here is the query that I'm trying to execute:
{code}
0: jdbc:drill:zk=local> create table dataset_tbl as
. . . . . . . . . . . > select dataset_sample.transfers.TransferingID as id, dataset_sample.transfers.TransferingKind as type from dataset_sample;
Query failed: Query stopped., index: 9502, length: 1 (expected: range(0, 1024)) [ c5eac3ee-0266-4645-b6b5-2a1b58df4821 on testlab-broker:31010 ]

Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:zk=local> Exception in thread "WorkManager-19" java.lang.IllegalStateException
	at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
	at org.apache.drill.common.DeferredException.addException(DeferredException.java:47)
	at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:61)
	at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:133)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:181)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}

Please let me know if I should split this issue into two separate issues or if you need any additional information.


