Posted to dev@drill.apache.org by Khurram Faraaz <kf...@maprtech.com> on 2016/02/21 19:00:40 UTC

Verifying fix for DRILL-4291 - Cannot read from the middle of a record. Current token was START_ARRAY

Hi All,

As part of verifying the fix for DRILL-4291, we used this test, where we
have a million records and each record's key holds an array of
variable-length string values. I am hitting a DATA_READ error with this
message: "Error parsing JSON - Cannot read from the middle of a record.
Current token was START_ARRAY".

Drill version: 1.5.0, commit ID: ca53c244

Note that the entire data set is enclosed in a single top-level array:
[{"key":["abcd","efg","q34l"]},{"key":[]},{"key":[]},...{"key":["q","dfsdfgdf","343123asda"]}]
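When verifying the fix, it helps to know independently what count(*)
should return. The helper below is a minimal, hypothetical sketch (not
Drill code). It assumes that each element of a top-level array counts as
one record, and it also accepts the other common layout for Drill JSON
input, where objects simply follow one another. It holds the whole text
in memory, so for a 4.4 GB file it is only practical on a small sample.

```python
import json


def countRecords(text):
    """Count top-level records in a JSON document (hypothetical helper).

    Accepts a single top-level array of records (the layout used in this
    test) or a sequence of whitespace-separated JSON objects.
    """
    decoder = json.JSONDecoder()
    count = 0
    pos = 0
    n = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # Top-level array: assume every element is one record.
            count += len(value)
        else:
            count += 1
    return count
```

For example, the three-element sample array above should give 3, and the
same three objects written one after another should also give 3.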

Size of data (4394965495 bytes, about 4.4 GB):
[root@centos-01 ~]# hadoop fs -ls /tmp/varLenStrArray.json
-rwxr-xr-x   3 root root 4394965495 2016-02-19 11:20 /tmp/varLenStrArray.json

Failing query: select count(*) from dfs.tmp.`varLenStrArray.json`

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record. Current token was START_ARRAY
File /tmp/varLenStrArray.json
Record 1
Fragment 0:0
[Error Id: 2f4fbc7a-f54c-4d9d-8461-0ca09ce1972a on centos-02.qa.lab:31010]

Stack trace from drillbit.log:

2016-02-21 17:02:03,151 [29361273-ce67-2111-6982-b9f5b35731f2:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 29361273-ce67-2111-6982-b9f5b35731f2: select count(*) from dfs.tmp.`varLenStrArray.json`
2016-02-21 17:02:03,333 [29361273-ce67-2111-6982-b9f5b35731f2:foreman] INFO  o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Time: 30ms total, 30.756175ms avg, 30ms max.
2016-02-21 17:02:03,333 [29361273-ce67-2111-6982-b9f5b35731f2:foreman] INFO  o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Earliest start: 1.163000 μs, Latest start: 1.163000 μs, Average start: 1.163000 μs .
2016-02-21 17:02:03,380 [29361273-ce67-2111-6982-b9f5b35731f2:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 29361273-ce67-2111-6982-b9f5b35731f2:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
2016-02-21 17:02:03,381 [29361273-ce67-2111-6982-b9f5b35731f2:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 29361273-ce67-2111-6982-b9f5b35731f2:0:0: State to report: RUNNING
2016-02-21 17:02:03,383 [29361273-ce67-2111-6982-b9f5b35731f2:frag:0:0] INFO  o.a.d.e.s.easy.json.JSONRecordReader - User Error Occurred
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record. Current token was START_ARRAY

File  /tmp/varLenStrArray.json
Record  1

[Error Id: 2f4fbc7a-f54c-4d9d-8461-0ca09ce1972a ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:179) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:219) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:191) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:100) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:100) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_65]
        at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_65]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1506.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.lang.IllegalStateException: Cannot read from the middle of a record. Current token was START_ARRAY
        at org.apache.drill.exec.store.easy.json.reader.CountingJsonReader.write(CountingJsonReader.java:39) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:197) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        ... 32 common frames omitted
2016-02-21 17:02:03,383 [29361273-ce67-2111-6982-b9f5b35731f2:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 29361273-ce67-2111-6982-b9f5b35731f2:0:0: State change requested RUNNING --> FAILED

Python script used to generate the data (arrays of random strings, each
string 1 to 256 characters long):

import json
import random
import string

def genRandomStrInArray():
    # Build one random string of 1 to 256 characters drawn from
    # uppercase letters, digits and lowercase letters.
    chars = string.ascii_uppercase + string.digits + string.ascii_lowercase
    return ''.join([random.choice(chars)
                    for i in range(random.randint(1, 256))])


def genJSONMap():
    return {
        # Note: this generates 64 values in the list.
        'key': [genRandomStrInArray() for i in range(64)]
    }

def genJSON():
    # One million maps, all written into a single top-level JSON array.
    return [genJSONMap() for i in range(1000000)]

def writeJSONToFile(fname, jsonObj):
    with open(fname, 'w') as f:
        json.dump(jsonObj, f)

def main():
    writeJSONToFile('/Users/kfaraaz/varLenStrArray.json', genJSON())

if __name__ == '__main__':
    main()
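The full run produces a file of several gigabytes. A scaled-down,
reproducible variant can make a quicker repro. The sketch below is
hypothetical: genSmallJSON, writeSmallFile, their parameters and the
onePerLine switch are illustrative names, not part of the original
script. With onePerLine=False it keeps the original single top-level
array layout. With onePerLine=True it writes one object per line instead,
which may help check whether the failure depends on the enclosing array.

```python
import json
import random
import string

# Same alphabet as the original script.
CHARS = string.ascii_uppercase + string.digits + string.ascii_lowercase


def genSmallJSON(numRecords, valuesPerList, maxLen=256, seed=None):
    # Hypothetical scaled-down variant of genJSON(): numRecords maps, each
    # with valuesPerList random strings of length 1..maxLen. A seeded
    # random.Random instance makes the output reproducible.
    rng = random.Random(seed)
    return [
        {'key': [''.join(rng.choice(CHARS)
                         for _ in range(rng.randint(1, maxLen)))
                 for _ in range(valuesPerList)]}
        for _ in range(numRecords)
    ]


def writeSmallFile(fname, records, onePerLine=False):
    with open(fname, 'w') as f:
        if onePerLine:
            # One JSON object per line, no enclosing array.
            for rec in records:
                f.write(json.dumps(rec) + '\n')
        else:
            # Single top-level array, as in the original test.
            json.dump(records, f)
```

For example, writeSmallFile('/tmp/varLenStrArraySmall.json',
genSmallJSON(1000, 64, seed=42)) would write a 1000-record file in the
same layout as the original test (the path is only an example).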


Thanks,
Khurram