Posted to user@drill.apache.org by yousuf <yo...@css.org.sa> on 2016/12/07 11:11:13 UTC

Fwd: Exception : IndexOutOfBoundsException: index: 0, length: 264 - ... querying mongodb

Hi

I'm currently exploring Apache Drill, running in cluster mode. My data
source is MongoDB, and the collection contains 5 million documents. I
can't execute a simple query:

select body from mongo.twitter.tweets limit 10;

*It throws this exception:*

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, length: 264 (expected: range(0, 256))
Fragment 1:2
[Error Id: 8903127a-e9e9-407e-8afc-2092b4c03cf0 on test01.css.org:31010]
  (java.lang.IndexOutOfBoundsException) index: 0, length: 264 (expected: range(0, 256))
    io.netty.buffer.AbstractByteBuf.checkIndex():1134
    io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes():272
    io.netty.buffer.WrappedByteBuf.setBytes():390
    io.netty.buffer.UnsafeDirectLittleEndian.setBytes():30
    io.netty.buffer.DrillBuf.setBytes():753
    io.netty.buffer.AbstractByteBuf.setBytes():510
    org.apache.drill.exec.store.bson.BsonRecordReader.writeString():265
    org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap():167
    org.apache.drill.exec.store.bson.BsonRecordReader.write():75
    org.apache.drill.exec.store.mongo.MongoRecordReader.next():186
    org.apache.drill.exec.physical.impl.ScanBatch.next():178
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745

*This query works and returns results:*

select body from mongo.twitter.tweets where tweet_id = 'tag:search.twitter.com,2005:xxxxxxxxxx';

Sample document from the source collection:

{
  "_id": ObjectId("58402ad5757d7fede822e641"),
  "rule_list": [
    "x",
    "(contains:x (contains:y OR contains:y1)) OR (contains:v contains:b) OR (contains:v (contains:r OR contains:t))"
  ],
  "actor_friends_count": 79,
  "klout_score": 19,
  "actor_favorites_count": 0,
  "actor_preferred_username": "xxxxxxx",
  "sentiment": "neg",
  "tweet_id": "tag:search.twitter.com,2005:xxxxxxxxx",
  "object_actor_followers_count": 1286,
  "actor_posted_time": "2016-07-16T14:08:25.000Z",
  "actor_id": "id:twitter.com:xxxxxxxx",
  "actor_display_name": "xxxxx",
  "retweet_count": 6,
  "hashtag_list": ["myhashtag"],
  "body": "my tweet body",
  "actor_followers_count": 25,
  "actor_status_count": 243,
  "verb": "share",
  "posted_time": "2016-08-01T07:49:00.000Z",
  "object_actor_status_count": 206,
  "lang": "ar",
  "object_actor_preferred_username": "xxxxxx",
  "original_tweet_id": "tag:search.twitter.com,2005:xxxxxx",
  "gender": "male",
  "object_actor_id": "id:twitter.com:xxxxxxx",
  "favorites_count": 0,
  "object_posted_time": "2016-06-20T04:12:02.000Z",
  "object_actor_friends_count": 2516,
  "generator_display_name": "Twitter for iPhone",
  "object_actor_display_name": "sdfsf",
  "actor_listed_count": 0
}

Any help is appreciated!

Yousuf


Re: Exception : IndexOutOfBoundsException: index: 0, length: 264 - ... querying mongodb

Posted by yousuf <yo...@css.org.sa>.
Thanks for your reply. I was able to fix the issue by setting:

set store.mongo.bson.record.reader = false;




On 12/07/2016 08:28 PM, Chunhui Shi wrote:
> The length of the UTF-8 encoded byte array is not guaranteed to be the same as
> String.length(). A fix should go in BsonRecordReader.writeString().


Re: Exception : IndexOutOfBoundsException: index: 0, length: 264 - ... querying mongodb

Posted by Chunhui Shi <cs...@maprtech.com>.
The length of the UTF-8 encoded byte array is not guaranteed to be the same as
String.length(). A fix should go in BsonRecordReader.writeString().
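A minimal Java sketch of the mismatch described above, under stated assumptions: java.nio.ByteBuffer stands in for Drill's DrillBuf, the class and method names (WorkBufferSketch, ensure, writeStringBuggy, writeStringFixed) are hypothetical and not Drill's actual code, and the power-of-two growth policy is assumed. The buggy variant sizes the buffer from String.length() and then copies the UTF-8 bytes; the fixed variant encodes first and sizes both the buffer and the end offset from the byte count. The sample string is made up so that it has 256 chars but 264 UTF-8 bytes, mirroring the numbers in the reported error.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the bug pattern; ByteBuffer stands in for Drill's DrillBuf.
class WorkBufferSketch {
    private ByteBuffer workBuf = ByteBuffer.allocate(16);

    // Grow to at least `needed` bytes, rounding up to a power of two (assumed policy).
    private void ensure(int needed) {
        if (workBuf.capacity() >= needed) {
            return;
        }
        int cap = Integer.highestOneBit(needed);
        if (cap < needed) {
            cap <<= 1;
        }
        workBuf = ByteBuffer.allocate(cap);
    }

    int capacity() {
        return workBuf.capacity();
    }

    // Buggy: capacity comes from the UTF-16 char count, but UTF-8 bytes are copied.
    int writeStringBuggy(String s) {
        ensure(s.length());
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        workBuf.clear();
        workBuf.put(bytes);   // throws BufferOverflowException if bytes.length > capacity
        return s.length();    // also a wrong end offset for multi-byte text
    }

    // Fixed: encode first, then size the buffer and the end offset by byte count.
    int writeStringFixed(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ensure(bytes.length);
        workBuf.clear();
        workBuf.put(bytes);
        return bytes.length;
    }

    // Made-up input: 248 ASCII letters + 8 Arabic letters = 256 chars, 264 UTF-8 bytes.
    static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 248; i++) {
            sb.append('a');
        }
        for (int i = 0; i < 8; i++) {
            sb.append('\u0645'); // ARABIC LETTER MEEM, 2 bytes in UTF-8
        }
        return sb.toString();
    }
}
```

On a fresh instance, writeStringBuggy(sample()) grows the buffer to 256 bytes and then throws BufferOverflowException (the ByteBuffer analogue of the Netty IndexOutOfBoundsException in the trace), while writeStringFixed(sample()) grows it to 512 bytes and returns 264.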


Re: Exception : IndexOutOfBoundsException: index: 0, length: 264 - ... querying mongodb

Posted by yousuf <yo...@css.org.sa>.
I was able to fix the issue by setting:

set store.mongo.bson.record.reader = false;


Thanks


Re: Exception : IndexOutOfBoundsException: index: 0, length: 264 - ... querying mongodb

Posted by yousuf <yo...@css.org.sa>.
Hi,

Thank you for your reply.

FYI, the body field contains both Arabic and English tweets. I'm using
MongoDB 3.2.11 and Apache Drill 1.8.0.
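A small Java illustration of why Arabic text matters here: letters in the Arabic block (U+0600 to U+06FF) take two bytes each in UTF-8, while ASCII characters take one, so a body containing Arabic has a UTF-8 length well above its String.length(). The ASCII placeholder body in the sample record ("my tweet body") has equal lengths, which may be why that single record does not reproduce the error. The class name and the Arabic sample word are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthCheck {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "my tweet body";  // ASCII placeholder body from the sample record
        // Made-up Arabic word "marhaba" (hello), written with escapes to keep the source ASCII.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println(english.length() + " chars, " + utf8Length(english) + " bytes");
        System.out.println(arabic.length() + " chars, " + utf8Length(arabic) + " bytes");
    }
}
```

Running main prints "13 chars, 13 bytes" for the English body and "5 chars, 10 bytes" for the Arabic word.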


Thanks & Kind Regards




Re: Exception : IndexOutOfBoundsException: index: 0, length: 264 - ... querying mongodb

Posted by Kathleen Li <kl...@maprtech.com>.
I am not able to reproduce your issue, at least with your one sample record. Reproduction steps:
(1) From MongoDB, display your sample record:
>db.kath.find().pretty();
{

	"_id" : ObjectId("58402ad5757d7fede822e641"),
	"rule_list" : [
		"x",
		"(contains:x(contains:y OR contains:y1)) OR (contains:v contains:b) OR (contains:v(contains:r OR contains:t))"
	],
	"actor_friends_count" : 79,
	"klout_score" : 19,
	"actor_favorites_count" : 0,
	"actor_preferred_username" : "xxxxxxx",
	"sentiment" : "neg",
	"tweet_id" : "tag:search.twitter.com,2005:xxxxxxxxx",
	"object_actor_followers_count" : 1286,
	"actor_posted_time" : "2016-07-16T14:08:25.000Z",
	"actor_id" : "id:twitter.com:xxxxxxxx",
	"actor_display_name" : "xxxxx",
	"retweet_count" : 6,
	"hashtag_list" : [
		"myhashtag"
	],
	"body" : "my tweet body",
	"actor_followers_count" : 25,
	"actor_status_count" : 243,
	"verb" : "share",
	"posted_time" : "2016-08-01T07:49:00.000Z",
	"object_actor_status_count" : 206,
	"lang" : "ar",
	"object_actor_preferred_username" : "xxxxxx",
	"original_tweet_id" : "tag:search.twitter.com,2005:xxxxxx",
	"gender" : "male",
	"object_actor_id" : "id:twitter.com:xxxxxxx",
	"favorites_count" : 0,
	"object_posted_time" : "2016-06-20T04:12:02.000Z",
	"object_actor_friends_count" : 2516,
	"generator_display_name" : "Twitter for iPhone",
	"object_actor_display_name" : "sdfsf",
	"actor_listed_count" : 0
}




(2) Query from Drill:
0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> select body from kath where tweet_id='tag:search.twitter.com,2005:xxxxxxxxx'
. . . . . . . . . . . . . . . . . . . . . . .> ;
+----------------+
| body |
+----------------+
| my tweet body |
+----------------+
1 row selected (0.285 seconds)
0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> select body from kath limit 1;
+----------------+
| body |
+----------------+
| my tweet body |
+----------------+



The Drill version I am using is:

0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> select * from sys.version;
+----------+-------------------------------------------+-----------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |
+----------+-------------------------------------------+-----------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.8.0 | cd599b4ab670aa5d317b80a31326f9bcf8c0aa72 | MD-1127: Add system property to disable loopback address check | 19.09.2016 @ 22:46:34 UTC | Unknown | 19.09.2016 @ 22:53:13 UTC |
+----------+------------------------------------






