Posted to user@storm.apache.org by Georgy Abraham <it...@gmail.com> on 2014/08/20 03:57:11 UTC

RE: java.lang.ArrayIndexOutOfBoundsException: 3 at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor

One quick thing to check: if the message is being replayed, the offset of the Kafka queue won't get incremented for each message. For example, if there were 5 messages in the Kafka queue earlier, the log will show committing offset 5. If you put a new message on the queue, the offset should increment to 6 after processing; if it doesn't, the most likely cause is that the message is being replayed because guaranteed message processing is enabled for the Kafka spout.
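Something like this can be used to peek at the committed offset (a rough sketch using Curator; the ZooKeeper connect string, the zkRoot "/kafka-spout", the spout id and the partition znode path are assumptions, adjust them to your SpoutConfig):

import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CheckCommittedOffset {
    public static void main(String[] args) throws Exception {
        // Connect to the same ZooKeeper ensemble the spout uses.
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();
        // Assumed path: <zkRoot>/<spout id>/partition_<n>; the payload is JSON
        // that includes the last committed offset for that partition.
        byte[] data = zk.getData().forPath("/kafka-spout/my-spout-id/partition_0");
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}

If the offset printed there stops moving while the spout keeps emitting, that points to replays rather than new data.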

-----Original Message-----
From: Kushan Maskey
Sent: 20-08-2014 AM 01:55
To: user@storm.incubator.apache.org
Subject: Re: java.lang.ArrayIndexOutOfBoundsException: 3 at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor

When I look at the worker logs, I see that KafkaSpout keeps trying to get messages from Kafka and then emits something even though there is no value, and I see some of the log statements that I print in the bolt's execute method being printed.
 


I do not see the error now as I have added a check in my code to see if the data is null or empty.
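The guard looks roughly like this (a minimal sketch; the comma delimiter, field names and index 3 are placeholders, the only real assumption is that the Kafka spout puts the raw message at index 0 of the tuple):

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class GuardedParserBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String msg = tuple.getString(0);
        if (msg == null || msg.trim().isEmpty()) {
            // Nothing usable arrived; skip instead of indexing into an empty split() result.
            return;
        }
        String[] parts = msg.split(",");
        if (parts.length > 3) {
            collector.emit(new Values(parts[3]));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("field3"));
    }
}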




Thanks.

--
Kushan Maskey
 


On Tue, Aug 19, 2014 at 2:10 PM, Georgy Abraham <it...@gmail.com> wrote:

 


I have tried the Kafka spout from wurstmeister, the one that's getting integrated into Storm 0.9.2. That didn't give me any such problem. Are you sure it's emitting empty messages? If any tuple coming from the Kafka spout is not acknowledged, it will be replayed after the timeout. Is this your problem?
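For reference, the replay side of it is driven by the topology message timeout; something along these lines controls it (the values are just examples):

import backtype.storm.Config;

public class ReplayConfigExample {
    public static Config build() {
        Config conf = new Config();
        // A tuple tree that is not acked within this window is failed, and the
        // Kafka spout will re-emit the corresponding message.
        conf.setMessageTimeoutSecs(30);
        // Caps how many emitted-but-unacked tuples a spout can have in flight.
        conf.setMaxSpoutPending(1000);
        return conf;
    }
}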
 


From: Kushan Maskey
Sent: 19-08-2014 PM 07:18
To: user@storm.incubator.apache.org
Subject: Re: java.lang.ArrayIndexOutOfBoundsException: 3 at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor
 



Hi Georgy,



Thanks for the reply. I realized that it was coming from my code and  I have resolved my problem.




So what I found out is that even when there is no message in Kafka to be read, KafkaSpout keeps emitting null or empty string fields. I take that emitted value and then parse the data in my code; that is when it throws the exception. My question now would be: why would KafkaSpout emit null or empty values when there is no data in Kafka to be read?
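That also explains the index in the stack trace: splitting an empty string gives a single-element array, so parsing code of this shape (hypothetical, not the actual bolt) fails with ArrayIndexOutOfBoundsException: 3.

public class EmptyMessageRepro {
    public static void main(String[] args) {
        String message = "";                    // what the bolt sees when there is no real payload
        String[] parts = message.split(",");    // "".split(...) yields a single-element array
        System.out.println(parts[3]);           // throws ArrayIndexOutOfBoundsException: 3
    }
}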




Thanks.

--
Kushan Maskey

 


On Mon, Aug 18, 2014 at 9:39 PM, Georgy Abraham <it...@gmail.com> wrote:




From the error message, the ArrayIndexOutOfBoundsException is coming from your code. Maybe you missed something? You are using the StormSubmitter class to run it on the cluster, right?
I haven't tried with a different Curator version, so I don't know about that.
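In case it helps, a bare-bones launcher showing both submission paths (the spout/bolt wiring is omitted and all names are placeholders):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;

public class TopologyLauncher {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout("kafka-spout", ...);
        // builder.setBolt("parser-bolt", ...).shuffleGrouping("kafka-spout");
        StormTopology topology = builder.createTopology();

        Config conf = new Config();
        if (args.length > 0) {
            // Cluster mode: package the jar and launch it with `storm jar`.
            StormSubmitter.submitTopology(args[0], conf, topology);
        } else {
            // Local mode: runs in-process, which is why a problem can show up
            // only once the topology is on the real cluster.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("test-topology", conf, topology);
        }
    }
}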



From: Kushan Maskey
Sent: 14-08-2014 PM 10:03
To: user@storm.incubator.apache.org
Subject: java.lang.ArrayIndexOutOfBoundsException: 3 at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor




I am getting this error message in the Storm UI. The topology works fine on LocalCluster.

java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 3
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
    at backtype.storm.daemon.executor$fn__5641$fn__5653$fn__5700.invoke(executor.clj:746)
    at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
    at <my package>.method(<My Class>.java:135)
    at <My Class>.method(<MyClass>.java:83)
    at <MyBolt>.execute(<MyBolt>.java:56)
    at backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
    at backtype.storm.daemon.executor$fn__5641$tuple_action_fn__5643.invoke(executor.clj:631)
    at backtype.storm.daemon.executor$mk_task_receiver$fn__5564.invoke(executor.clj:399)
    at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
    ... 6 more

I am wondering if it has to do with the Curator version, because the Storm distribution comes with Curator 2.4.0 and I think we have to use Curator 2.5.0.




I am using Storm 0.9.2 with kafka_2.10-0.8.1.1 and ZooKeeper 3.4.5.

--
Kushan Maskey
817.403.7500
 

Re: data cleansing in real time systems

Posted by Nathan Marz <na...@nathanmarz.com>.
Deletion is typically done by running a job that copies the master dataset
into a new folder, filtering out bad data along the way. This is expensive,
but that's ok since this is only done in rare circumstances. When I've done
this in the past I'm extra careful before deleting the corrupted master
dataset by collecting stats before/after to make sure I've filtered out
only the bad stuff.
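As a toy sketch of that copy-and-filter step (plain local files here; a real master dataset would sit in HDFS and be rewritten by a batch job, and the isBad predicate and folder names are made up):

import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;

public class FilterBadRecords {
    // Hypothetical predicate for "bad" records.
    static boolean isBad(String record) {
        return record.contains("CORRUPT");
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("master-dataset");
        Path target = Paths.get("master-dataset-cleaned");
        Files.createDirectories(target);

        long before = 0, after = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                List<String> lines = Files.readAllLines(file);
                before += lines.size();
                List<String> kept = lines.stream()
                        .filter(l -> !isBad(l))
                        .collect(Collectors.toList());
                after += kept.size();
                Files.write(target.resolve(file.getFileName()), kept);
            }
        }
        // Compare the counts before dropping the corrupted copy of the dataset.
        System.out.printf("records before=%d, after=%d, removed=%d%n",
                before, after, before - after);
    }
}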


On Tue, Aug 19, 2014 at 10:33 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

>   I need help clearing something up. So I read this:
> http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
>
> And in it he says:
> “Likewise, writing bad data has a clear path to recovery: delete the bad
> data and precompute the queries again. Since data is immutable and the
> master dataset is append-only, writing bad data does not override or
> otherwise destroy good data.”
>
> That sentence makes no sense to me.
>
> Data is immutable – > master dataset is append-only – > delete the bad data
>
> What? He gives an example of in the batch layer you store raw files in
> HDFS. My understanding is that you can’t do row level deletes on files in
> HDFS (because it’s append-only). What am I missing  here?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
>
>



-- 
Twitter: @nathanmarz
http://nathanmarz.com

data cleansing in real time systems

Posted by Adaryl "Bob" Wakefield, MBA <ad...@hotmail.com>.
I need help clearing something up. So I read this:
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

And in it he says:
“Likewise, writing bad data has a clear path to recovery: delete the bad data and precompute the queries again. Since data is immutable and the master dataset is append-only, writing bad data does not override or otherwise destroy good data.”

That sentence makes no sense to me. 

Data is immutable – > master dataset is append-only – > delete the bad data

What? He gives an example of in the batch layer you store raw files in HDFS. My understanding is that you can’t do row level deletes on files in HDFS (because it’s append-only). What am I missing  here?

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData