Posted to user@storm.apache.org by "Raja.Aravapalli" <Ra...@target.com> on 2016/05/26 19:39:10 UTC

Storm bolt doesn't guarantee to process the records in the order they are received?

Hi

I have a Storm topology that reads records from Kafka, extracts the timestamp present in each record, does a lookup on an HBase table, applies business logic, and then updates the HBase table with the latest values from the current record.

I have written a custom HBase bolt extending BaseRichBolt, in which the code does a lookup on the HBase table, applies some business logic to the message read from Kafka, and then updates the HBase table with the latest data.

The problem I am seeing is that, at times, the bolt receives/processes the records in a jumbled order, due to which my application thinks a particular record has already been processed and ignores it. A serious number of records are not being processed because of this!

For Example:

Suppose there are two records read from Kafka: one record belongs to the 10th hour and the second belongs to the 11th hour...

My custom HBase bolt processes the 11th-hour record first, then reads/processes the 10th-hour record later. Because the 11th-hour record was processed first, the application assumes the 10th-hour record has already been processed and ignores it!
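To make the failure mode concrete, here is a minimal sketch of the kind of "already processed" check described above. `LastSeenTracker` and its method names are my own illustration (not Storm or HBase APIs), standing in for the per-user latest-timestamp state kept in HBase:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the per-user "latest processed timestamp"
// that the real topology keeps in HBase.
class LastSeenTracker {
    private final Map<String, Long> lastSeenHour = new HashMap<>();

    // Returns true if the record was processed, false if it was skipped
    // because a newer timestamp for the same user was already recorded.
    boolean process(String user, long hour) {
        Long last = lastSeenHour.get(user);
        if (last != null && hour <= last) {
            return false; // looks "already processed" -- the record is dropped
        }
        lastSeenHour.put(user, hour);
        return true;
    }
}
```

If the 11th-hour record arrives first, `process("raja", 11)` succeeds, and the later `process("raja", 10)` returns false — exactly the drop described above. Logic like this is only safe if tuples for a given user are guaranteed to arrive in timestamp order.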

Can someone please help me understand why my custom HBase bolt is not processing the records in the order it receives them?

Do I have to set any additional properties to ensure the bolt processes the records in the order it receives them? What are possible alternatives I can try to fix this?

FYI, I am using fields grouping for the HBase bolt, through which I want to ensure that all the records of a particular user go to the same task. Needless to say, thinking fields grouping might be causing the issue, I reduced the number of tasks for my custom HBase bolt to 1 task, and still hit the same issue!
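For reference, fields grouping routes each tuple by a hash of the grouping field mapped onto the task list, so the same user always lands on the same task. A simplified model of the idea (`taskFor` and `numTasks` are illustrative names, not Storm's internal code):

```java
// Simplified model of how fields grouping picks a task for a tuple:
// the grouping field is hashed, and the hash is mapped onto the tasks.
class FieldsGroupingModel {
    static int taskFor(String user, int numTasks) {
        return Math.abs(user.hashCode() % numTasks);
    }
}
```

With one task, every record goes to task 0, so routing cannot be the cause. Note, though, that even a single bolt task can see interleaving: tuples coming from different upstream tasks (e.g. multiple Kafka spout executors reading different partitions) carry no global order, and replayed (failed) tuples arrive late.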

I am wondering why the HBase bolt is not reading/processing records in the order it receives them. Please help me with your thoughts!

Thanks a lot.


Regards,
Raja.

auto backpressure in 1.x seems to be problematic

Posted by 이승진 <sw...@navercorp.com>.
  
Hello all,
 
I recently upgraded my storm cluster from 0.9.5 to 1.0.0, and then 1.0.1.
 
In 1.0.0, if a certain bolt is running slowly and tuples are queued up, the auto backpressure throttles the spout.
 
But after that bolt has consumed and executed all the tuples, the spout still does not come back up to normal.
 
The reason is that one backpressure-related node in ZooKeeper (under STORM_ROOT/backpressure/worker-id/) is not being deleted.
 
Once I delete it manually, the topology runs well again.
 
This problem happens very frequently in 1.0.0; it is less severe but still happens in 1.0.1.
 
I thought STORM-1696/STORM-1731 solved the problem, and in that hope I upgraded the Storm cluster again from 1.0.0 to 1.0.1, but it did not work out as I expected.
 
Obviously this can be in part due to a bad combination of the configured topology.max.spout.pending and topology.sleep.spout.wait.strategy.time.ms, but auto backpressure is supposed to handle this kind of thing regardless of those settings, if I understand correctly.
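For anyone comparing setups, these are the stock plain-string config keys behind the settings mentioned above (the values here are arbitrary examples, not recommendations):

```java
import java.util.HashMap;
import java.util.Map;

class BackpressureConf {
    // The string keys behind the Config constants in storm-core;
    // the values are arbitrary examples, not recommendations.
    static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.max.spout.pending", 1000);              // cap on un-acked tuples per spout task
        conf.put("topology.sleep.spout.wait.strategy.time.ms", 1); // sleep used by the default spout wait strategy
        conf.put("topology.backpressure.enable", true);            // the 1.x automatic backpressure toggle
        return conf;
    }
}
```

Disabling topology.backpressure.enable and relying on topology.max.spout.pending alone is one way to rule the auto backpressure mechanism in or out while debugging.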
 
Any comment would be a big help for me, and I want to ask all of you whether you are having the same problem.