Posted to dev@bahir.apache.org by Arne Zachlod <ar...@nerdkeller.org> on 2019/07/16 20:24:57 UTC

How does the Spark/Bahir data flow work with MQTT?

Hello,

Sorry if this is a beginner question, but I couldn't find any
documentation on this. I'm using PySpark 2.4.3 with Bahir built from
git master, and everything seems to work great, thank you for that.

I haven't done any real scaling tests yet, but I was wondering how the
flow of data works with Bahir. I have a single DStream created by
MQTTUtils.createStream(), and according to my mosquitto logs this
creates a single MQTT listener. So, my question is: is that correct, or
did I do something wrong?
My original plan was to use some DNS trickery in order to scale beyond
what a single machine is capable of delivering via network. Is that
still possible? Basically, I wanted an MQTT subscriber per Spark worker,
if that is supported.
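
To make the question concrete, here is a minimal, untested sketch of the
kind of setup described above. The broker URL, the topic names, and the
helper create_streams are hypothetical placeholders, not taken from the
actual job. It assumes the Bahir Python module is importable as mqtt and
that one MQTTUtils.createStream() call yields one receiver, which matches
the single listener seen in the mosquitto logs. Creating one stream per
topic and merging them with ssc.union() is the generic Spark Streaming
way to run several receivers; whether it gives one subscriber per worker
with Bahir is exactly what is being asked.

```python
def create_streams(ssc, broker_url, topics, create_stream):
    """Create one DStream per topic (hypothetical helper).

    create_stream is passed in so the helper can be tried without a
    cluster; with Bahir it would be MQTTUtils.createStream.
    """
    return [create_stream(ssc, broker_url, topic) for topic in topics]


def main():
    # Imported here so the helper above stays importable without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from mqtt import MQTTUtils  # assumption: Bahir's streaming-mqtt Python module

    sc = SparkContext(appName="MqttFlowSketch")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    broker_url = "tcp://localhost:1883"   # placeholder broker address
    topics = ["sensors/a", "sensors/b"]   # placeholder topics

    # A single-element topic list reproduces the current single-stream setup.
    streams = create_streams(ssc, broker_url, topics, MQTTUtils.createStream)
    merged = ssc.union(*streams)  # merge the per-topic streams into one
    merged.pprint()

    ssc.start()
    ssc.awaitTermination()

# Call main() from the driver script when submitting with spark-submit.
```

Note that several subscribers on the same topic would each receive every
message, which is why the sketch splits the load by topic instead.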

Any pointers to documentation or an example would be greatly
appreciated.