Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2018/10/17 17:54:39 UTC

[GitHub] haphut commented on issue #2556: Messaging Gateway

haphut commented on issue #2556: Messaging Gateway 
URL: https://github.com/apache/pulsar/issues/2556#issuecomment-430726152
 
 
   It would be great if Pulsar could act as an MQTT broker while storing the received messages.
   
   We at [HSL](https://github.com/HSLdevcom/) have sources sending us data over MQTT and we publish open APIs over MQTT. For example, our fleet produces [High-frequency positioning (HFP)](https://digitransit.fi/en/developers/apis/4-realtime-api/vehicle-positions/) MQTT messages with QoS 1. Including the confidential messages, we get roughly 1000-2000 messages per second into the HFP topic tree. We'd like to store those messages in Pulsar.
   
   We'd also like to filter out the repeated messages that occur due to QoS 1. As Pulsar doesn't offer compaction that would retain only the first message for the key instead of the last, we'd have to build that kind of compaction in a Pulsar client.
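
As a rough sketch of the client-side "retain the first message per key" filtering described above: the function name `retain_first`, the `(key, payload)` tuple shape, and the `max_keys` bound are assumptions made for illustration, not part of any Pulsar client API. In practice the messages would come from a Pulsar consumer rather than a list.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator, Tuple


def retain_first(
    messages: Iterable[Tuple[Hashable, bytes]],
    max_keys: int = 100_000,
) -> Iterator[Tuple[Hashable, bytes]]:
    """Yield only the first message seen for each key (hypothetical helper).

    Keys are remembered in a bounded LRU window, so a duplicate that arrives
    after more than `max_keys` other distinct keys would slip through.
    """
    seen: "OrderedDict[Hashable, None]" = OrderedDict()
    for key, payload in messages:
        if key in seen:
            seen.move_to_end(key)  # refresh recency and drop the duplicate
            continue
        seen[key] = None
        if len(seen) > max_keys:
            seen.popitem(last=False)  # evict the least recently seen key
        yield key, payload
```

The bounded window trades completeness for memory: QoS 1 redeliveries usually arrive soon after the original, so a modest window is likely enough, but that is an assumption to verify against real traffic.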
   
   Additionally, we'd like to avoid losing data due to MQTT subscriber crashes and [this problem](https://streaml.io/blog/pulsar-effectively-once#addressing-producer-application-crashes): "Effectively-once publishing in practice only makes sense when the messages are coming from a replayable source as opposed to a non-replayable source (for example online HTTP requests). For non-replayable sources, there’s no way to re-send the previous pending messages after a crash."
   
   An approach for not losing data is to deploy several MQTT subscribers into different availability zones, all forwarding messages into the same Pulsar topic. The key of each Pulsar message would contain at least the digest of the payload. Again, the variant of topic compaction that would retain the first message for each key would allow deduplicating the repeated data. This feels overly complicated, though, just for storing the messages.
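
A minimal sketch of the digest-based key and the resulting deduplication, assuming SHA-256 and assuming the MQTT topic is hashed together with the payload; both choices, as well as the names `message_key` and `deduplicate`, are illustrative only. The two lists stand in for subscribers in different availability zones receiving the same messages.

```python
import hashlib
from typing import Iterable, List, Tuple


def message_key(mqtt_topic: str, payload: bytes) -> str:
    """Derive a key from the MQTT topic and payload (hypothetical scheme)."""
    h = hashlib.sha256()
    h.update(mqtt_topic.encode("utf-8"))
    h.update(b"\x00")  # separator so the topic/payload boundary is unambiguous
    h.update(payload)
    return h.hexdigest()


def deduplicate(
    deliveries: Iterable[Tuple[str, bytes]],
) -> List[Tuple[str, str, bytes]]:
    """Keep the first delivery for each key, whichever subscriber sent it."""
    seen = set()
    kept = []
    for topic, payload in deliveries:
        key = message_key(topic, payload)
        if key not in seen:
            seen.add(key)
            kept.append((key, topic, payload))
    return kept


# Two subscribers in different zones see the same two messages.
zone_a = [("hfp/bus/1", b'{"lat": 60.17}'), ("hfp/bus/2", b'{"lat": 60.20}')]
zone_b = [("hfp/bus/1", b'{"lat": 60.17}'), ("hfp/bus/2", b'{"lat": 60.20}')]
unique = deduplicate(zone_a + zone_b)
```

A key built only from the content also collapses two genuinely separate messages that happen to be byte-identical, so this scheme relies on each payload carrying something unique such as a timestamp.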
   
   MQTT seems to be on the rise in public transport in the Nordics. Off the top of my head I can see several reasons:
   1. The protocol standard and several existing broker products avoid vendor lock-in.
   1. It's fairly easy to understand the basics of MQTT.
   1. Safely mixing public and confidential uses on the same MQTT broker is not a problem.
   1. A clear demarcation of responsibility between data providers and consumers makes it easier to require MQTT in contracts.
   
   Unfortunately, storing the delivered data is often ignored in the contract phase, but almost all of the data needs to be stored. There could be a market for an MQTT broker that acked messages only after storing them, not after receiving them.
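
The ack-after-store ordering could be sketched roughly as follows. Everything here is hypothetical: `store_then_ack`, the `send_puback` callback, and the length-prefixed log file merely stand in for a real broker's storage layer and its PUBACK path, and none of it is Pulsar's or any existing broker's API.

```python
import os
from typing import Callable


def store_then_ack(
    log_path: str,
    packet_id: int,
    payload: bytes,
    send_puback: Callable[[int], None],
) -> None:
    """Durably append the payload, and only then acknowledge the publish.

    If the write or fsync raises, no ack is sent, so a QoS 1 publisher
    would retransmit the message instead of losing it.
    """
    record = len(payload).to_bytes(4, "big") + payload  # length-prefixed record
    with open(log_path, "ab") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())  # force the record to stable storage
    send_puback(packet_id)  # reached only after the record is stored
```

The cost of this ordering is higher publish latency, since every acknowledgement waits for storage, and the redelivered duplicates it causes still need the kind of deduplication discussed above.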
