Posted to users@kafka.apache.org by Hannes Petri <ha...@gmail.com> on 2018/03/12 17:37:29 UTC

Using Kafka to build a streaming platform for a research facility?

Hi,

I work at a research facility where numerous high-resolution detectors produce thousands of gigabytes of data every day. We want to build a highly flexible and performant streaming platform for storing, transmitting, and routing this data. For example, detector output needs to end up:

1. In permanent storage systems 
2. In realtime or semi-realtime visualization software
3. In post-processing and analysis software
4. In metrics software

...and possibly more. Now I'm exploring Kafka as an option to back such a platform.
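From what I've read so far, the routing part would map naturally onto Kafka's consumer-group model: each downstream system subscribes to the same topic under its own group.id and independently receives every record. Here's a rough sketch of how I picture, say, the visualization consumer (the topic name "detector-raw", the broker address, and the group id are placeholders of mine):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class VisualizationConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // Each downstream system (storage, visualization, analysis, metrics)
            // would use its own group.id and thus see every record independently.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "visualization");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("detector-raw"));
                while (true) {
                    ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, byte[]> record : records) {
                        // Hand the payload to the visualization pipeline here.
                        System.out.printf("offset=%d key=%s size=%d bytes%n",
                                record.offset(), record.key(), record.value().length);
                    }
                }
            }
        }
    }

If that mental model is wrong, corrections are very welcome.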

Would Kafka be a good fit? I ask because, in the use cases I've come across, Kafka mostly handles lightweight data geared towards business events and high-frequency streams of text and scalars. In other words, *more* but *smaller* messages.
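On the size point specifically: as far as I can tell, brokers reject individual records larger than message.max.bytes (roughly 1 MB by default), so I assume we would either raise that limit or split each file into chunks on the producer side. Something like this, purely illustrative (again, the topic name, broker address, and chunk size are just my assumptions):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Arrays;
    import java.util.Properties;

    public class DetectorFileProducer {
        // Keep each record safely below the default broker limit of ~1 MB.
        private static final int CHUNK_SIZE = 900 * 1024;

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

            Path file = Path.of(args[0]); // one large detector output file
            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);
                 InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[CHUNK_SIZE];
                int n;
                while ((n = in.read(buf)) > 0) {
                    // Keying every chunk by file name keeps all chunks of one
                    // file in order on the same partition.
                    producer.send(new ProducerRecord<>("detector-raw",
                            file.getFileName().toString(), Arrays.copyOf(buf, n)));
                }
                producer.flush();
            }
        }
    }

An alternative, I suppose, would be to keep the files in a separate store and only pass references through Kafka.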

In my situation, we'd be looking at low-frequency but huge files (typically, these detectors produce one large file at a time). To avoid flooding the storage, the raw data topics would need a very short retention time (hours to days).
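For the retention part, per-topic settings like retention.ms and retention.bytes seem to cover what we need. I imagine creating the raw topic along these lines (the partition count, replication factor, and limits below are made-up values):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;

    public class CreateRawDataTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("detector-raw", 12, (short) 3)
                        .configs(Map.of(
                                // Drop data older than 24 hours so the raw topic
                                // never floods the brokers' disks.
                                TopicConfig.RETENTION_MS_CONFIG, String.valueOf(24L * 60 * 60 * 1000),
                                // Belt-and-braces size cap per partition (100 GiB here).
                                TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(100L * 1024 * 1024 * 1024)
                        ));
                admin.createTopics(Set.of(topic)).all().get();
            }
        }
    }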

Does anybody have experience using Kafka in a similar scenario? What are your thoughts on the situation I describe? Would we benefit from using Kafka?

I'd greatly appreciate any input on this. Many thanks in advance.

Best regards
Hannes

Sent from my iPhone