Posted to users@nifi.apache.org by Chris Teoh <ch...@gmail.com> on 2015/09/01 16:51:39 UTC

Large message payloads?

Hi,

Thanks for the help thus far with NiFi. I'll get to implementing the
suggestions I received in reply to my earlier messages.

I'm also considering other message bus products that may be suitable too.
In doing so, I have read that these message bus systems like RabbitMQ
typically don't have large message payloads sent through them. In one of
the video demos I saw on YouTube, it appeared that NiFi was used to ingest
data from a Twitter feed. Am I correct in assuming that I can transit large
volumes of data through NiFi flows in and out of Hadoop?

I'm thinking other message bus systems end up just handling the signaling
and the bulk data transfers occur outside of the message bus and never
transit through it.

Kind regards
Chris

Re: Large message payloads?

Posted by Bryan Bende <bb...@gmail.com>.
Hi Chris,

Yes, you are correct that large payloads can be moved through NiFi.

As data moves through NiFi, a pointer to the data is being passed around,
referred to as a FlowFile. The content of the FlowFile is only accessed as
needed.
The key for large payloads is to operate on the payload in a
streaming fashion, so that you don't read too many large payloads into
memory at once and exceed your JVM memory.
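
To illustrate what "streaming fashion" means here, the following is a minimal
sketch in plain Python, not NiFi's own Java API. The function name
stream_digest and the 64 KiB CHUNK_SIZE are assumptions made up for this
example. It reads a payload one fixed-size chunk at a time, so memory use
stays bounded by the chunk size no matter how large the payload is.

```python
import hashlib
import io

# Hypothetical buffer size chosen for illustration only.
CHUNK_SIZE = 64 * 1024


def stream_digest(source, chunk_size=CHUNK_SIZE):
    """Return (sha256 hex digest, byte count) for a binary stream.

    Only one chunk is held in memory at a time, so the payload
    itself can be much larger than the available memory.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


if __name__ == "__main__":
    payload = io.BytesIO(b"example payload " * 1000)
    hex_digest, size = stream_digest(payload)
    print(size, hex_digest)
```

The same shape applies to any per-payload operation (hashing, compressing,
forwarding): read a chunk, handle it, discard it, and repeat.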

As an example, a typical pattern for bringing data into HDFS from NiFi is
to use a MergeContent processor right before a PutHDFS processor.
MergeContent can take many small/medium size files and merge them together
to form an appropriately sized file for HDFS. It does this by copying all
of the input streams from the original files to a new output stream, and
can therefore merge a large number of files without exceeding the memory
of the JVM.

Hope that helps.

-Bryan

