Posted to dev@nifi.apache.org by ashwin konale <as...@gmail.com> on 2018/10/17 17:54:54 UTC

Scaling source processors in nifi horizontally.

Hi,

I am experimenting with NiFi for one of our use cases, with plans to extend
it to various other data routing and ingestion use cases. Right now I need
to ingest data from MySQL binlogs into HDFS/GCS. We have around 250
different schemas and about 3000 tables to read data from. The volume of
the data flow varies between 500 and 2000 messages per second across schemas.

Right now the problem is that the MySQL CDC processor can run in only one
thread. To overcome this, I see two options.

1. Use primary node execution, with a separate processor for each schema.
All processors that read from MySQL would then run on a single node, which
will be a bottleneck no matter how big my NiFi cluster is.

2. Use multiple NiFi instances to pull the data, and have a master NiFi
cluster ingest it into the various sinks. With this approach I would have to
manage all these small NiFi instances, and might have to build some kind of
tooling on top of them to monitor them and provision new processors for
newly added schemas, etc.
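
As an illustration of the bookkeeping option 2 implies, here is a minimal
Python sketch of assigning schemas to ingest instances deterministically. It
is only a sketch under stated assumptions: the instance names
(nifi-ingest-0 ...) and schema names (schema_000 ...) are hypothetical, and
nothing here calls a NiFi API. It only decides which instance should own the
processor for a given schema.

```python
import hashlib
from collections import defaultdict


def assign_instance(schema, instances):
    """Pick an instance for a schema using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would give different answers on every run.
    """
    digest = hashlib.sha256(schema.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


def plan(schemas, instances):
    """Group schemas by the instance that should read their binlog events."""
    assignment = defaultdict(list)
    for schema in sorted(schemas):
        assignment[assign_instance(schema, instances)].append(schema)
    return dict(assignment)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    instances = ["nifi-ingest-%d" % i for i in range(5)]
    schemas = ["schema_%03d" % i for i in range(250)]
    for instance, assigned in sorted(plan(schemas, instances).items()):
        print(instance, len(assigned))
```

With a fixed set of instances, adding a schema never moves an existing one,
so provisioning for a new schema touches a single instance. Changing the
number of instances does reshuffle most assignments with this simple modulo
scheme; consistent hashing would be the usual alternative if that matters.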

Is there a better way to achieve my use case with NiFi? Please advise me
on the architecture.

Looking forward to your suggestions.

- Ashwin

Re: Scaling source processors in nifi horizontally.

Posted by Mike Thomsen <mi...@gmail.com>.
> may have to build some kind of tooling on top of it to monitor/provision
> new processor for newly added schemas etc.

Could you elaborate on this part of your use case?
