Posted to users@nifi.apache.org by Dnyaneshwar Pawar <dn...@persistent.com> on 2018/10/15 05:36:53 UTC

RE: High volume data with ExecuteSQL processor

Hi Koji,

As suggested, we checked "Max Rows Per Flow File", but it is not available on the ExecuteSQL processor; it is only available on the QueryDatabaseTable processor. However, we cannot use QueryDatabaseTable because it does not accept upstream connections, and we have a requirement to accept upstream connections from other processors (e.g. the HandleHTTPRequest processor). Please suggest how we can use ExecuteSQL to process high-volume data.

-----Original Message-----
From: Koji Kawamura <ij...@gmail.com> 
Sent: Tuesday, September 25, 2018 5:59 AM
To: users@nifi.apache.org
Subject: Re: High volume data with ExecuteSQL processor

Hello,

Did you try setting 'Max Rows Per Flow File' at ExecuteSQL processor?
If the OOM happened because NiFi wrote all of the results into a single FlowFile, then that property can help by breaking the result set into several FlowFiles.

Thanks,
Koji
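
A rough back-of-the-envelope illustration of Koji's point (the 10,000 figure
below is just an example value, not something from the thread): with 10 lakh
(1,000,000) rows and 'Max Rows Per Flow File' set to 10,000, the result set
would leave the processor as roughly 100 FlowFiles of 10,000 records each,
rather than one very large FlowFile holding the entire result, which is the
scenario Koji describes as the likely OOM trigger.
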
On Fri, Sep 21, 2018 at 3:56 PM Dnyaneshwar Pawar <dn...@persistent.com> wrote:
>
> Hi,
>
> How to process high-volume data with the ExecuteSQL processor:
>
> We tried to execute a query against a DB2 database which has around 10 lakh
> (1,000,000) records. While executing this query we get an OutOfMemory error
> and that request (FlowFile) gets stuck in the queue. When we restart NiFi it
> is still stuck in the queue, and as soon as NiFi starts up we get the same
> error again because it is still queued. Is there any way to configure a
> retry for the queue (the connection between two processors)?
>
> We also tried changing the FlowFile repository property in nifi.properties
> (nifi.flowfile.repository.implementation) to
> 'org.apache.nifi.controller.repository.VolatileFlowFileRepository'
> (see the nifi.properties sketch below). This does remove the FlowFile from
> the queue when NiFi restarts, but it carries a risk of data loss for the
> other flows in the event of a power or machine failure.
>
> So please suggest how we can run this high-volume query, or whether any
> retry mechanism is available for queued FlowFiles.
>
> Regards,
>
> Dnyaneshwar Pawar
>
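
For reference, the repository change mentioned in the quoted message above
is a single line in nifi.properties. A sketch of that setting (only this
line changes; the stock default is the write-ahead implementation):

    # nifi.properties
    # The volatile repository keeps FlowFile state in memory only, so queued
    # FlowFiles do not survive a restart; that is the data-loss trade-off
    # described above.
    nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.VolatileFlowFileRepository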

Re: High volume data with ExecuteSQL processor

Posted by Matt Burgess <ma...@apache.org>.
Dnyaneshwar,

In the upcoming NiFi 1.8.0 release, ExecuteSQL will have Max Rows
Per Flow File [1]. In the meantime, you might try GenerateTableFetch:
it takes incoming flow files and generates SQL statements that each
fetch X rows (the property is called Partition Size in that
processor). The limitation is that you can't provide your own SQL; it
will generate the SQL based on the columns to return, any max-value
columns specified, and an optional custom WHERE clause. If you have
complex SQL this won't be a viable workaround, but if not it should
do the trick for now.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1251
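
To make that concrete, each FlowFile coming out of GenerateTableFetch
carries one generated statement, roughly of this shape (the table and
column names here are made up for illustration, and the exact paging
clause depends on the Database Type/adapter selected on the processor):

    SELECT id, name, created_ts
    FROM orders
    WHERE created_ts <= '2018-09-21 00:00:00'  -- max-value column / custom WHERE conditions
    ORDER BY id
    LIMIT 10000 OFFSET 20000                   -- one "Partition Size" page per statement

Those FlowFiles are typically routed on to ExecuteSQL, so each query and
each result FlowFile stays bounded by the partition size instead of pulling
all 10 lakh rows in one go.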
