You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@nifi.apache.org by "John W. Phillips" <jo...@looker.com> on 2021/02/10 20:18:12 UTC

Issue with GenerateTableFetch Processor

I’m having an issue with the GenerateTableFetch Processor, and I wanted to
ask for some insight into whether this is a bug or expected behavior.  Using
NiFi 1.12.1 I have a MySQL table with 1M+ rows, and I have a
GenerateTableFetch processor with a `maximum-value column` and
`partition-size` set to 25000 and a `run schedule` of 9 minutes.  When the
etl starts up I get a sequence of queries for the existing 1M+ rows like
this example
`SELECT … ORDER BY maxvalcolumn LIMIT 25000 OFFSET 375000`.

The on 9 minutes intervals I get queries like 
`SELECT … FROM ... WHERE maxvalcolumn > … AND maxvalcolumn <= … ORDER BY
maxvalcolumn LIMIT 25000`

The issue is that I see only 1 query per 9 minutes with a `LIMIT 25000`, so
if my table accumulates more than 25000 rows in 9 minutes the `LIMIT 25000`
term simply drops the additional rows and they are passed up.  Does the
GenerateTableFetch delta copy generate any additional queries with the
`OFFSET` term?  I’m not sure if there’s a configuration where I can get
multiple queries using the `OFFSET` term in the 9 minute interval, or if I
can have the query generated without the `LIMIT 25000` term.

Thanks,
John



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/

Re: Issue with GenerateTableFetch Processor

Posted by "John W. Phillips" <jo...@looker.com>.
Thank you, this is very helpful!  John



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/

Re: Issue with GenerateTableFetch Processor

Posted by Matt Burgess <ma...@apache.org>.
John,

It should be generating multiple queries with OFFSET, I tried to
reproduce in a unit test (using Derby not MySQL) and everything looked
fine. I ran it once with 3 rows and a partition size of 2 and got the
expected 2 flowfiles (one with 2 rows and one with 1). Then I added 6
rows and ran again (this simulates a larger number of rows than the
partition size coming in before the processor gets run again (9 min in
your case). I got the expected 3 flowfiles out with the correct SQL:

SELECT * FROM TEST_QUERY_DB_TABLE WHERE ID > 2 AND ID <= 8 ORDER BY ID
FETCH NEXT 2 ROWS ONLY
SELECT * FROM TEST_QUERY_DB_TABLE WHERE ID > 2 AND ID <= 8 ORDER BY ID
OFFSET 2 ROWS FETCH NEXT 2 ROWS ONLY
SELECT * FROM TEST_QUERY_DB_TABLE WHERE ID > 2 AND ID <= 8 ORDER BY ID
OFFSET 4 ROWS FETCH NEXT 2 ROWS ONLY

I can try it on MySQL to see if it makes any difference, but I
wouldn't think so. In any case it might make sense to tune the
partition size and run schedule to better characterize the volume and
rate of rows coming in, such as setting the run schedule to something
like 1 minute.

Regards,
Matt


On Wed, Feb 10, 2021 at 3:18 PM John W. Phillips
<jo...@looker.com> wrote:
>
> I’m having an issue with the GenerateTableFetch Processor, and I wanted to
> ask for some insight into whether this is a bug or expected behavior.  Using
> NiFi 1.12.1 I have a MySQL table with 1M+ rows, and I have a
> GenerateTableFetch processor with a `maximum-value column` and
> `partition-size` set to 25000 and a `run schedule` of 9 minutes.  When the
> etl starts up I get a sequence of queries for the existing 1M+ rows like
> this example
> `SELECT … ORDER BY maxvalcolumn LIMIT 25000 OFFSET 375000`.
>
> The on 9 minutes intervals I get queries like
> `SELECT … FROM ... WHERE maxvalcolumn > … AND maxvalcolumn <= … ORDER BY
> maxvalcolumn LIMIT 25000`
>
> The issue is that I see only 1 query per 9 minutes with a `LIMIT 25000`, so
> if my table accumulates more than 25000 rows in 9 minutes the `LIMIT 25000`
> term simply drops the additional rows and they are passed up.  Does the
> GenerateTableFetch delta copy generate any additional queries with the
> `OFFSET` term?  I’m not sure if there’s a configuration where I can get
> multiple queries using the `OFFSET` term in the 9 minute interval, or if I
> can have the query generated without the `LIMIT 25000` term.
>
> Thanks,
> John
>
>
>
> --
> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/