Posted to users@nifi.apache.org by l vic <lv...@gmail.com> on 2018/10/29 15:13:06 UTC

schedule execution from relational database?

Hi,
I have an "event_time" field in a SQLite database that holds the epoch time
for triggering an external event. What processor(s) can I use to implement
schedule monitoring/execution based on a change in the "event_time" value?
Thanks,

Re: schedule execution from relational database?

Posted by l vic <lv...@gmail.com>.
Hi Matt,
NiFi does handle other parts of it, just in a different process group.
Regards,
Victor

On Mon, Oct 29, 2018 at 12:39 PM Matt Burgess <ma...@apache.org> wrote:

> Victor,
>
> Yes, both QDT and GTF would generate something like "SELECT * from
> myTable where event_time > X", and QDT will execute it and update X.
> So if event_time is always increasing, it will continue to pick up the
> same row(s).
>
> That's a curious use case, maybe NiFi could handle other parts of it
> so you wouldn't need to update a single row in an external database
> table?
>
> Regards,
> Matt

Re: schedule execution from relational database?

Posted by l vic <lv...@gmail.com>.
My goal is to use the new epoch-milliseconds value in the flow file to
schedule a Spark job at the corresponding date/time. I am asking how that
can be done in NiFi.
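
As a rough illustration of the arithmetic involved (not a NiFi feature), the
sketch below converts an epoch-milliseconds value into a UTC date/time and
the number of seconds left until then. The function name `seconds_until` and
the sample value are hypothetical; how the result is handed to a scheduler or
a Spark submission is outside this sketch.

```python
from datetime import datetime, timezone

def seconds_until(event_time_ms, now=None):
    """Return (UTC datetime of the event, seconds to wait; 0.0 if already past)."""
    # event_time is assumed to be epoch milliseconds, so divide by 1000.
    fire_at = datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    delay = max(0.0, (fire_at - now).total_seconds())
    return fire_at, delay

# Hypothetical sample: an event at 2018-10-30 00:00:00 UTC, checked four hours earlier.
fire_at, delay = seconds_until(
    1540857600000,
    now=datetime(2018, 10, 29, 20, 0, tzinfo=timezone.utc),
)
```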
Thank you,
Victor


On Mon, Oct 29, 2018 at 4:30 PM Matt Burgess <ma...@apache.org> wrote:

> Not sure I understand what you mean. Are you using the flow file to
> trigger ExecuteStreamCommand to schedule a cron job? Or do you mean
> scheduling a processor to run in NiFi? Or something else?

Re: schedule execution from relational database?

Posted by Matt Burgess <ma...@apache.org>.
Not sure I understand what you mean. Are you using the flow file to
trigger ExecuteStreamCommand to schedule a cron job? Or do you mean
scheduling a processor to run in NiFi? Or something else?
On Mon, Oct 29, 2018 at 3:58 PM l vic <lv...@gmail.com> wrote:
>
> QDT works, eg it can detect change in MaximumValue column but how can I use it to schedule cron job? I know it's possible to schedule cron from UI but how can i do it based on the value of attribute?
> Thank you again,
> V.

Re: schedule execution from relational database?

Posted by l vic <lv...@gmail.com>.
Hi Ed,
It would usually be days from the moment the new value is captured.

On Mon, Oct 29, 2018 at 6:29 PM Ed B <bd...@gmail.com> wrote:

> Hey Victor,
>
> If you already pulled the record and know new value - that won't really
> help you to determine a change in a schedule.
> In my opinion, the schedule determined by the acceptable data latency for
> given application, in other words, how soon you want your changed data be
> captured.
> The answer can be from "real-time" to "on-demand".
>
> For your particular case, you need to decide and then either schedule
> every X sec/min/hours/days, etc, or at given time (at minute 30 of each
> hour every day except for Saturday).
> If you don't know what your requirements for data availability and latency
> are, you could start with something like every "5 mins". And then adjust as
> needed.
>
> Regards,
> Ed.

Re: schedule execution from relational database?

Posted by Ed B <bd...@gmail.com>.
Hey Victor,

If you have already pulled the record and know the new value, that won't
really help you determine a change in the schedule.
In my opinion, the schedule is determined by the acceptable data latency
for the given application; in other words, how soon you want your changed
data to be captured.
The answer can range from "real-time" to "on-demand".

For your particular case, you need to decide and then either schedule every
X sec/min/hours/days, etc., or at a given time (for example, at minute 30 of
each hour every day except Saturday).
If you don't know what your requirements for data availability and latency
are, you could start with something like every "5 mins" and then adjust as
needed.
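
To make the "at a given time" option concrete, here is a small Python sketch
that computes the next run for the example rule above (minute 30 of each
hour, every day except Saturday). The function name `next_run` is
hypothetical, and this only illustrates the rule; it is not how NiFi
evaluates its own scheduling settings.

```python
from datetime import datetime, timedelta

def next_run(after):
    """Next time strictly after `after` that falls on minute 30 and not on a Saturday."""
    candidate = after.replace(minute=30, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(hours=1)
    # datetime.weekday() counts Monday as 0, so Saturday is 5.
    while candidate.weekday() == 5:
        candidate += timedelta(hours=1)
    return candidate

monday_run = next_run(datetime(2018, 10, 29, 15, 58))   # a Monday afternoon
weekend_run = next_run(datetime(2018, 11, 2, 23, 45))   # late on a Friday
```

The second call skips the whole of Saturday and lands on the first half-hour
mark of Sunday.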

Regards,
Ed.


On Mon, Oct 29, 2018 at 3:58 PM l vic <lv...@gmail.com> wrote:

> QDT works, eg it can detect change in MaximumValue column but how can I
> use it to schedule cron job? I know it's possible to schedule cron from UI
> but how can i do it based on the value of attribute?
> Thank you again,
> V.

Re: schedule execution from relational database?

Posted by l vic <lv...@gmail.com>.
QDT works, i.e. it can detect a change in the Maximum Value column, but how
can I use it to schedule a cron job? I know it's possible to schedule cron
from the UI, but how can I do it based on the value of an attribute?
Thank you again,
V.

On Mon, Oct 29, 2018 at 12:39 PM Matt Burgess <ma...@apache.org> wrote:

> Victor,
>
> Yes, both QDT and GTF would generate something like "SELECT * from
> myTable where event_time > X", and QDT will execute it and update X.
> So if event_time is always increasing, it will continue to pick up the
> same row(s).
>
> That's a curious use case, maybe NiFi could handle other parts of it
> so you wouldn't need to update a single row in an external database
> table?
>
> Regards,
> Matt

Re: schedule execution from relational database?

Posted by Matt Burgess <ma...@apache.org>.
Victor,

Yes, both QDT and GTF would generate something like "SELECT * from
myTable where event_time > X", and QDT will execute it and update X.
So if event_time is always increasing, it will continue to pick up the
same row(s).
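
A minimal Python/sqlite3 sketch of that single-row case, assuming a
hypothetical table `myTable` with columns `id` and `event_time` and a starting
X of 0. It only imitates the query pattern described above and is not QDT's
actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, event_time INTEGER)")
conn.execute("INSERT INTO myTable (id, event_time) VALUES (1, 1000)")

def poll(conn, last_max):
    """Run the incremental query once; return (rows, updated value of X)."""
    rows = conn.execute(
        "SELECT id, event_time FROM myTable WHERE event_time > ?", (last_max,)
    ).fetchall()
    new_max = max([last_max] + [event_time for _, event_time in rows])
    return rows, new_max

x = 0                                   # assumed starting point: nothing seen yet
rows_a, x = poll(conn, x)               # the single row is picked up
conn.execute("UPDATE myTable SET event_time = 5000 WHERE id = 1")
rows_b, x = poll(conn, x)               # same row, larger value: picked up again
conn.execute("UPDATE myTable SET event_time = 4000 WHERE id = 1")
rows_c, x = poll(conn, x)               # value went down: nothing is returned
```

The last poll shows the caveat: an update that does not raise event_time
above the stored X is not detected.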

That's a curious use case, maybe NiFi could handle other parts of it
so you wouldn't need to update a single row in an external database
table?

Regards,
Matt

On Mon, Oct 29, 2018 at 12:36 PM l vic <lv...@gmail.com> wrote:
>
> What if have only one row and update the values in it? Will QDT fetch updates?
> Thank you,
> Victor

Re: schedule execution from relational database?

Posted by l vic <lv...@gmail.com>.
What if I have only one row and update the values in it? Will QDT fetch
the updates?
Thank you,
Victor


On Mon, Oct 29, 2018 at 11:54 AM Matt Burgess <ma...@apache.org> wrote:

> You can use QueryDatabaseTable (QDT) for this, you'd set your
> "event_time" column as the "Maximum Value Column(s)" property in the
> processor. The first time QDT executes, it will fetch all the rows
> (since it has not seen event_time before), then it will keep track of
> the largest value of event_time. As new rows are added (with larger
> event_time values), QDT will only fetch the rows whose event_time is
> greater than the largest one it's seen. Then it updates its "largest
> seen value" and so on.
>
> GenerateTableFetch (GTF) is another option, it works in a similar
> fashion, except that it does not fetch the rows itself, instead it
> generates flow files containing SQL statements that you can send
> downstream to perhaps ExecuteSQL in order to actually fetch the rows.
> GTF is often used in place of QDT if you'll be fetching a large number
> of rows in each statement, as you can distribute the SQL flow files
> among the nodes in a cluster, to do the fetch in parallel.
>
> Regards,
> Matt

Re: schedule execution from relational database?

Posted by Matt Burgess <ma...@apache.org>.
You can use QueryDatabaseTable (QDT) for this; you'd set your
"event_time" column as the "Maximum Value Column(s)" property in the
processor. The first time QDT executes, it will fetch all the rows
(since it has not seen event_time before), then it will keep track of
the largest value of event_time. As new rows are added (with larger
event_time values), QDT will only fetch the rows whose event_time is
greater than the largest one it's seen. Then it updates its "largest
seen value" and so on.
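
The behaviour described above can be imitated in a few lines of Python with
sqlite3. This is only a sketch of the idea: the table name `myTable`, the
`state` dictionary, and the function `fetch_new_rows` are assumptions made for
illustration, not QDT's real code or state storage.

```python
import sqlite3

def fetch_new_rows(conn, state, table="myTable", column="event_time"):
    """Return rows whose `column` exceeds the largest value seen so far."""
    last_max = state.get(column)
    if last_max is None:
        # First run: no value has been seen yet, so fetch every row.
        cursor = conn.execute(f"SELECT * FROM {table}")
    else:
        cursor = conn.execute(
            f"SELECT * FROM {table} WHERE {column} > ?", (last_max,)
        )
    rows = cursor.fetchall()
    if rows:
        # Remember the new "largest seen value" for the next run.
        state[column] = max(row[column] for row in rows)
    return rows

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row          # allows row["event_time"] lookups
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, event_time INTEGER)")
conn.executemany("INSERT INTO myTable (event_time) VALUES (?)", [(1000,), (2000,)])

state = {}
first = fetch_new_rows(conn, state)     # first run: all rows
second = fetch_new_rows(conn, state)    # nothing new since the last run
conn.execute("INSERT INTO myTable (event_time) VALUES (?)", (3000,))
third = fetch_new_rows(conn, state)     # only the newly added row
```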

GenerateTableFetch (GTF) is another option; it works in a similar
fashion, except that it does not fetch the rows itself; instead it
generates flow files containing SQL statements that you can send
downstream to perhaps ExecuteSQL in order to actually fetch the rows.
GTF is often used in place of QDT if you'll be fetching a large number
of rows in each statement, as you can distribute the SQL flow files
among the nodes in a cluster, to do the fetch in parallel.
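
As a rough Python sketch of that split between generating and executing
statements: the function below only builds paged SQL strings, and a separate
step runs them. The function name, the `partition_size` parameter, and the
LIMIT/OFFSET form of the SQL are assumptions for illustration; the statements
GTF really emits depend on the database and the processor's configuration.

```python
import sqlite3

def generate_fetch_statements(conn, last_max, partition_size,
                              table="myTable", column="event_time"):
    """Build one SQL string per page of new rows; return (statements, new max)."""
    count, new_max = conn.execute(
        f"SELECT COUNT(*), MAX({column}) FROM {table} WHERE {column} > ?",
        (last_max,),
    ).fetchone()
    # Values are inlined because the statement text itself is the output.
    statements = [
        f"SELECT * FROM {table} WHERE {column} > {last_max} "
        f"AND {column} <= {new_max} "
        f"ORDER BY {column} LIMIT {partition_size} OFFSET {offset}"
        for offset in range(0, count, partition_size)
    ]
    return statements, (new_max if count else last_max)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, event_time INTEGER)")
conn.executemany("INSERT INTO myTable (event_time) VALUES (?)",
                 [(t,) for t in range(1, 6)])

statements, high_water = generate_fetch_statements(conn, last_max=0, partition_size=2)
# Each statement could be run by a different worker; here they run in sequence.
fetched = [row for sql in statements for row in conn.execute(sql)]
```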

Regards,
Matt

On Mon, Oct 29, 2018 at 11:13 AM l vic <lv...@gmail.com> wrote:
>
> Hi,
> i have "event_time" field in SQLite database that means epoch time for triggering of external event. What processor(s) can i use to implement schedule monitoring/ execution based on change in "event_time" value?
> Thanks,