Posted to users@apex.apache.org by Guilherme Hott <gu...@gmail.com> on 2017/06/13 21:29:49 UTC

Is there a way to schedule an operator?

Hi guys,

Is there a way to schedule an operator? I need an operator to start the DAG
once a day at midnight (00:00).

Best

-- 
*Guilherme Hott*
*Software Engineer*
Skype: guilhermehott
@guilhermehott
https://www.linkedin.com/in/guilhermehott

Re: Is there a way to schedule an operator?

Posted by Vlad Rozov <v....@datatorrent.com>.
From your cron job, use the Apex client to start the application, or use
the REST API to start it if you have a DataTorrent RTS license.

Thank you,

Vlad

Re: Is there a way to schedule an operator?

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Having an operator implement the scheduling is one way to go. You can
signal the operator by changing a property. Look at the
set-operator-property command in the Apex CLI: it changes a property,
which causes the corresponding setter method to be called in the operator,
and you can use that call as a signal to do further operations.
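
The setter-as-signal idea can be sketched in plain Java. This is only an illustration of the pattern and does not use the Apex API: the class name, the runId property, and processIfTriggered() are all hypothetical stand-ins. The assumption is that the setter is invoked when the property is changed, and that the engine repeatedly calls some processing method on the operator.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for an operator; hypothetical, not an Apex class.
class TriggeredSyncOperator {
    // Hypothetical property: changing its value is the "start" signal.
    private String runId = "";
    private boolean pending = false;
    private final List<String> completedRuns = new ArrayList<>();

    // The setter only records the request; the actual work is done later
    // by the processing method, not inside the setter itself.
    public synchronized void setRunId(String newRunId) {
        if (newRunId != null && !newRunId.equals(runId)) {
            runId = newRunId;
            pending = true;
        }
    }

    public synchronized String getRunId() {
        return runId;
    }

    // Stand-in for the method that is called repeatedly while the DAG runs.
    // Returns true only when a pending signal was consumed.
    public synchronized boolean processIfTriggered() {
        if (!pending) {
            return false; // nothing requested, stay idle
        }
        pending = false;
        completedRuns.add(runId); // placeholder for the real database-to-HDFS sync
        return true;
    }

    public synchronized List<String> getCompletedRuns() {
        return new ArrayList<>(completedRuns);
    }
}
```

Using a value that changes on every run (for example the date) lets the operator ignore a repeated signal with the same value.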

Re: Is there a way to schedule an operator?

Posted by Guilherme Hott <gu...@gmail.com>.
I was thinking of using a cron job that sends a Kafka message; my DAG
would have a Kafka input operator that consumes this message and starts the
process. I think this would work, but I would like to know if there is
something more appropriate.

Re: Is there a way to schedule an operator?

Posted by Pramod Immaneni <pr...@datatorrent.com>.
There is no built-in scheduler to schedule DAGs at a prescribed time; you
would need to use an external mechanism. Because it is a once-daily
activity, would something like cron work for you?
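
Cron is the simplest fit for this. If the launch had to come from a long-running helper process instead, the only non-trivial piece is computing the wait until the next 00:00. A minimal sketch in plain Java, not using any Apex API; the class and method names are hypothetical, and it works on local wall-clock time without accounting for time-zone or daylight-saving changes:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper for an external launcher process.
class MidnightDelay {
    // Time remaining from 'now' until the next 00:00.
    // At exactly midnight this returns a full 24 hours, so the result is never zero.
    static Duration untilNextMidnight(LocalDateTime now) {
        LocalDateTime nextMidnight =
                LocalDateTime.of(now.toLocalDate().plusDays(1), LocalTime.MIDNIGHT);
        return Duration.between(now, nextMidnight);
    }
}
```

The returned duration could be passed as the initial delay to a scheduled executor with a 24-hour period; what the scheduled task actually runs to launch the application is left open here.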

Re: Is there a way to schedule an operator?

Posted by Guilherme Hott <gu...@gmail.com>.
Because I am syncing my data from a database table to HDFS, and I want
to do this just once a day to save processing resources.

Re: Is there a way to schedule an operator?

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
Why don’t you want your DAG to continue running? Are there resources you wish to release?

- Ilya Ganelin

Re: Is there a way to schedule an operator?

Posted by Pramod Immaneni <pr...@datatorrent.com>.
There are some initiatives afoot, such as the JIRAs APEXCORE-235 and
APEXCORE-408, and there have been discussions about scheduling in the past.
It is a valid use case, and if there is sufficient interest and someone
wants to work on it, that would be great. Please feel free to add more
information to the above JIRAs, or create a new one describing how you
would like this feature to work.

Thanks

Re: Is there a way to schedule an operator?

Posted by Vlad Rozov <v....@datatorrent.com>.
I agree with Amol. IMO, application scheduling is not part of a streaming
engine's functionality, and there are plenty of other projects that can
help with it. A streaming engine, whether in a batch or a streaming use
case, needs to support:

- watermarks and triggers (with few exceptions, mostly supported by Apex)
- effective resource utilization (contributors to help with this
  functionality are welcome)

Thank you,

Vlad

Re: Is there a way to schedule an operator?

Posted by Amol Kekre <am...@datatorrent.com>.
The only thing missing is a way to kick off a job, in case the ask is to
use resources the batch way: "use and terminate once done". An operator
that keeps watch and has the ability to kick off a job suffices. A batch
job can be kicked off via either of the following:

1. Files
   -> Start after all data has arrived. Usually a .done file in a directory
      triggers processing of the entire directory.
   -> Start as soon as possible and end on .done
2. Message (a start message)

I think batch use cases are mainly #1. Technically, this is not a batch vs.
stream use case; it is just the scheduler (Oozie-like) part of batch.

Thks
Amol



E:amol@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com
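
The first file-based variant described in this message (wait for a .done marker, then process the whole directory) could look roughly like the following. This is a plain-Java sketch using only the JDK, not Apex or Malhar code; the class and method names are hypothetical, and the marker is assumed to be a file literally named ".done" inside the data directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: decides whether a batch directory is ready to process.
class DoneFileTrigger {
    // Assumed marker file name, taken from the ".done" convention in the message.
    static final String MARKER = ".done";

    // Returns the names of the data files to process once the marker exists,
    // or an empty list while the batch is still arriving.
    static List<String> readyFiles(Path dir) throws IOException {
        if (!Files.exists(dir.resolve(MARKER))) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .map(p -> p.getFileName().toString())
                   .filter(n -> !n.equals(MARKER)) // the marker itself is not data
                   .sorted()
                   .forEach(names::add);
        }
        return names;
    }
}
```

A watching operator would call something like readyFiles periodically and start the batch the first time the result is non-empty.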


RE: Is there a way to schedule an operator?

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
I think it's a very relevant use case. In the Apex formulation this would work as follows. An operator runs continuously and maintains internal state that tracks processed files or an offset (e.g. in Kafka). As more data becomes available, the operator performs the appropriate operation and then returns to waiting. In this fashion, batched data is processed as soon as it becomes available, but the process overall is still a batch process, since it is limited by the production of the source batches.

There are a couple of examples of this in Malhar, for example the AbstractFileInputOperator.
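
The "track what has already been processed, then go back to waiting" loop can be illustrated with a small plain-Java sketch. It is not how AbstractFileInputOperator is implemented; the class and method names below are hypothetical, and the directory listing is passed in so the example stays self-contained. In a real operator this state would also need to survive restarts.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical state holder for a continuously running input operator.
class SeenFileTracker {
    // Names of files that have already been handed out for processing.
    private final Set<String> processed = new LinkedHashSet<>();

    // Given the current listing, return only the names not seen before
    // and remember them, so the next call skips them.
    List<String> takeNew(Collection<String> currentListing) {
        List<String> fresh = new ArrayList<>();
        for (String name : currentListing) {
            if (processed.add(name)) { // add() returns false for known names
                fresh.add(name);
            }
        }
        return fresh;
    }

    int processedCount() {
        return processed.size();
    }
}
```

Each polling cycle passes in the latest listing; an empty result means there is nothing new and the operator simply waits for the next cycle.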

Your earlier comment with regard to your motivation is interesting. Can you elaborate on the load reduction you get with your approach? A number of small batched writes to a DB may prove to be more efficient from a latency or database-utilization standpoint than infrequent large batch writes, particularly if they involve index updates.




Re: Is there a way to schedule an operator?

Posted by "dashirov@yahoo.com" <da...@yahoo.com>.
I have input operators that reach out to Google, Facebook, Bing, Yahoo etc. once a day or once an hour and download marketing spend statistics. Apex promises that batch and streaming are equal-class citizens. How is this equality achieved if there's no scheduler for batch jobs to rely on? I want the DAG to take a data stream from the batch pipeline and affect the streaming pipelines running alongside it. Do you not see this as a valid use case?