You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Aarti Gupta <aa...@qualys.com> on 2018/11/12 07:15:58 UTC

Ingesting data from an API

Hi,

I have an API that emits output that I want to use as a data source for
Flink.

I have written a custom source function that is as follows -

public class DynamicRuleSource extends AlertingRuleSource {
    private ArrayList<Rule> rules = new ArrayList<Rule>();


    public void run(SourceContext<Rule> ctx) throws Exception {
        System.out.println("In run ");
        while(true) {
            while (!rules.isEmpty()) {
                Rule rule = rules.remove(0);
                ctx.collectWithTimestamp(rule, 0);
                ctx.emitWatermark(new Watermark(0));
            }
            Thread.sleep(1000);
        }
    }

    public void addRule(Rule rule) {
        rules.add(rule);
    }

    @Override
    public void cancel() {
    }
}


When the API is invoked, it calls the addRule method in my CustomSource
function.

The run method in CustomSource polls for any data to be ingested.

The same object instance is shared with the API and the Flink Execution
environment, however, the output of the API does not get ingested into the
Flink DataStream.

Is this the right pattern to use, or is Kafka the recommended way of
streaming data into Flink ?

--Aarti

-- 
Aarti Gupta <https://www.linkedin.com/company/qualys>
Director, Engineering, Correlation


aagupta@qualys.com
T


Qualys, Inc. – Blog <https://qualys.com/blog> | Community
<https://community.qualys.com> | Twitter <https://twitter.com/qualys>


<https://www.qualys.com/email-banner>

Re: Ingesting data from an API

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Aarti,

I would imagine that the described approach (sharing the same object
instance with the API and the Flink runtime) would only work in toy
executions, such as executing the job within the IDE.

Moreover, you would not be able to have exactly-once semantics with this
source, which for most users, is one main advantage of choosing Kafka as
the source for Flink jobs.
With the way how Flink checkpointing works [1], Flink's exactly-once
guarantees rely on the fact that Kafka records are replayable from a
deterministic record offset.

Please let me know if you have any further questions.

Cheers,
Gordon

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/internals/stream_checkpointing.html

On Mon, Nov 12, 2018 at 3:16 PM Aarti Gupta <aa...@qualys.com> wrote:

> Hi,
>
> I have an API that emits output that I want to use as a data source for
> Flink.
>
> I have written a custom source function that is as follows -
>
> public class DynamicRuleSource extends AlertingRuleSource {
>     private ArrayList<Rule> rules = new ArrayList<Rule>();
>
>
>     public void run(SourceContext<Rule> ctx) throws Exception {
>         System.out.println("In run ");
>         while(true) {
>             while (!rules.isEmpty()) {
>                 Rule rule = rules.remove(0);
>                 ctx.collectWithTimestamp(rule, 0);
>                 ctx.emitWatermark(new Watermark(0));
>             }
>             Thread.sleep(1000);
>         }
>     }
>
>     public void addRule(Rule rule) {
>         rules.add(rule);
>     }
>
>     @Override
>     public void cancel() {
>     }
> }
>
>
> When the API is invoked, it calls the addRule method in my CustomSource
> function.
>
> The run method in CustomSource polls for any data to be ingested.
>
> The same object instance is shared with the API and the Flink Execution
> environment, however, the output of the API does not get ingested into the
> Flink DataStream.
>
> Is this the right pattern to use, or is Kafka the recommended way of
> streaming data into Flink ?
>
> --Aarti
>
> --
> Aarti Gupta <https://www.linkedin.com/company/qualys>
> Director, Engineering, Correlation
>
>
> aagupta@qualys.com
> T
>
>
> Qualys, Inc. – Blog <https://qualys.com/blog> | Community
> <https://community.qualys.com> | Twitter <https://twitter.com/qualys>
>
>
> <https://www.qualys.com/email-banner>
>