Posted to user@flink.apache.org by Georg Heiler <ge...@gmail.com> on 2017/04/23 16:02:33 UTC

Flink first project

I am new to Flink and would like to do a small project to get a better
feeling for it. I am thinking of pulling some stats from several REST APIs
(e.g. Bitcoin exchange rates from different exchanges) and comparing prices
across exchanges in real time.
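
As a rough sketch of the comparison step only, assuming the latest quote per
exchange has already been fetched: the Quote record and the spreads function
below are hypothetical names for illustration and are not part of XChange or
Flink.

```scala
// Hypothetical quote record; the field names are illustrative, not XChange's Ticker.
final case class Quote(exchange: String, pair: String, price: BigDecimal)

object SpreadDemo {
  // For each currency pair, returns
  // (cheapest exchange, most expensive exchange, price difference).
  def spreads(quotes: Seq[Quote]): Map[String, (String, String, BigDecimal)] =
    quotes.groupBy(_.pair).map { case (pair, qs) =>
      val low = qs.minBy(_.price)
      val high = qs.maxBy(_.price)
      pair -> ((low.exchange, high.exchange, high.price - low.price))
    }
}
```

In a streaming job the same grouping would be keyed by currency pair and
evaluated over a window rather than over a fixed Seq.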

Are there already some REST API sources for Flink that could serve as a
sample for implementing a custom REST source?

I was thinking about using https://github.com/timmolter/XChange to connect
to several exchanges. For example, a single API call made by hand would look
similar to:

  val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
  CertHelper.trustAllCerts()
  val poloniex = ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
  val dataService = poloniex.getMarketDataService

  generic(dataService)
  raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
  System.out.println(dataService.getTicker(currencyPair))

What would be a proper way to make this available as a Flink source? I have
seen
https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java
but, being new to Flink, I am a bit unsure how to proceed.

Regards,
Georg

Re: Flink first project

Posted by Georg Heiler <ge...@gmail.com>.
Thanks for the overview. I think I will use Akka Streams and pipe the
result to Kafka, then move on with Flink.
On Thu, 27 Apr 2017 at 18:37, Tzu-Li (Gordon) Tai <tz...@apache.org> wrote:

> Hi Georg,
>
> Simply from the aspect of a Flink source that listens to a REST endpoint
> for input data, there should be quite a variety of options to do that. The
> Akka streaming source from Bahir should also serve this purpose well. It
> would also be quite straightforward to implement one yourself.
>
> On the other hand, what Jörn was suggesting was that you would want to
> first persist the incoming data from the REST endpoint to a replayable
> storage / queue, and your Flink job reads from that replayable storage /
> queue.
> The reason for this is that Flink’s checkpointing mechanism for
> exactly-once guarantee relies on a replayable source (see [1]), and since a
> REST endpoint is not replayable, you’ll not be able to benefit from the
> fault-tolerance guarantees provided by Flink. The most popular source used
> with Flink for exactly-once, currently, is Kafka [2]. The only extra
> latency compared to just fetching REST endpoint, in this setup, is writing
> to the intermediate Kafka topic.
>
> Of course, if you’re just testing around and just getting to know Flink,
> this setup isn’t necessary.
> You can just start off with a source such as the Flink Akka connector in
> Bahir, and start writing your first Flink job right away :)
>
> Cheers,
> Gordon
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html

Re: Flink first project

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Georg,

Simply from the aspect of a Flink source that listens to a REST endpoint for input data, there should be quite a variety of options to do that. The Akka streaming source from Bahir should also serve this purpose well. It would also be quite straightforward to implement one yourself.
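
A minimal sketch of what "implementing one yourself" could look like, written without the Flink dependency so it runs on its own. It mirrors the run/cancel contract of Flink's SourceFunction (run(ctx) emits records until cancel() is called). RecordSink, Tick and PollingTickerSource are hypothetical names; in a real job the class would extend SourceFunction[Tick], emit through ctx.collect, and the fetch function would wrap a call such as dataService.getTicker(currencyPair).

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for Flink's SourceContext: it only receives records.
trait RecordSink[T] { def collect(record: T): Unit }

// Hypothetical ticker record; the fields are illustrative, not XChange's Ticker.
final case class Tick(exchange: String, pair: String, last: BigDecimal)

// Polls `fetch` until cancelled, mirroring the run/cancel contract of a Flink source.
class PollingTickerSource(fetch: () => Tick, intervalMillis: Long) {
  private val running = new AtomicBoolean(true)

  // Emits one record per poll; returns once cancel() has been called.
  def run(out: RecordSink[Tick]): Unit =
    while (running.get()) {
      out.collect(fetch())
      if (running.get()) Thread.sleep(intervalMillis)
    }

  def cancel(): Unit = running.set(false)
}

object PollingDemo {
  // Runs the source against a fake endpoint and stops after n records.
  def collectTicks(n: Int): Vector[Tick] = {
    val seen = Vector.newBuilder[Tick]
    var count = 0
    val source = new PollingTickerSource(
      () => { count += 1; Tick("poloniex", "XMR/BTC", BigDecimal(count)) },
      intervalMillis = 1L
    )
    source.run(new RecordSink[Tick] {
      def collect(t: Tick): Unit = {
        seen += t
        if (count >= n) source.cancel()
      }
    })
    seen.result()
  }

  def main(args: Array[String]): Unit = collectTicks(3).foreach(println)
}
```

A source like this keeps no offsets, so after a failure it can only resume from "now", not from a checkpoint.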

On the other hand, what Jörn was suggesting is that you first persist the incoming data from the REST endpoint to a replayable storage / queue, and have your Flink job read from that replayable storage / queue.
The reason for this is that Flink’s checkpointing mechanism for exactly-once guarantees relies on a replayable source (see [1]). Since a REST endpoint is not replayable, you will not be able to benefit from the fault-tolerance guarantees provided by Flink. Currently, the most popular source used with Flink for exactly-once is Kafka [2]. In this setup, the only extra latency compared to fetching from the REST endpoint directly is the write to the intermediate Kafka topic.
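
To illustrate why replayability matters, here is a toy sketch and not Flink's actual checkpointing implementation. ReplayableLog is a hypothetical in-memory stand-in for something like a Kafka partition that can be re-read from an offset. The job checkpoints its (offset, running sum) every few records; after a simulated crash it rolls back to the last checkpoint and re-reads from that offset, so every record is reflected in the result exactly once.

```scala
// Hypothetical in-memory stand-in for a replayable log such as a Kafka partition.
final class ReplayableLog[T](records: Vector[T]) {
  def read(offset: Int): Option[T] = records.lift(offset)
}

object CheckpointDemo {
  // Sums the log, checkpointing (offset, sum) every `every` records.
  // If `failAt` is set, a crash is simulated once just before reading that offset.
  def sumWithRecovery(log: ReplayableLog[Int], every: Int, failAt: Option[Int]): Int = {
    var checkpoint = (0, 0) // last durable (offset, running sum)
    var offset = 0
    var sum = 0
    var pendingFailure = failAt
    var done = false
    while (!done) {
      if (pendingFailure.contains(offset)) {
        // Simulated crash: in-flight state is lost, so roll back to the checkpoint
        // and replay the log from the checkpointed offset.
        pendingFailure = None
        offset = checkpoint._1
        sum = checkpoint._2
      } else {
        log.read(offset) match {
          case Some(value) =>
            sum += value
            offset += 1
            if (offset % every == 0) checkpoint = (offset, sum)
          case None =>
            done = true
        }
      }
    }
    sum
  }
}
```

With a plain REST endpoint there is no offset to rewind to, so the records read between the last checkpoint and the crash could not be fetched again.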

Of course, if you’re just testing around and just getting to know Flink, this setup isn’t necessary.
You can just start off with a source such as the Flink Akka connector in Bahir, and start writing your first Flink job right away :)

Cheers,
Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html

On 24 April 2017 at 4:02:14 PM, Georg Heiler (georg.kf.heiler@gmail.com) wrote:

Wouldn't adding flume -> Kafka -> flink also introduce additional latency?

Re: Flink first project

Posted by Georg Heiler <ge...@gmail.com>.
Wouldn't adding Flume -> Kafka -> Flink also introduce additional latency?

On Sun, 23 Apr 2017 at 20:23, Georg Heiler <ge...@gmail.com> wrote:

> So you would suggest flume over a custom akka-source from bahir?

Re: Flink first project

Posted by Georg Heiler <ge...@gmail.com>.
So you would suggest Flume over a custom Akka source from Bahir?

On Sun, 23 Apr 2017 at 18:59, Jörn Franke <jo...@gmail.com> wrote:

> I would use flume to import these sources to HDFS and then use flink or
> Hadoop or whatever to process them. While it is possible to do it in flink,
> you do not want that your processing fails because the web service is not
> available etc.
> Via flume which is suitable for this kind of tasks it is more controlled
> and reliable.

Re: Flink first project

Posted by Jörn Franke <jo...@gmail.com>.
I would use Flume to import these sources to HDFS and then use Flink or Hadoop or whatever to process them. While it is possible to do it in Flink, you do not want your processing to fail because the web service is not available etc.
With Flume, which is suitable for this kind of task, the ingestion is more controlled and reliable.
