Posted to user@flink.apache.org by Matt <dr...@gmail.com> on 2017/04/07 19:01:01 UTC

Flink + Druid example?

Hi all,

I'm looking for an example of Tranquility (Druid's lib) as a Flink sink.

I'm trying to follow the code in [1], but it seems incomplete or maybe
outdated: it doesn't mention the other method (tranquilizer) that seems
to be part of the BeamFactory interface in the current version.

If anyone has code or a working project to use as a reference, that
would be awesome for me and for the rest of us looking for a time-series
database solution!

Best regards,
Matt

[1] https://github.com/druid-io/tranquility/blob/master/docs/flink.md

Re: Flink + Druid example?

Posted by dr...@gmail.com.

Thank you for the information, I'll have a look.

> On Apr 10, 2017, at 06:02, Steven Le Roux <le...@gmail.com> wrote:
> [...]

Re: Flink + Druid example?

Posted by Steven Le Roux <le...@gmail.com>.

Hi,

I'm head of @OvhMetrics, a cloud-scale managed time series platform
targeting IoT and monitoring.

We're also using @warp10io components with some glue and optimisations. The
storage layer is based on Apache HBase, which to me is an ideal compromise
between storage efficiency (bytes per data point, compression, no
indexing) and performance (range scan capabilities, custom filters, ...).

This gives us two ways to produce data: either you use the HTTP endpoint,
or you run MapReduce (MR) jobs that target HBase directly, since Warp10 has
strong Hadoop integration.

Advantages of Warp10 vs Influx:
  - Warp10 is fully open source; Influx is not (clustering is not available
    as OSS).
  - Influx is good at ingestion, but it needs your data to arrive in order.
    Real-time use cases show that data points don't arrive in order (some
    are held back, buffering makes older points arrive after newer ones,
    etc.).
  - Warp10 has been measured at 1.8M data points/s per thread (and not in
    an optimised case)!
  - The true power of Warp10 is WarpScript, its query language, which adopts
    a data flow approach and has been designed for time series from the
    ground up. Our customers are doing truly amazing things with WarpScript,
    which contains nearly 800 functions. It brings analytics and signal
    processing to your time series data.
  - Warp10 can be deployed either standalone (in-memory or LevelDB) or in
    distributed mode (HBase).
  - Security is mandatory and does not affect performance.
  - You can easily delete massive ranges of data or just a single point.
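
To make the out-of-order point concrete, here is a minimal sketch of one
common way a consumer can tolerate late arrivals: buffer points in a min-heap
and only release those older than the newest timestamp seen minus an allowed
lateness. This is an illustration only, not how Warp10 or InfluxDB work
internally; the names ReorderBuffer, reorder and allowed_lateness are
hypothetical.

```python
import heapq
import itertools


class ReorderBuffer:
    """Release (timestamp, value) points in timestamp order, tolerating late arrivals."""

    def __init__(self, allowed_lateness):
        # Hypothetical tuning knob: how far behind the newest timestamp
        # a point may arrive and still be emitted in order.
        self.allowed_lateness = allowed_lateness
        self._heap = []                  # min-heap ordered by timestamp
        self._seq = itertools.count()    # tie-breaker for equal timestamps
        self._max_seen = None            # highest timestamp observed so far

    def push(self, timestamp, value):
        """Add one point and return the points that are now safe to emit."""
        heapq.heappush(self._heap, (timestamp, next(self._seq), value))
        if self._max_seen is None or timestamp > self._max_seen:
            self._max_seen = timestamp
        watermark = self._max_seen - self.allowed_lateness
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, val = heapq.heappop(self._heap)
            ready.append((ts, val))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        ready = []
        while self._heap:
            ts, _, val = heapq.heappop(self._heap)
            ready.append((ts, val))
        return ready


def reorder(points, allowed_lateness):
    """Run a whole sequence of points through the buffer."""
    buf = ReorderBuffer(allowed_lateness)
    out = []
    for ts, value in points:
        out.extend(buf.push(ts, value))
    out.extend(buf.flush())
    return out
```

A point that arrives later than the allowed lateness is still emitted, but
after newer points, so the lateness bound has to be chosen to match how much
buffering the data sources actually introduce.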


Matt, if you want a few metrics on our use of Warp10 inside OVH:
  - 450M unique series
  - nominal load of 1.5M data points/s
  - a delete rate of 10M data points/s
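
To put these figures in perspective, here is a back-of-envelope calculation.
It is only an illustration: it assumes the nominal load is sustained all day
and spread evenly across all series, neither of which is stated above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the message above.
unique_series = 450_000_000
nominal_ingest_per_s = 1_500_000


def points_per_day(rate_per_s):
    """Data points ingested in one day at a constant rate (assumption)."""
    return rate_per_s * SECONDS_PER_DAY


def mean_seconds_between_points(series, rate_per_s):
    """Average gap between two points of one series, if load were spread evenly."""
    return series / rate_per_s


print(points_per_day(nominal_ingest_per_s))                              # 129600000000
print(mean_seconds_between_points(unique_series, nominal_ingest_per_s))  # 300.0
```

Under those assumptions the platform would ingest about 130 billion points
per day, and each series would receive one point every five minutes on
average.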


If you're interested in learning more about Warp10, you can ask here:
https://groups.google.com/forum/#!forum/warp10-users


Regards,



On Mon, Apr 10, 2017 at 10:26 AM, Alexis Gendronneau <
a.gendronneau@gmail.com> wrote:

> [...]

Re: Flink + Druid example?

Posted by Alexis Gendronneau <a....@gmail.com>.

Hi,

Do you know http://www.warp10.io/? It's a geo time series database. As far
as I know, this technology can handle ingestion of 100k+ points per node,
and its query language is powerful. I've already used it to compute time
series correlations. I'm pretty sure you won't be disappointed by it.

Regards,

2017-04-09 17:07 GMT+02:00 Matt <dr...@gmail.com>:

> [...]


-- 
Alexis Gendronneau

alexis.gendronneau@corp.ovh.com
a.gendronneau@gmail.com

Re: Flink + Druid example?

Posted by Matt <dr...@gmail.com>.

I just noticed the first link is wrong; I intended to send [1] instead.

On a second look at InfluxDB, its compression really is better than Druid's,
and the same goes for write and read performance. I'll take a deeper look
before committing to one.

[1] https://cdn2.hubspot.net/hub/528953/hubfs/Screen_Shot_2016-08-27_at_00.32.42.png?t=1491606817725

On Sat, Apr 8, 2017 at 9:40 PM, Matt <dr...@gmail.com> wrote:

> [...]

Re: Flink + Druid example?

Posted by Matt <dr...@gmail.com>.

I compared them some days ago.

I found a useful article about many of the TSDBs available out there [1];
check the big table in the article, it's really helpful. The thing that
bothered me the most about InfluxDB was not being able to set up a cluster
using the open source distribution. That may not be a problem in the future,
but I preferred to be able to do so now.

Regarding Druid, there is also a really interesting talk by one of its
committers [2]. I liked some of the decisions they made regarding the way
queries are executed and the way the data is stored on disk (they have
taken some ideas from the search engine industry).

The other promising alternative is Prometheus. I haven't had a look at it
yet, but I plan to do so in the near future.

If anyone is using a time-series database and wants to tell us about it,
that would be helpful!

Best regards,
Matt

[1] https://blog.netsil.com/a-comparison-of-time-series-databases-and-netsils-use-of-druid-db805d471206
[2] https://www.youtube.com/watch?v=vbH8E0nH2Nw

On Sat, Apr 8, 2017 at 8:16 PM, Ted Yu <yu...@gmail.com> wrote:

> [...]

Re: Flink + Druid example?

Posted by Ted Yu <yu...@gmail.com>.

I found this related post:

https://groups.google.com/forum/#!topic/druid-user/Co5WUZOMnEk

On Sat, Apr 8, 2017 at 3:56 PM, Traku traku <tr...@gmail.com> wrote:

> [...]

Re: Flink + Druid example?

Posted by Traku traku <tr...@gmail.com>.

I'm using InfluxDB. I think InfluxDB is easier as a time-series database
solution.

Did you compare them?

Best regards.

2017-04-07 21:01 GMT+02:00 Matt <dr...@gmail.com>:

> [...]