Posted to dev@sdap.apache.org by Julian Feinauer <j....@pragmaticminds.de> on 2019/03/29 08:49:40 UTC

Re: ... / Let me introduce myself

Hi all,

Please excuse this mail; it was the result of some miscommunication between my mail client and me while subscribing to the list.
So let me briefly introduce myself: my name is Julian and I am a mathematician (I did my PhD in stochastics).
I live in Germany and am the founder of a startup where we do a lot of data analytics, especially on “industrial data” and stream processing.
I’m involved in other incubating projects in this area as well, namely PLC4X, Edgent and IoTDB (and I make minor contributions to other projects).
Thus, I am very interested in what you do here and can hopefully contribute a bit :)

Julian

On 2019/03/29 08:26:45, Julian Feinauer <j....@pragmaticminds.de> wrote:
>
>

Re: ... / Let me introduce myself

Posted by Julian Feinauer <jf...@apache.org>.
Hi Frank,

I just noticed that I didn't respond...
Thank you very much for this detailed explanation.
If I understand correctly, you are more concerned with applying the algorithm(s) to your specific datasets and formats than with developing these algorithms?

Julian

On 2019/04/02 17:43:40, Frank Greguska <fg...@gmail.com> wrote: 
> Unfortunately we are pretty sparse on documentation at this point, so I will
> try to briefly summarize the work in that area here:
> 
> The anomaly detection consists mainly of two parts, analysis of the data
> itself and the ability to publish anomalies.
> 
> In terms of analyzing the data, we have focused on an algorithm we refer to
> as the "Daily Difference Average". In practice what this means is that for
> a given user-selected area and timeframe, for each day we extract the
> measurement values in the area on that day. Then we access a pre-computed
> climatology for that dataset and extract the same area from the
> climatology. Then we subtract the climatology data values from the data
> values on that day. Finally we average the differences in values. In this
> way we can graph how far the measurements in an area on a given day are
> from "normal" (the climatology).
> 
> The algorithm is implemented here:
> https://github.com/apache/incubator-sdap-nexus/blob/49d7d43ea6c64e2d3055ab9af4ba07b948bbd2e1/analysis/webservice/algorithms_spark/DailyDifferenceAverageSpark.py
> 
> An example of the resulting plot can be seen here where we plot the
> difference from average sea surface temperature for the El Niño 3.4 region:
> https://imgur.com/a/gRLSrv8 (attaching the image directly causes the Apache
> mail server to reject the message, so I've uploaded it to imgur).
> 
> The ability to publish anomalies comes mainly from our Edge project:
> https://github.com/apache/incubator-sdap-edge
> In particular, the "oceanxtremes" plugin:
> https://github.com/apache/incubator-sdap-edge/tree/71d190599ca79591ef2bf2c116bfa86bc281059c/src/main/python/plugins/oceanxtremes
> This allows users to submit "anomalies" that capture the parameters used
> during the query so that other researchers can load up the exact same data
> and have a look for themselves. It also integrates with datacasting (
> https://datacasting.jpl.nasa.gov/), which is an RSS-style feed that
> researchers could subscribe to in order to be notified of new anomalies.
> 
> I believe that mostly summarizes the work done so far. If anyone else has
> further input, please share.
> 
> Thanks,
> 
> -Frank

Re: ... / Let me introduce myself

Posted by Kevin Ratnasekera <dj...@gmail.com>.
Hi Trevor,

As far as I know, Apache has not yet applied to GSoD. This is the first
season of GSoD, so maybe not many people at Apache are aware of it. Also,
these days mentors are busy with the GSoC student application period.

It would be great if we could submit ourselves; I would volunteer to be a mentor.

Regards
Kevin

On Tue, Apr 2, 2019 at 11:26 PM Trevor Grant <tr...@gmail.com>
wrote:

> Re: the weak docs-
>
> Have we considered getting involved in this? [1]
>
> If interested I can find out if Apache is going in as an org, or if we need
> to submit ourselves. I mentored Google Summer of Code a while back.
>
> [1] https://developers.google.com/season-of-docs/docs/timeline

Re: ... / Let me introduce myself

Posted by Trevor Grant <tr...@gmail.com>.
Re: the weak docs-

Have we considered getting involved in this? [1]

If interested I can find out if Apache is going in as an org, or if we need
to submit ourselves. I mentored Google Summer of Code a while back.

[1] https://developers.google.com/season-of-docs/docs/timeline

On Tue, Apr 2, 2019 at 12:44 PM Frank Greguska <fg...@gmail.com> wrote:

> Unfortunately we are pretty sparse on documentation at this point, so I will
> try to briefly summarize the work in that area here:
>
> The anomaly detection consists mainly of two parts, analysis of the data
> itself and the ability to publish anomalies.
>
> In terms of analyzing the data, we have focused on an algorithm we refer to
> as the "Daily Difference Average". In practice what this means is that for
> a given user-selected area and timeframe, for each day we extract the
> measurement values in the area on that day. Then we access a pre-computed
> climatology for that dataset and extract the same area from the
> climatology. Then we subtract the climatology data values from the data
> values on that day. Finally we average the differences in values. In this
> way we can graph how far the measurements in an area on a given day are
> from "normal" (the climatology).
>
> The algorithm is implemented here:
>
> https://github.com/apache/incubator-sdap-nexus/blob/49d7d43ea6c64e2d3055ab9af4ba07b948bbd2e1/analysis/webservice/algorithms_spark/DailyDifferenceAverageSpark.py
>
> An example of the resulting plot can be seen here where we plot the
> difference from average sea surface temperature for the El Niño 3.4 region:
> https://imgur.com/a/gRLSrv8 (attaching the image directly causes the
> Apache mail server to reject the message, so I've uploaded it to imgur).
>
> The ability to publish anomalies comes mainly from our Edge project:
> https://github.com/apache/incubator-sdap-edge
> In particular, the "oceanxtremes" plugin:
>
> https://github.com/apache/incubator-sdap-edge/tree/71d190599ca79591ef2bf2c116bfa86bc281059c/src/main/python/plugins/oceanxtremes
> This allows users to submit "anomalies" that capture the parameters used
> during the query so that other researchers can load up the exact same data
> and have a look for themselves. It also integrates with datacasting (
> https://datacasting.jpl.nasa.gov/), which is an RSS-style feed that
> researchers could subscribe to in order to be notified of new anomalies.
>
> I believe that mostly summarizes the work done so far. If anyone else has
> further input, please share.
>
> Thanks,
>
> -Frank

Re: ... / Let me introduce myself

Posted by Frank Greguska <fg...@gmail.com>.
Unfortunately we are pretty sparse on documentation at this point, so I will
try to briefly summarize the work in that area here:

The anomaly detection consists mainly of two parts, analysis of the data
itself and the ability to publish anomalies.

In terms of analyzing the data, we have focused on an algorithm we refer to
as the "Daily Difference Average". In practice what this means is that for
a given user-selected area and timeframe, for each day we extract the
measurement values in the area on that day. Then we access a pre-computed
climatology for that dataset and extract the same area from the
climatology. Then we subtract the climatology data values from the data
values on that day. Finally we average the differences in values. In this
way we can graph how far the measurements in an area on a given day are
from "normal" (the climatology).
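
As a rough illustration of the steps above (this is not the Spark
implementation linked below), a minimal sketch in plain Python might look
like the following. The function name, the flat per-day lists of cell values,
and the use of NaN to mark missing cells are all assumptions made for the
example.

```python
import math


def daily_difference_average(daily_values, daily_climatology):
    """Return one averaged difference per day (hypothetical helper).

    daily_values:      one list per day with the measurement values of every
                       cell in the selected area.
    daily_climatology: one list per day with the climatology ("normal")
                       values for the same cells, in the same order.
    Cells where either value is NaN are skipped (assumed convention).
    """
    averages = []
    for values, normals in zip(daily_values, daily_climatology):
        if len(values) != len(normals):
            raise ValueError("data and climatology must cover the same cells")
        # Subtract the climatology from the measurement, cell by cell.
        diffs = [
            v - n
            for v, n in zip(values, normals)
            if not (math.isnan(v) or math.isnan(n))
        ]
        # Average the differences; a day with no valid cells yields NaN.
        averages.append(sum(diffs) / len(diffs) if diffs else math.nan)
    return averages


# Two days over a four-cell area, with one missing cell on the first day.
sst = [[28.0, 29.0, 27.0, math.nan], [26.0, 26.0, 26.0, 26.0]]
climatology = [[26.0, 26.0, 26.0, 26.0], [26.0, 26.0, 26.0, 26.0]]
print(daily_difference_average(sst, climatology))  # [2.0, 0.0]
```

Plotting the returned list against the days gives the kind of
"distance from normal" time series described above.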

The algorithm is implemented here:
https://github.com/apache/incubator-sdap-nexus/blob/49d7d43ea6c64e2d3055ab9af4ba07b948bbd2e1/analysis/webservice/algorithms_spark/DailyDifferenceAverageSpark.py

An example of the resulting plot can be seen here where we plot the
difference from average sea surface temperature for the El Niño 3.4 region:
https://imgur.com/a/gRLSrv8 (attaching the image directly causes the Apache
mail server to reject the message, so I've uploaded it to imgur).

The ability to publish anomalies comes mainly from our Edge project:
https://github.com/apache/incubator-sdap-edge
In particular, the "oceanxtremes" plugin:
https://github.com/apache/incubator-sdap-edge/tree/71d190599ca79591ef2bf2c116bfa86bc281059c/src/main/python/plugins/oceanxtremes
This allows users to submit "anomalies" that capture the parameters used
during the query so that other researchers can load up the exact same data
and have a look for themselves. It also integrates with datacasting (
https://datacasting.jpl.nasa.gov/), which is an RSS-style feed that
researchers could subscribe to in order to be notified of new anomalies.

I believe that mostly summarizes the work done so far. If anyone else has
further input, please share.

Thanks,

-Frank

On Mon, Apr 1, 2019 at 12:34 PM Julian Feinauer <
j.feinauer@pragmaticminds.de> wrote:

> Hi Lewis,
> Hi Frank,
>
> Thank you!
> Of course I'll try to help with the release and RC checking.
> I'm very interested in the anomaly detection, but I did not find much
> documentation about it.
> Could you point me towards it?
>
> Julian

Re: ... / Let me introduce myself

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi Lewis,
Hi Frank,

Thank you!
Of course I'll try to help with the release and RC checking.
I'm very interested in the anomaly detection, but I did not find much documentation about it.
Could you point me towards it?

Julian

Sent from my mobile phone


-------- Original message --------
Subject: Re: ... / Let me introduce myself
From: Frank Greguska
To: dev@sdap.apache.org
Cc:

Welcome Julian, glad to have you on board.


Re: ... / Let me introduce myself

Posted by Frank Greguska <fg...@gmail.com>.
Welcome Julian, glad to have you on board.

On Mon, Apr 1, 2019 at 8:25 AM Lewis John McGibbney <le...@apache.org>
wrote:

> Hi Julian,
> Sounds great.
> Is there any particular part of SDAP that you're interested in?
> The community is working towards its first incubating release so hopefully
> you will be able to try it out soon. Reviewing the release candidate when
> it is prepared would be a really big help.
> Lewis

Re: ... / Let me introduce myself

Posted by Lewis John McGibbney <le...@apache.org>.
Hi Julian,
Sounds great. 
Is there any particular part of SDAP that you're interested in?
The community is working towards its first incubating release, so hopefully you will be able to try it out soon. Reviewing the release candidate when it is prepared would be a really big help.
Lewis

On 2019/03/29 08:49:40, Julian Feinauer <j....@pragmaticminds.de> wrote: 
> Hi all,
> 
> Please excuse this mail; it was the result of some miscommunication between my mail client and me while subscribing to the list.
> So let me briefly introduce myself: my name is Julian and I am a mathematician (I did my PhD in stochastics).
> I live in Germany and am the founder of a startup where we do a lot of data analytics, especially on “industrial data” and stream processing.
> I’m involved in other incubating projects in this area as well, namely PLC4X, Edgent and IoTDB (and I make minor contributions to other projects).
> Thus, I am very interested in what you do here and can hopefully contribute a bit :)
> 
> Julian