Posted to dev@eagle.apache.org by "Liangfei.Su" <su...@gmail.com> on 2015/12/07 09:50:36 UTC

[DISCUSS] Provide analytic DSL support

Eagle's input arrives as stream data (such as security audit logs), and Eagle
provides alerting computation on top of it based on a CEP DSL.

Following the same approach, Eagle should be able to provide the same DSL
support for its real-time monitoring feature, and could furthermore be
integrated with a storage backend (or expose another streaming output) to
provide a dashboard/presentation to users.

This would require:
1. the Eagle programming API to support a new query (or aggregation)
semantic, using a DSL similar to the alert DSL;
2. a clear definition of the materialization interface; we might start from
Eagle's built-in HBase storage implementation;
3. a Metric API/Dashboard.


Currently this requires a lot of user customization, and the CEP engine's
capabilities cannot be reused. I have tried to capture this in
https://issues.apache.org/jira/browse/EAGLE-79.


Please suggest.


Thanks,

Ralph

Re: [DISCUSS] Provide analytic DSL support

Posted by "Liangfei.Su" <su...@gmail.com>.
No UI design yet.
I'm using EAGLE-79 <https://issues.apache.org/jira/browse/EAGLE-79> to
track support for stream analysis in the Eagle framework. The UI will be a
separate task.
Persistence is not in this JIRA either, since it can proceed as a parallel
task.
On declarative persist, my view is that persistence is driven by the metadata
definition of the data source together with its schema information. This
metadata should be updatable and reloadable in the same way as the policy
lifecycle:

1. The user changes the metadata definition.
2. The persist executor periodically reloads the metadata from the
underlying store, and always uses the latest metadata it has read.
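
A minimal sketch of this reload lifecycle, assuming nothing beyond the two
steps above (all names here are illustrative, not Eagle APIs): the executor
holds the latest metadata in an atomic reference and swaps it on each
periodic reload, so persisting always sees the last metadata read.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical metadata shape; field names are illustrative only.
case class StreamMetadata(dataSource: String, schema: Set[String], version: Long)

// Sketch of a persist executor that periodically reloads metadata,
// mirroring the policy-reload lifecycle described above.
class PersistExecutor(load: () => StreamMetadata) {
  private val current = new AtomicReference[StreamMetadata](load())

  // Invoked on a timer by the framework (step 2 above).
  def reload(): Unit = current.set(load())

  // Persisting an event uses whatever metadata was read last; here we
  // simply project the event onto the declared schema fields.
  def persist(event: Map[String, Any]): Map[String, Any] = {
    val meta = current.get()
    event.filter { case (field, _) => meta.schema.contains(field) }
  }
}
```

The real executor would of course write to the storage backend rather than
return the projected event; the point is only the reload-and-swap lifecycle.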

Could you elaborate on declarative persist? What would its user interface and
lifecycle management look like?

Ralph



Re: [DISCUSS] Provide analytic DSL support

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
We probably also need to support declarative persist along with this feature,
I think.

Has the UI design started?

Thanks
Edward



Re: [DISCUSS] Provide analytic DSL support

Posted by "Liangfei.Su" <su...@gmail.com>.
q1. Yes, the same mechanism as for policy definition would be used. The user
would be able to define an analyze-policy, and the analyze executor would
load and execute that policy. In addition, when programming, the user could
simply pass their analyze SQL directly through the API for simplicity.

q2. Sure, the user would be able to define simple analyses.

q3. No, but there is a dependency, since Hao's work impacts the API a lot.
Pull request 26 would quickly decouple this dependency.
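
For q1, a rough sketch of the two paths (a managed analyze-policy loaded by
the executor, versus SQL passed directly through the API). All names here are
hypothetical, and the CEP engine is stubbed out with a pass-through:

```scala
// Hypothetical analyze-policy shape; not an Eagle API.
case class AnalyzePolicy(name: String, sql: String)

class AnalyzeExecutor(loadPolicy: String => Option[AnalyzePolicy]) {
  // Programmatic path: the user supplies the analyze SQL string directly.
  def executeSql(sql: String, events: Seq[Map[String, Any]]): Seq[Map[String, Any]] =
    runQuery(sql, events)

  // Managed path: resolve the analyze-policy by name from the metadata
  // store, then execute its SQL, mirroring the alert-policy lifecycle.
  def executePolicy(name: String, events: Seq[Map[String, Any]]): Seq[Map[String, Any]] =
    loadPolicy(name).map(p => runQuery(p.sql, events)).getOrElse(Seq.empty)

  // Stand-in for the real CEP engine (e.g. Siddhi); trivially passes events
  // through so the sketch stays self-contained.
  private def runQuery(sql: String, events: Seq[Map[String, Any]]): Seq[Map[String, Any]] =
    events
}
```

Both paths converge on the same query-execution step, which is what lets the
CEP engine be reused rather than re-implemented per application.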

Ralph



Re: [DISCUSS] Provide analytic DSL support

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
Thanks for updating.
Some questions:
1. Does the aggregator declaration need to be downloaded from the Eagle
service? (I believe it can be used in code directly.) If so, can we use the
same mechanism as for policy lifecycle management, and do we want the
declaration to be updatable dynamically?
2. Because the aggregator declaration can be expressed with a limited syntax
(group by/max/top/avg/...), could the future UI part be more intuitive than
the current policy UI? :-)
3. How is this design aligned with the general-purpose monitoring design
that Hao Chen is working on? In terms of input/output and business logic,
will it be reusable in the future?

These questions are not urgent requests, but we can think about them while
implementing.

Thanks
Edward




Re: [DISCUSS] Provide analytic DSL support

Posted by "Liangfei.Su" <su...@gmail.com>.
I have put up a draft spec at
https://cwiki.apache.org/confluence/display/EAG/Stream+Analyze

Please suggest.


Thanks,
Ralph



Re: [DISCUSS] Provide analytic DSL support

Posted by "Liangfei.Su" <su...@gmail.com>.
For #1, the Eagle programming API mostly sits in the same place as
Trident. Besides platform independence and type safety, the Eagle CEP
support could help reduce the code effort needed to submit a topology. This
extends the current alert-definition experience to a wider range of cases.

For example, a Trident-style join

topology.join(stream1, new Fields("key"), stream2, new Fields("x"),
new Fields("key", "a", "b", "c"));

becomes SQL-like:

from stream1=.., stream2=...
select stream1.key, stream2.a, stream2.b, stream2.c where
stream1.key=stream2.x

For windowed joins, things can get more complicated, and Trident requires
the user to hand-code a couple of persist/stateQuery steps.
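
To make the comparison concrete, here is a minimal sketch of the work the
SQL-like join would express: a plain hash join over two keyed event batches.
Event and joinBatches are illustrative names, not Eagle or Trident APIs, and
windowing is deliberately left out.

```scala
// One event is just a bag of named fields, as in the Fields-based example.
case class Event(fields: Map[String, Any])

def joinBatches(stream1: Seq[Event], stream2: Seq[Event]): Seq[Map[String, Any]] = {
  // Index stream2 by its join field "x" (the right side of stream1.key = stream2.x).
  val byX = stream2.groupBy(_.fields("x"))
  for {
    e1 <- stream1
    e2 <- byX.getOrElse(e1.fields("key"), Seq.empty)
  } yield Map(
    "key" -> e1.fields("key"),
    "a"   -> e2.fields("a"),
    "b"   -> e2.fields("b"),
    "c"   -> e2.fields("c"))
}
```

The DSL's value is that the user writes only the select/where clause and the
framework generates the indexing, matching, and (for windowed joins) the
persist/stateQuery plumbing.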

Thanks,
Ralph


Re: [DISCUSS] Provide analytic DSL support

Posted by "Chen, Hao" <Ha...@ebay.com>.
1. Are you guys re-implementing part of Trident?
>> 1) Trident is a high-level API but field-based; Eagle is type-oriented.
>> 2) Eagle datastream is platform-independent, not tied only to Storm.
>> 3) Eagle datastream supports CEP CQL in addition to the programming API.

2. How is the type information kept during data processing by Storm?
>> Type information is provided by Scala's TypeTag[T].
>> Eagle can serialize the valuable type information (such as the type's class and fields) from TypeTag[T] before submitting to the execution environment, and then share it between processing elements such as spouts and bolts.
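
To illustrate the idea of capturing type information once, before topology
submission, into a plain serializable value that spouts/bolts can share:
a small sketch. Eagle uses TypeTag[T] (Scala 2 runtime reflection); here
ClassTag plus Java reflection stands in to keep the sketch small, and
TypeInfo/extractTypeInfo/AuditLogEvent are illustrative names only.

```scala
import scala.reflect.ClassTag

// Plain value holding the type information to be shared between
// processing elements after submission.
case class TypeInfo(className: String, fieldNames: Set[String])

// Capture the runtime class and its field names from the type parameter.
def extractTypeInfo[T](implicit ct: ClassTag[T]): TypeInfo = {
  val cls = ct.runtimeClass
  TypeInfo(cls.getName, cls.getDeclaredFields.map(_.getName).toSet)
}

// Example event type, as might flow through an Eagle datastream.
case class AuditLogEvent(user: String, cmd: String, timestamp: Long)
```

The real implementation would extract richer information from TypeTag[T]
(type parameters, member signatures), but the shape is the same: reflect
once at submission time, then ship only plain data to the workers.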


Thanks,
Hao
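[Editorial note] Hao's TypeTag answer can be sketched in plain Java. The `TypeMetadata` helper below is hypothetical and only illustrates the idea, capturing a stream event class's name and fields via reflection so the schema can be serialized with the topology and shared between spout and bolt; it is not Eagle's actual implementation, which relies on Scala's `TypeTag[T]`.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: capture the type information of a stream's event class
// so it can be serialized with the topology and shared between processing
// elements (spout/bolt), similar in spirit to Scala's TypeTag[T].
public class TypeMetadata implements Serializable {
    public final String className;
    public final List<String> fieldNames = new ArrayList<>();

    public TypeMetadata(Class<?> clazz) {
        this.className = clazz.getName();
        for (Field f : clazz.getDeclaredFields()) {
            fieldNames.add(f.getName());
        }
    }

    // Example event type a security-audit-log stream might carry (illustrative).
    public static class AuditLogEvent {
        String user;
        String action;
        long timestamp;
    }

    public static void main(String[] args) {
        TypeMetadata meta = new TypeMetadata(AuditLogEvent.class);
        // A bolt receiving this metadata knows the event schema without
        // hard-coding field names.
        System.out.println(meta.className + " " + meta.fieldNames);
    }
}
```

A receiving bolt could use such metadata to map tuple fields back to typed event objects instead of working with untyped field lists, which is the field-based vs. type-oriented distinction drawn above.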






Re: [DISCUSS] Provide analytic DSL support

Posted by "Meng, Yiming" <yi...@ebay.com>.
Quick questions:

1. Are you guys re-implementing part of Trident?
2. How can the type information be kept during data processing by Storm?

Regards,
Yiming Meng









Re: [DISCUSS] Provide analytic DSL support

Posted by "Chen, Hao" <Ha...@ebay.com>.
It’s a very good point, and I’m refactoring https://issues.apache.org/jira/browse/EAGLE-66. After that work, we could start on the analytics DSL using Siddhi.

After we finished:
1. Typesafe DSL: EAGLE-66
2. SQL CEP (siddhi): EAGLE-79
3. DAG visualization/status/metric/dashboard

We could even propose eagle-datastream as an independent, general-purpose streaming framework for any streaming use case, such as real-time ETL.

Thanks,
Hao
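[Editorial note] To make the "SQL CEP (siddhi)" item concrete: a typical Siddhi query such as `from auditStream#window.length(3) select avg(duration) as avgDuration insert into outStream;` computes a windowed aggregate over the stream. The snippet below is a hand-rolled sketch of that length-window semantics using only the JDK, not the Siddhi engine itself; the stream name and `duration` attribute are made-up examples.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-rolled sketch of a length-window average, i.e. roughly what a Siddhi
// query like:
//   from auditStream#window.length(3)
//   select avg(duration) as avgDuration insert into outStream;
// would compute inside the CEP engine. Illustrative only.
public class LengthWindowAvg {
    private final int size;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    public LengthWindowAvg(int size) { this.size = size; }

    // Feed one event; return the current windowed average.
    public double onEvent(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            sum -= window.removeFirst();  // expire the oldest event
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        LengthWindowAvg avg = new LengthWindowAvg(3);
        double[] durations = {10, 20, 30, 40};
        for (double d : durations) {
            System.out.println(avg.onEvent(d));  // 10.0, 15.0, 20.0, 30.0
        }
    }
}
```

The point of the proposed DSL is that users would write only the declarative query; the framework would generate and wire up operators like this one, and the materialization interface would persist `outStream` to a backend such as HBase.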



