Posted to user@flink.apache.org by Stephan Ewen <se...@apache.org> on 2019/02/13 11:21:11 UTC

[DISCUSS] Adding a mid-term roadmap to the Flink website

Hi all!

Recently, several contributors, committers, and users have asked for more
visibility into the direction the project is currently heading.

Users and developers can track the direction by following the discussion
threads and JIRA, but due to the mass of discussions and open issues, it is
very hard to get a good overall picture.
Especially for new users and contributors, it is very hard to get a quick
overview of the project direction.

To fix this, I suggest adding a brief roadmap summary to the homepage. It
is a bit of a commitment to keep that roadmap up to date, but I think the
benefit for users justifies that.
The Apache Beam project has added such a roadmap [1]
<https://beam.apache.org/roadmap/>, which was received very well by the
community. I would suggest following a similar structure here.

If the community is in favor of this, I would volunteer to write a first
version of such a roadmap. The points I would include are below.

Best,
Stephan

[1] https://beam.apache.org/roadmap/

========================================================

Disclaimer: Apache Flink is not governed or steered by any one single
entity, but by its community and Project Management Committee (PMC). This
is not an authoritative roadmap in the sense of a plan with a specific
timeline. Instead, we share our vision for the future and the major
initiatives that are receiving attention, to give users and contributors an
understanding of what they can look forward to.

*Future Role of Table API and DataStream API*
  - Table API becomes a first-class citizen
  - Table API becomes primary API for analytics use cases
      * Declarative, automatic optimizations
      * No manual control over state and timers
  - DataStream API becomes primary API for applications and data pipeline
use cases
      * Physical, user controls data types, no magic or optimizer
      * Explicit control over state and time
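
To make the contrast between the two roles concrete, here is a rough Java
sketch: the same per-user count, written once declaratively with the Table
API (the planner decides how state is kept) and once imperatively with the
DataStream API (the application owns its keyed state and timers). The Click
type, the "clicks" field names, and the surrounding setup (env, clickStream,
imports) are made up for illustration, and exact class names and signatures
differ between Flink versions.

    // Declarative (Table API / SQL): state layout and access are decided by
    // the planner; no user-managed state or timers.
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    tEnv.registerDataStream("clicks", clickStream, "user, url, ts");
    Table counts = tEnv.sqlQuery(
        "SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user");

    // Imperative (DataStream API): the application keeps its own keyed state
    // and registers timers explicitly.
    clickStream
        .keyBy(click -> click.user)
        .process(new KeyedProcessFunction<String, Click, Tuple2<String, Long>>() {
            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public void processElement(Click click, Context ctx,
                    Collector<Tuple2<String, Long>> out) throws Exception {
                long updated = (count.value() == null ? 0L : count.value()) + 1;
                count.update(updated);
                out.collect(Tuple2.of(click.user, updated));
                // Explicit control over time: clean up idle keys after one hour.
                ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 3_600_000L);
            }

            @Override
            public void onTimer(long ts, OnTimerContext ctx,
                    Collector<Tuple2<String, Long>> out) {
                count.clear();
            }
        });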

*Batch Streaming Unification*
  - Table API unification (environments) (FLIP-32)
  - New unified source interface (FLIP-27)
  - Runtime operator unification & code reuse between DataStream / Table
  - Extending Table API to make it a convenient API for all analytical use
cases (easier mixing-in of UDFs)
  - Same join operators on bounded/unbounded Table API and DataStream API
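
As a sketch of what "unified environments" could look like from a user's
perspective: the same query text is planned either as a bounded batch
program or as an incrementally updating streaming program, depending on a
single switch. The builder-style settings object below is an assumption for
illustration only, not the agreed FLIP-32 design, and it presumes a "clicks"
table has been registered in both environments.

    // Hypothetical: one TableEnvironment type, bounded vs. unbounded chosen
    // by a setting instead of separate batch/stream environments.
    TableEnvironment batchEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inBatchMode().build());
    TableEnvironment streamEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // The same query runs in either mode without changes.
    String query = "SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user";
    Table boundedCounts   = batchEnv.sqlQuery(query);   // finite result
    Table unboundedCounts = streamEnv.sqlQuery(query);  // continuously updated result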

*Faster Batch (Bounded Streams)*
  - Much of this comes via Blink contribution/merging
  - Fine-grained Fault Tolerance on bounded data (Table API)
  - Batch Scheduling on bounded data (Table API)
  - External Shuffle Services Support on bounded streams
  - Caching of intermediate results on bounded data (Table API)
  - Extending DataStream API to explicitly model bounded streams (API
breaking)
  - Add fine-grained fault tolerance, scheduling, and caching to the
DataStream API as well

*Streaming State Evolution*
  - Let all built-in serializers support stable evolution
  - First-class support for other evolvable formats (Protobuf, Thrift)
  - Savepoint input/output format to modify / adjust savepoints
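
To illustrate the kind of change this should make routine: a keyed state
value type gains a field between two deployments of a job, and restoring
from an old savepoint keeps working because the serializer can evolve the
schema. The OrderState class below is hypothetical, and its two versions are
shown side by side only for comparison.

    // Version 1 of a keyed state value type, written into a savepoint.
    public class OrderState {
        public long orderId;
        public double amount;
    }

    // Version 2, deployed later: adds a field. A schema-evolution-aware
    // serializer reads old savepoint entries and fills the new field with a
    // default instead of failing the restore.
    public class OrderState {
        public long orderId;
        public double amount;
        public String currency;  // new field
    }

    // The state declaration itself does not change across versions.
    ValueStateDescriptor<OrderState> descriptor =
        new ValueStateDescriptor<>("order", OrderState.class);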

*Simpler Event Time Handling*
  - Event Time Alignment in Sources
  - Simpler out-of-the-box support in sources
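
For context, this is roughly what users write today: timestamps and
watermarks are assigned manually downstream of the source, and nothing
coordinates event time progress across source partitions. The items above
aim to move this into the sources themselves. The Click type and its field
are made up for illustration.

    DataStream<Click> withEventTime = clicks.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Click>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(Click click) {
                return click.eventTimeMillis;  // hypothetical field
            }
        });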

*Checkpointing*
  - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
  - Failed checkpoints explicitly aborted on TaskManagers (not only on
coordinator)

*Automatic scaling (adjusting parallelism)*
  - Reactive scaling
  - Active scaling policies

*Kubernetes Integration*
  - Active Kubernetes Integration (Flink actively manages containers)

*SQL Ecosystem*
  - Extended Metadata Stores / Catalog / Schema Registries support
  - DDL support
  - Integration with Hive Ecosystem
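
For example, DDL support could let users define tables over external systems
in SQL instead of Java/Scala code. The statement and connector properties
below are purely illustrative, not a committed syntax.

    tableEnv.sqlUpdate(
        "CREATE TABLE clicks ("
        + "  user VARCHAR,"
        + "  url  VARCHAR,"
        + "  ts   TIMESTAMP"
        + ") WITH ("
        + "  'connector.type' = 'kafka',"
        + "  'topic' = 'clicks'"
        + ")");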

*Simpler Handling of Dependencies*
  - Scala in the APIs, but not in the core (hide in separate class loader)
  - Hadoop-free by default

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Stephan Ewen <se...@apache.org>.
Hi Shaoxuan!

I think adding the web UI improvements makes sense - there is not much open
to discuss there. Will do that.

For the machine learning improvements - that is a pretty big piece and I
think the discussions are still ongoing. I would prefer this to advance a
bit before adding it to the roadmap. As I proposed it, the roadmap is meant
to reflect the ongoing efforts where we have consensus on what they should
roughly look like.
We can update the roadmap very soon, once the machine learning discussion
has advanced a bit and has reached the state of a FLIP or so.

What do you think?

Best,
Stephan

On Mon, Feb 18, 2019 at 4:31 PM Shaoxuan Wang <ws...@gmail.com> wrote:

> Hi Stephan,
>
> Thanks for summarizing the work&discussions into a roadmap. It really
> helps users to understand where Flink will forward to. The entire outline
> looks good to me. If appropriate, I would recommend to add another two
> attracting categories in the roadmap.
>
> *Flink ML Enhancement*
>   - Refactor ML pipeline on TableAPI
>   - Python support for TableAPI
>   - Support streaming training & inference.
>   - Seamless integration of DL engines (Tensorflow, PyTorch etc)
>   - ML platform with a group of AI tooling
> Some of these work have already been discussed in the dev mail list.
> Related JIRA (FLINK-11095) and discussion:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> ;
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Python-and-Non-JVM-Language-Support-in-Flink-td25905.html
>
>
> *Flink-Runtime-Web Improvement*
>   - Much of this comes via Blink
>   - Refactor the entire module to use latest Angular (7.x)
>   - Add resource information at three levels including Cluster,
> TaskManager and Job
>   - Add operator level topology and and data flow tracing
>   - Add new metrics to track the back pressure, filter and data skew
>   - Add log association to Job, Vertex and SubTasks
> Related JIRA (FLINK-10705) and discussion:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html
>
>
> What do you think?
>
> Regards,
> Shaoxuan
>
>
>
> On Wed, Feb 13, 2019 at 7:21 PM Stephan Ewen <se...@apache.org> wrote:
>
>> Hi all!
>>
>> Recently several contributors, committers, and users asked about making
>> it more visible in which way the project is currently going.
>>
>> Users and developers can track the direction by following the discussion
>> threads and JIRA, but due to the mass of discussions and open issues, it is
>> very hard to get a good overall picture.
>> Especially for new users and contributors, is is very hard to get a quick
>> overview of the project direction.
>>
>> To fix this, I suggest to add a brief roadmap summary to the homepage. It
>> is a bit of a commitment to keep that roadmap up to date, but I think the
>> benefit for users justifies that.
>> The Apache Beam project has added such a roadmap [1]
>> <https://beam.apache.org/roadmap/>, which was received very well by the
>> community, I would suggest to follow a similar structure here.
>>
>> If the community is in favor of this, I would volunteer to write a first
>> version of such a roadmap. The points I would include are below.
>>
>> Best,
>> Stephan
>>
>> [1] https://beam.apache.org/roadmap/
>>
>> ========================================================
>>
>> Disclaimer: Apache Flink is not governed or steered by any one single
>> entity, but by its community and Project Management Committee (PMC). This
>> is not a authoritative roadmap in the sense of a plan with a specific
>> timeline. Instead, we share our vision for the future and major initiatives
>> that are receiving attention and give users and contributors an
>> understanding what they can look forward to.
>>
>> *Future Role of Table API and DataStream API*
>>   - Table API becomes first class citizen
>>   - Table API becomes primary API for analytics use cases
>>       * Declarative, automatic optimizations
>>       * No manual control over state and timers
>>   - DataStream API becomes primary API for applications and data pipeline
>> use cases
>>       * Physical, user controls data types, no magic or optimizer
>>       * Explicit control over state and time
>>
>> *Batch Streaming Unification*
>>   - Table API unification (environments) (FLIP-32)
>>   - New unified source interface (FLIP-27)
>>   - Runtime operator unification & code reuse between DataStream / Table
>>   - Extending Table API to make it convenient API for all analytical use
>> cases (easier mix in of UDFs)
>>   - Same join operators on bounded/unbounded Table API and DataStream API
>>
>> *Faster Batch (Bounded Streams)*
>>   - Much of this comes via Blink contribution/merging
>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>   - Batch Scheduling on bounded data (Table API)
>>   - External Shuffle Services Support on bounded streams
>>   - Caching of intermediate results on bounded data (Table API)
>>   - Extending DataStream API to explicitly model bounded streams (API
>> breaking)
>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>
>> *Streaming State Evolution*
>>   - Let all built-in serializers support stable evolution
>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>   - Savepoint input/output format to modify / adjust savepoints
>>
>> *Simpler Event Time Handling*
>>   - Event Time Alignment in Sources
>>   - Simpler out-of-the box support in sources
>>
>> *Checkpointing*
>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>> coordinator)
>>
>> *Automatic scaling (adjusting parallelism)*
>>   - Reactive scaling
>>   - Active scaling policies
>>
>> *Kubernetes Integration*
>>   - Active Kubernetes Integration (Flink actively manages containers)
>>
>> *SQL Ecosystem*
>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>   - DDL support
>>   - Integration with Hive Ecosystem
>>
>> *Simpler Handling of Dependencies*
>>   - Scala in the APIs, but not in the core (hide in separate class loader)
>>   - Hadoop-free by default
>>
>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Shaoxuan Wang <ws...@gmail.com>.
Hi Stephan,

Thanks for summarizing the work and discussions into a roadmap. It really
helps users understand where Flink is heading. The entire outline looks
good to me. If appropriate, I would recommend adding another two appealing
categories to the roadmap.

*Flink ML Enhancement*
  - Refactor ML pipeline on TableAPI
  - Python support for TableAPI
  - Support streaming training & inference.
  - Seamless integration with DL engines (TensorFlow, PyTorch, etc.)
  - ML platform with a group of AI tooling
Some of this work has already been discussed on the dev mailing list.
Related JIRA (FLINK-11095) and discussion:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
;
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Python-and-Non-JVM-Language-Support-in-Flink-td25905.html
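
(To sketch the first item above: a scikit-learn style pipeline whose stages
consume and produce Tables. All interface and class names below are made up
to illustrate the idea, not a proposed design.)

    // Hypothetical pipeline stages on top of the Table API.
    public interface Transformer {
        Table transform(TableEnvironment tEnv, Table input);
    }

    public interface Estimator<M extends Transformer> {
        M fit(TableEnvironment tEnv, Table training);
    }

    // Usage sketch: train on one Table, score another (bounded or unbounded).
    Estimator<LogisticRegressionModel> estimator = new LogisticRegression();  // hypothetical
    LogisticRegressionModel model = estimator.fit(tEnv, trainingTable);
    Table scored = model.transform(tEnv, inputTable);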


*Flink-Runtime-Web Improvement*
  - Much of this comes via Blink
  - Refactor the entire module to use the latest Angular (7.x)
  - Add resource information at three levels: Cluster, TaskManager, and Job
  - Add operator-level topology and data flow tracing
  - Add new metrics to track back pressure, filtering, and data skew
  - Add log association to Jobs, Vertices, and SubTasks
Related JIRA (FLINK-10705) and discussion:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html


What do you think?

Regards,
Shaoxuan



On Wed, Feb 13, 2019 at 7:21 PM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> Recently several contributors, committers, and users asked about making it
> more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the discussion
> threads and JIRA, but due to the mass of discussions and open issues, it is
> very hard to get a good overall picture.
> Especially for new users and contributors, is is very hard to get a quick
> overview of the project direction.
>
> To fix this, I suggest to add a brief roadmap summary to the homepage. It
> is a bit of a commitment to keep that roadmap up to date, but I think the
> benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1]
> <https://beam.apache.org/roadmap/>, which was received very well by the
> community, I would suggest to follow a similar structure here.
>
> If the community is in favor of this, I would volunteer to write a first
> version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> ========================================================
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC). This
> is not a authoritative roadmap in the sense of a plan with a specific
> timeline. Instead, we share our vision for the future and major initiatives
> that are receiving attention and give users and contributors an
> understanding what they can look forward to.
>
> *Future Role of Table API and DataStream API*
>   - Table API becomes first class citizen
>   - Table API becomes primary API for analytics use cases
>       * Declarative, automatic optimizations
>       * No manual control over state and timers
>   - DataStream API becomes primary API for applications and data pipeline
> use cases
>       * Physical, user controls data types, no magic or optimizer
>       * Explicit control over state and time
>
> *Batch Streaming Unification*
>   - Table API unification (environments) (FLIP-32)
>   - New unified source interface (FLIP-27)
>   - Runtime operator unification & code reuse between DataStream / Table
>   - Extending Table API to make it convenient API for all analytical use
> cases (easier mix in of UDFs)
>   - Same join operators on bounded/unbounded Table API and DataStream API
>
> *Faster Batch (Bounded Streams)*
>   - Much of this comes via Blink contribution/merging
>   - Fine-grained Fault Tolerance on bounded data (Table API)
>   - Batch Scheduling on bounded data (Table API)
>   - External Shuffle Services Support on bounded streams
>   - Caching of intermediate results on bounded data (Table API)
>   - Extending DataStream API to explicitly model bounded streams (API
> breaking)
>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>
> *Streaming State Evolution*
>   - Let all built-in serializers support stable evolution
>   - First class support for other evolvable formats (Protobuf, Thrift)
>   - Savepoint input/output format to modify / adjust savepoints
>
> *Simpler Event Time Handling*
>   - Event Time Alignment in Sources
>   - Simpler out-of-the box support in sources
>
> *Checkpointing*
>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
> coordinator)
>
> *Automatic scaling (adjusting parallelism)*
>   - Reactive scaling
>   - Active scaling policies
>
> *Kubernetes Integration*
>   - Active Kubernetes Integration (Flink actively manages containers)
>
> *SQL Ecosystem*
>   - Extended Metadata Stores / Catalog / Schema Registries support
>   - DDL support
>   - Integration with Hive Ecosystem
>
> *Simpler Handling of Dependencies*
>   - Scala in the APIs, but not in the core (hide in separate class loader)
>   - Hadoop-free by default
>
>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by vino yang <ya...@gmail.com>.
Great job, Stephan!

Best,
Vino

Jamie Grier <jg...@lyft.com> wrote on Wed, Feb 27, 2019, at 2:27 AM:

> This is awesome, Stephan!  Thanks for doing this.
>
> -Jamie
>
>
> On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen <se...@apache.org> wrote:
>
>> Here is the pull request with a draft of the roadmap:
>> https://github.com/apache/flink-web/pull/178
>>
>> Best,
>> Stephan
>>
>> On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng <ch...@gmail.com> wrote:
>>
>>> Hi Stephan,
>>>
>>> Thanks for summarizing the great roadmap! It is very helpful for users
>>> and developers to track the direction of Flink.
>>> +1 for putting the roadmap on the website and update it per release.
>>>
>>> Besides, would be great if the roadmap can add the UpsertSource
>>> feature(maybe put it under 'Batch Streaming Unification').
>>> It has been discussed a long time ago[1,2] and is moving forward step by
>>> step.
>>> Currently, Flink can only emit upsert results. With the UpsertSource, we
>>> can make our system a more complete one.
>>>
>>> Best, Hequn
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-How-to-handle-empty-delete-for-UpsertSource-td23856.html#a23874
>>> [2] https://issues.apache.org/jira/browse/FLINK-8545
>>> <https://issues.apache.org/jira/browse/FLINK-8545>
>>>
>>>
>>>
>>> On Fri, Feb 22, 2019 at 3:31 AM Rong Rong <wa...@gmail.com> wrote:
>>>
>>>> Hi Stephan,
>>>>
>>>> Yes. I completely agree. Jincheng & Jark gave some very valuable
>>>> feedbacks and suggestions and I think we can definitely move the
>>>> conversation forward to reach a more concrete doc first before we put in to
>>>> the roadmap. Thanks for reviewing it and driving the roadmap effort!
>>>>
>>>> --
>>>> Rong
>>>>
>>>> On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>>> Hi Rong Rong!
>>>>>
>>>>> I would add the security / kerberos threads to the roadmap. They seem
>>>>> to be advanced enough in the discussions so that there is clarity what will
>>>>> come.
>>>>>
>>>>> For the window operator with slicing, I would personally like to see
>>>>> the discussion advance and have some more clarity and consensus on the
>>>>> feature before adding it to the roadmap. Not having that in the first
>>>>> version of the roadmap does not mean there will be no activity. And when
>>>>> the discussion advances well in the next weeks, we can update the roadmap
>>>>> soon.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Best,
>>>>> Stephan
>>>>>
>>>>>
>>>>> On Thu, Feb 14, 2019 at 5:46 PM Rong Rong <wa...@gmail.com> wrote:
>>>>>
>>>>>> Hi Stephan,
>>>>>>
>>>>>> Thanks for the clarification, yes I think these issues has already
>>>>>> been discussed in previous mailing list threads [1,2,3].
>>>>>>
>>>>>> I also agree that updating the "official" roadmap every release is a
>>>>>> very good idea to avoid frequent update.
>>>>>> One question I might've been a bit confusion is: are we suggesting to
>>>>>> keep one roadmap on the documentation site (e.g. [4]) per release, or
>>>>>> simply just one most up-to-date roadmap in the main website [5] ?
>>>>>> Just like the release notes in every release, the former will
>>>>>> probably provide a good tracker for users to look back at previous roadmaps
>>>>>> as well I am assuming.
>>>>>>
>>>>>> Thanks,
>>>>>> Rong
>>>>>>
>>>>>> [1]
>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>>> [2]
>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>>> [3]
>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>>
>>>>>> [4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
>>>>>> [5] https://flink.apache.org/
>>>>>>
>>>>>> On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen <se...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I think the website is better as well.
>>>>>>>
>>>>>>> I agree with Fabian that the wiki is not so visible, and visibility
>>>>>>> is the main motivation.
>>>>>>> This type of roadmap overview would not be updated by everyone -
>>>>>>> letting committers update the roadmap means the listed threads are actually
>>>>>>> happening at the moment.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I like the idea of putting the roadmap on the website because it is
>>>>>>>> much more visible (and IMO more credible, obligatory) there.
>>>>>>>> However, I share the concerns about frequent updates.
>>>>>>>>
>>>>>>>> It think it would be great to update the "official" roadmap on the
>>>>>>>> website once per release (-bugfix releases), i.e., every three month.
>>>>>>>> We can use the wiki to collect and draft the roadmap for the next
>>>>>>>> update.
>>>>>>>>
>>>>>>>> Best, Fabian
>>>>>>>>
>>>>>>>>
>>>>>>>> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <
>>>>>>>> zjffdu@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Stephan,
>>>>>>>>>
>>>>>>>>> Thanks for this proposal. It is a good idea to track the roadmap.
>>>>>>>>> One suggestion is that it might be better to put it into wiki page first.
>>>>>>>>> Because it is easier to update the roadmap on wiki compared to on flink web
>>>>>>>>> site. And I guess we may need to update the roadmap very often at the
>>>>>>>>> beginning as there's so many discussions and proposals in community
>>>>>>>>> recently. We can move it into flink web site later when we feel it could be
>>>>>>>>> nailed down.
>>>>>>>>>
>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>>>>>>>>
>>>>>>>>>> Thanks Jincheng and Rong Rong!
>>>>>>>>>>
>>>>>>>>>> I am not deciding a roadmap and making a call on what features
>>>>>>>>>> should be developed or not. I was only collecting broader issues that are
>>>>>>>>>> already happening or have an active FLIP/design discussion plus committer
>>>>>>>>>> support.
>>>>>>>>>>
>>>>>>>>>> Do we have that for the suggested issues as well? If yes , we can
>>>>>>>>>> add them (can you point me to the issue/mail-thread), if not, let's try and
>>>>>>>>>> move the discussion forward and add them to the roadmap overview then.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Stephan for the great proposal.
>>>>>>>>>>>
>>>>>>>>>>> This would not only be beneficial for new users but also for
>>>>>>>>>>> contributors to keep track on all upcoming features.
>>>>>>>>>>>
>>>>>>>>>>> I think that better window operator support can also be
>>>>>>>>>>> separately group into its own category, as they affects both future
>>>>>>>>>>> DataStream API and batch stream unification.
>>>>>>>>>>> can we also include:
>>>>>>>>>>> - OVER aggregate for DataStream API separately as @jincheng
>>>>>>>>>>> suggested.
>>>>>>>>>>> - Improving sliding window operator [1]
>>>>>>>>>>>
>>>>>>>>>>> One more additional suggestion, can we also include a more
>>>>>>>>>>> extendable security module [2,3] @shuyi and I are currently working on?
>>>>>>>>>>> This will significantly improve the usability for Flink in
>>>>>>>>>>> corporate environments where proprietary or 3rd-party security integration
>>>>>>>>>>> is needed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Rong
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>>>>>>>> [2]
>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>>>>>>>> [3]
>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <
>>>>>>>>>>> sunjincheng121@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Very excited and thank you for launching such a great
>>>>>>>>>>>> discussion, Stephan !
>>>>>>>>>>>>
>>>>>>>>>>>> Here only a little suggestion that in the Batch Streaming
>>>>>>>>>>>> Unification section, do we need to add an item:
>>>>>>>>>>>>
>>>>>>>>>>>> - Same window operators on bounded/unbounded Table API and
>>>>>>>>>>>> DataStream API
>>>>>>>>>>>> (currently OVER window only exists in SQL/TableAPI, DataStream
>>>>>>>>>>>> API does not yet support)
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Recently several contributors, committers, and users asked
>>>>>>>>>>>>> about making it more visible in which way the project is currently going.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Users and developers can track the direction by following the
>>>>>>>>>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>>>>>>>>>> issues, it is very hard to get a good overall picture.
>>>>>>>>>>>>> Especially for new users and contributors, is is very hard to
>>>>>>>>>>>>> get a quick overview of the project direction.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To fix this, I suggest to add a brief roadmap summary to the
>>>>>>>>>>>>> homepage. It is a bit of a commitment to keep that roadmap up to date, but
>>>>>>>>>>>>> I think the benefit for users justifies that.
>>>>>>>>>>>>> The Apache Beam project has added such a roadmap [1]
>>>>>>>>>>>>> <https://beam.apache.org/roadmap/>, which was received very
>>>>>>>>>>>>> well by the community, I would suggest to follow a similar structure here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the community is in favor of this, I would volunteer to
>>>>>>>>>>>>> write a first version of such a roadmap. The points I would include are
>>>>>>>>>>>>> below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://beam.apache.org/roadmap/
>>>>>>>>>>>>>
>>>>>>>>>>>>> ========================================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> Disclaimer: Apache Flink is not governed or steered by any one
>>>>>>>>>>>>> single entity, but by its community and Project Management Committee (PMC).
>>>>>>>>>>>>> This is not a authoritative roadmap in the sense of a plan with a specific
>>>>>>>>>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>>>>>>>>>> that are receiving attention and give users and contributors an
>>>>>>>>>>>>> understanding what they can look forward to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Future Role of Table API and DataStream API*
>>>>>>>>>>>>>   - Table API becomes first class citizen
>>>>>>>>>>>>>   - Table API becomes primary API for analytics use cases
>>>>>>>>>>>>>       * Declarative, automatic optimizations
>>>>>>>>>>>>>       * No manual control over state and timers
>>>>>>>>>>>>>   - DataStream API becomes primary API for applications and
>>>>>>>>>>>>> data pipeline use cases
>>>>>>>>>>>>>       * Physical, user controls data types, no magic or
>>>>>>>>>>>>> optimizer
>>>>>>>>>>>>>       * Explicit control over state and time
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Batch Streaming Unification*
>>>>>>>>>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>>>>>>>>>   - New unified source interface (FLIP-27)
>>>>>>>>>>>>>   - Runtime operator unification & code reuse between
>>>>>>>>>>>>> DataStream / Table
>>>>>>>>>>>>>   - Extending Table API to make it convenient API for all
>>>>>>>>>>>>> analytical use cases (easier mix in of UDFs)
>>>>>>>>>>>>>   - Same join operators on bounded/unbounded Table API and
>>>>>>>>>>>>> DataStream API
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Faster Batch (Bounded Streams)*
>>>>>>>>>>>>>   - Much of this comes via Blink contribution/merging
>>>>>>>>>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>>>>>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>>>>>>>>>   - External Shuffle Services Support on bounded streams
>>>>>>>>>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>>>>>>>>>   - Extending DataStream API to explicitly model bounded
>>>>>>>>>>>>> streams (API breaking)
>>>>>>>>>>>>>   - Add fine fault tolerance, scheduling, caching also to
>>>>>>>>>>>>> DataStream API
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Streaming State Evolution*
>>>>>>>>>>>>>   - Let all built-in serializers support stable evolution
>>>>>>>>>>>>>   - First class support for other evolvable formats (Protobuf,
>>>>>>>>>>>>> Thrift)
>>>>>>>>>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Simpler Event Time Handling*
>>>>>>>>>>>>>   - Event Time Alignment in Sources
>>>>>>>>>>>>>   - Simpler out-of-the box support in sources
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Checkpointing*
>>>>>>>>>>>>>   - Consistency of Side Effects: suspend / end with savepoint
>>>>>>>>>>>>> (FLIP-34)
>>>>>>>>>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not
>>>>>>>>>>>>> only on coordinator)
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>>>>>>>>>   - Reactive scaling
>>>>>>>>>>>>>   - Active scaling policies
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Kubernetes Integration*
>>>>>>>>>>>>>   - Active Kubernetes Integration (Flink actively manages
>>>>>>>>>>>>> containers)
>>>>>>>>>>>>>
>>>>>>>>>>>>> *SQL Ecosystem*
>>>>>>>>>>>>>   - Extended Metadata Stores / Catalog / Schema Registries
>>>>>>>>>>>>> support
>>>>>>>>>>>>>   - DDL support
>>>>>>>>>>>>>   - Integration with Hive Ecosystem
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Simpler Handling of Dependencies*
>>>>>>>>>>>>>   - Scala in the APIs, but not in the core (hide in separate
>>>>>>>>>>>>> class loader)
>>>>>>>>>>>>>   - Hadoop-free by default
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards
>>>>>>>>>
>>>>>>>>> Jeff Zhang
>>>>>>>>>
>>>>>>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Jamie Grier <jg...@lyft.com>.
This is awesome, Stephan!  Thanks for doing this.

-Jamie


On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen <se...@apache.org> wrote:

> Here is the pull request with a draft of the roadmap:
> https://github.com/apache/flink-web/pull/178
>
> Best,
> Stephan
>
> On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng <ch...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> Thanks for summarizing the great roadmap! It is very helpful for users
>> and developers to track the direction of Flink.
>> +1 for putting the roadmap on the website and update it per release.
>>
>> Besides, would be great if the roadmap can add the UpsertSource
>> feature(maybe put it under 'Batch Streaming Unification').
>> It has been discussed a long time ago[1,2] and is moving forward step by
>> step.
>> Currently, Flink can only emit upsert results. With the UpsertSource, we
>> can make our system a more complete one.
>>
>> Best, Hequn
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-How-to-handle-empty-delete-for-UpsertSource-td23856.html#a23874
>> [2] https://issues.apache.org/jira/browse/FLINK-8545
>> <https://issues.apache.org/jira/browse/FLINK-8545>
>>
>>
>>
>> On Fri, Feb 22, 2019 at 3:31 AM Rong Rong <wa...@gmail.com> wrote:
>>
>>> Hi Stephan,
>>>
>>> Yes. I completely agree. Jincheng & Jark gave some very valuable
>>> feedbacks and suggestions and I think we can definitely move the
>>> conversation forward to reach a more concrete doc first before we put in to
>>> the roadmap. Thanks for reviewing it and driving the roadmap effort!
>>>
>>> --
>>> Rong
>>>
>>> On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Hi Rong Rong!
>>>>
>>>> I would add the security / kerberos threads to the roadmap. They seem
>>>> to be advanced enough in the discussions so that there is clarity what will
>>>> come.
>>>>
>>>> For the window operator with slicing, I would personally like to see
>>>> the discussion advance and have some more clarity and consensus on the
>>>> feature before adding it to the roadmap. Not having that in the first
>>>> version of the roadmap does not mean there will be no activity. And when
>>>> the discussion advances well in the next weeks, we can update the roadmap
>>>> soon.
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>>
>>>> On Thu, Feb 14, 2019 at 5:46 PM Rong Rong <wa...@gmail.com> wrote:
>>>>
>>>>> Hi Stephan,
>>>>>
>>>>> Thanks for the clarification, yes I think these issues has already
>>>>> been discussed in previous mailing list threads [1,2,3].
>>>>>
>>>>> I also agree that updating the "official" roadmap every release is a
>>>>> very good idea to avoid frequent update.
>>>>> One question I might've been a bit confusion is: are we suggesting to
>>>>> keep one roadmap on the documentation site (e.g. [4]) per release, or
>>>>> simply just one most up-to-date roadmap in the main website [5] ?
>>>>> Just like the release notes in every release, the former will probably
>>>>> provide a good tracker for users to look back at previous roadmaps as well
>>>>> I am assuming.
>>>>>
>>>>> Thanks,
>>>>> Rong
>>>>>
>>>>> [1]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>> [2]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>> [3]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>
>>>>> [4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
>>>>> [5] https://flink.apache.org/
>>>>>
>>>>> On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> I think the website is better as well.
>>>>>>
>>>>>> I agree with Fabian that the wiki is not so visible, and visibility
>>>>>> is the main motivation.
>>>>>> This type of roadmap overview would not be updated by everyone -
>>>>>> letting committers update the roadmap means the listed threads are actually
>>>>>> happening at the moment.
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I like the idea of putting the roadmap on the website because it is
>>>>>>> much more visible (and IMO more credible, obligatory) there.
>>>>>>> However, I share the concerns about frequent updates.
>>>>>>>
>>>>>>> It think it would be great to update the "official" roadmap on the
>>>>>>> website once per release (-bugfix releases), i.e., every three month.
>>>>>>> We can use the wiki to collect and draft the roadmap for the next
>>>>>>> update.
>>>>>>>
>>>>>>> Best, Fabian
>>>>>>>
>>>>>>>
>>>>>>> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <
>>>>>>> zjffdu@gmail.com>:
>>>>>>>
>>>>>>>> Hi Stephan,
>>>>>>>>
>>>>>>>> Thanks for this proposal. It is a good idea to track the roadmap.
>>>>>>>> One suggestion is that it might be better to put it into wiki page first.
>>>>>>>> Because it is easier to update the roadmap on wiki compared to on flink web
>>>>>>>> site. And I guess we may need to update the roadmap very often at the
>>>>>>>> beginning as there's so many discussions and proposals in community
>>>>>>>> recently. We can move it into flink web site later when we feel it could be
>>>>>>>> nailed down.
>>>>>>>>
>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>>>>>>>
>>>>>>>>> Thanks Jincheng and Rong Rong!
>>>>>>>>>
>>>>>>>>> I am not deciding a roadmap and making a call on what features
>>>>>>>>> should be developed or not. I was only collecting broader issues that are
>>>>>>>>> already happening or have an active FLIP/design discussion plus committer
>>>>>>>>> support.
>>>>>>>>>
>>>>>>>>> Do we have that for the suggested issues as well? If yes , we can
>>>>>>>>> add them (can you point me to the issue/mail-thread), if not, let's try and
>>>>>>>>> move the discussion forward and add them to the roadmap overview then.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Stephan for the great proposal.
>>>>>>>>>>
>>>>>>>>>> This would not only be beneficial for new users but also for
>>>>>>>>>> contributors to keep track on all upcoming features.
>>>>>>>>>>
>>>>>>>>>> I think that better window operator support can also be
>>>>>>>>>> separately group into its own category, as they affects both future
>>>>>>>>>> DataStream API and batch stream unification.
>>>>>>>>>> can we also include:
>>>>>>>>>> - OVER aggregate for DataStream API separately as @jincheng
>>>>>>>>>> suggested.
>>>>>>>>>> - Improving sliding window operator [1]
>>>>>>>>>>
>>>>>>>>>> One more additional suggestion, can we also include a more
>>>>>>>>>> extendable security module [2,3] @shuyi and I are currently working on?
>>>>>>>>>> This will significantly improve the usability for Flink in
>>>>>>>>>> corporate environments where proprietary or 3rd-party security integration
>>>>>>>>>> is needed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rong
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>>>>>>> [2]
>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>>>>>>> [3]
>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <
>>>>>>>>>> sunjincheng121@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Very excited and thank you for launching such a great
>>>>>>>>>>> discussion, Stephan !
>>>>>>>>>>>
>>>>>>>>>>> Here only a little suggestion that in the Batch Streaming
>>>>>>>>>>> Unification section, do we need to add an item:
>>>>>>>>>>>
>>>>>>>>>>> - Same window operators on bounded/unbounded Table API and
>>>>>>>>>>> DataStream API
>>>>>>>>>>> (currently OVER window only exists in SQL/TableAPI, DataStream
>>>>>>>>>>> API does not yet support)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>>
>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>
>>>>>>>>>>>> Recently several contributors, committers, and users asked
>>>>>>>>>>>> about making it more visible in which way the project is currently going.
>>>>>>>>>>>>
>>>>>>>>>>>> Users and developers can track the direction by following the
>>>>>>>>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>>>>>>>>> issues, it is very hard to get a good overall picture.
>>>>>>>>>>>> Especially for new users and contributors, is is very hard to
>>>>>>>>>>>> get a quick overview of the project direction.
>>>>>>>>>>>>
>>>>>>>>>>>> To fix this, I suggest to add a brief roadmap summary to the
>>>>>>>>>>>> homepage. It is a bit of a commitment to keep that roadmap up to date, but
>>>>>>>>>>>> I think the benefit for users justifies that.
>>>>>>>>>>>> The Apache Beam project has added such a roadmap [1]
>>>>>>>>>>>> <https://beam.apache.org/roadmap/>, which was received very
>>>>>>>>>>>> well by the community, I would suggest to follow a similar structure here.
>>>>>>>>>>>>
>>>>>>>>>>>> If the community is in favor of this, I would volunteer to
>>>>>>>>>>>> write a first version of such a roadmap. The points I would include are
>>>>>>>>>>>> below.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://beam.apache.org/roadmap/
>>>>>>>>>>>>
>>>>>>>>>>>> ========================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Disclaimer: Apache Flink is not governed or steered by any one
>>>>>>>>>>>> single entity, but by its community and Project Management Committee (PMC).
>>>>>>>>>>>> This is not a authoritative roadmap in the sense of a plan with a specific
>>>>>>>>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>>>>>>>>> that are receiving attention and give users and contributors an
>>>>>>>>>>>> understanding what they can look forward to.
>>>>>>>>>>>>
>>>>>>>>>>>> *Future Role of Table API and DataStream API*
>>>>>>>>>>>>   - Table API becomes first class citizen
>>>>>>>>>>>>   - Table API becomes primary API for analytics use cases
>>>>>>>>>>>>       * Declarative, automatic optimizations
>>>>>>>>>>>>       * No manual control over state and timers
>>>>>>>>>>>>   - DataStream API becomes primary API for applications and
>>>>>>>>>>>> data pipeline use cases
>>>>>>>>>>>>       * Physical, user controls data types, no magic or
>>>>>>>>>>>> optimizer
>>>>>>>>>>>>       * Explicit control over state and time
>>>>>>>>>>>>
>>>>>>>>>>>> *Batch Streaming Unification*
>>>>>>>>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>>>>>>>>   - New unified source interface (FLIP-27)
>>>>>>>>>>>>   - Runtime operator unification & code reuse between
>>>>>>>>>>>> DataStream / Table
>>>>>>>>>>>>   - Extending Table API to make it convenient API for all
>>>>>>>>>>>> analytical use cases (easier mix in of UDFs)
>>>>>>>>>>>>   - Same join operators on bounded/unbounded Table API and
>>>>>>>>>>>> DataStream API
>>>>>>>>>>>>
>>>>>>>>>>>> *Faster Batch (Bounded Streams)*
>>>>>>>>>>>>   - Much of this comes via Blink contribution/merging
>>>>>>>>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>>>>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>>>>>>>>   - External Shuffle Services Support on bounded streams
>>>>>>>>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>>>>>>>>   - Extending DataStream API to explicitly model bounded
>>>>>>>>>>>> streams (API breaking)
>>>>>>>>>>>>   - Add fine fault tolerance, scheduling, caching also to
>>>>>>>>>>>> DataStream API
>>>>>>>>>>>>
>>>>>>>>>>>> *Streaming State Evolution*
>>>>>>>>>>>>   - Let all built-in serializers support stable evolution
>>>>>>>>>>>>   - First class support for other evolvable formats (Protobuf,
>>>>>>>>>>>> Thrift)
>>>>>>>>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>>>>>>>>
>>>>>>>>>>>> *Simpler Event Time Handling*
>>>>>>>>>>>>   - Event Time Alignment in Sources
>>>>>>>>>>>>   - Simpler out-of-the box support in sources
>>>>>>>>>>>>
>>>>>>>>>>>> *Checkpointing*
>>>>>>>>>>>>   - Consistency of Side Effects: suspend / end with savepoint
>>>>>>>>>>>> (FLIP-34)
>>>>>>>>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not
>>>>>>>>>>>> only on coordinator)
>>>>>>>>>>>>
>>>>>>>>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>>>>>>>>   - Reactive scaling
>>>>>>>>>>>>   - Active scaling policies
>>>>>>>>>>>>
>>>>>>>>>>>> *Kubernetes Integration*
>>>>>>>>>>>>   - Active Kubernetes Integration (Flink actively manages
>>>>>>>>>>>> containers)
>>>>>>>>>>>>
>>>>>>>>>>>> *SQL Ecosystem*
>>>>>>>>>>>>   - Extended Metadata Stores / Catalog / Schema Registries
>>>>>>>>>>>> support
>>>>>>>>>>>>   - DDL support
>>>>>>>>>>>>   - Integration with Hive Ecosystem
>>>>>>>>>>>>
>>>>>>>>>>>> *Simpler Handling of Dependencies*
>>>>>>>>>>>>   - Scala in the APIs, but not in the core (hide in separate
>>>>>>>>>>>> class loader)
>>>>>>>>>>>>   - Hadoop-free by default
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>>>>
>>>>>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Jamie Grier <jg...@lyft.com.INVALID>.
This is awesome, Stephan!  Thanks for doing this.

-Jamie


On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen <se...@apache.org> wrote:

> Here is the pull request with a draft of the roadmap:
> https://github.com/apache/flink-web/pull/178
>
> Best,
> Stephan
>
> On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng <ch...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> Thanks for summarizing the great roadmap! It is very helpful for users
>> and developers to track the direction of Flink.
>> +1 for putting the roadmap on the website and update it per release.
>>
>> Besides, would be great if the roadmap can add the UpsertSource
>> feature(maybe put it under 'Batch Streaming Unification').
>> It has been discussed a long time ago[1,2] and is moving forward step by
>> step.
>> Currently, Flink can only emit upsert results. With the UpsertSource, we
>> can make our system a more complete one.
>>
>> Best, Hequn
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-How-to-handle-empty-delete-for-UpsertSource-td23856.html#a23874
>> [2] https://issues.apache.org/jira/browse/FLINK-8545
>> <https://issues.apache.org/jira/browse/FLINK-8545>
>>
>>
>>
>> On Fri, Feb 22, 2019 at 3:31 AM Rong Rong <wa...@gmail.com> wrote:
>>
>>> Hi Stephan,
>>>
>>> Yes. I completely agree. Jincheng & Jark gave some very valuable
>>> feedbacks and suggestions and I think we can definitely move the
>>> conversation forward to reach a more concrete doc first before we put in to
>>> the roadmap. Thanks for reviewing it and driving the roadmap effort!
>>>
>>> --
>>> Rong
>>>
>>> On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Hi Rong Rong!
>>>>
>>>> I would add the security / kerberos threads to the roadmap. They seem
>>>> to be advanced enough in the discussions so that there is clarity what will
>>>> come.
>>>>
>>>> For the window operator with slicing, I would personally like to see
>>>> the discussion advance and have some more clarity and consensus on the
>>>> feature before adding it to the roadmap. Not having that in the first
>>>> version of the roadmap does not mean there will be no activity. And when
>>>> the discussion advances well in the next weeks, we can update the roadmap
>>>> soon.
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>>
>>>> On Thu, Feb 14, 2019 at 5:46 PM Rong Rong <wa...@gmail.com> wrote:
>>>>
>>>>> Hi Stephan,
>>>>>
>>>>> Thanks for the clarification, yes I think these issues has already
>>>>> been discussed in previous mailing list threads [1,2,3].
>>>>>
>>>>> I also agree that updating the "official" roadmap every release is a
>>>>> very good idea to avoid frequent update.
>>>>> One question I might've been a bit confusion is: are we suggesting to
>>>>> keep one roadmap on the documentation site (e.g. [4]) per release, or
>>>>> simply just one most up-to-date roadmap in the main website [5] ?
>>>>> Just like the release notes in every release, the former will probably
>>>>> provide a good tracker for users to look back at previous roadmaps as well
>>>>> I am assuming.
>>>>>
>>>>> Thanks,
>>>>> Rong
>>>>>
>>>>> [1]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>> [2]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>> [3]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>
>>>>> [4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
>>>>> [5] https://flink.apache.org/
>>>>>
>>>>> On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> I think the website is better as well.
>>>>>>
>>>>>> I agree with Fabian that the wiki is not so visible, and visibility
>>>>>> is the main motivation.
>>>>>> This type of roadmap overview would not be updated by everyone -
>>>>>> letting committers update the roadmap means the listed threads are actually
>>>>>> happening at the moment.
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I like the idea of putting the roadmap on the website because it is
>>>>>>> much more visible (and IMO more credible, obligatory) there.
>>>>>>> However, I share the concerns about frequent updates.
>>>>>>>
>>>>>>> It think it would be great to update the "official" roadmap on the
>>>>>>> website once per release (-bugfix releases), i.e., every three month.
>>>>>>> We can use the wiki to collect and draft the roadmap for the next
>>>>>>> update.
>>>>>>>
>>>>>>> Best, Fabian
>>>>>>>
>>>>>>>
>>>>>>> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <
>>>>>>> zjffdu@gmail.com>:
>>>>>>>
>>>>>>>> Hi Stephan,
>>>>>>>>
>>>>>>>> Thanks for this proposal. It is a good idea to track the roadmap.
>>>>>>>> One suggestion is that it might be better to put it into wiki page first.
>>>>>>>> Because it is easier to update the roadmap on wiki compared to on flink web
>>>>>>>> site. And I guess we may need to update the roadmap very often at the
>>>>>>>> beginning as there's so many discussions and proposals in community
>>>>>>>> recently. We can move it into flink web site later when we feel it could be
>>>>>>>> nailed down.
>>>>>>>>
>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>>>>>>>
>>>>>>>>> Thanks Jincheng and Rong Rong!
>>>>>>>>>
>>>>>>>>> I am not deciding a roadmap and making a call on what features
>>>>>>>>> should be developed or not. I was only collecting broader issues that are
>>>>>>>>> already happening or have an active FLIP/design discussion plus committer
>>>>>>>>> support.
>>>>>>>>>
>>>>>>>>> Do we have that for the suggested issues as well? If yes , we can
>>>>>>>>> add them (can you point me to the issue/mail-thread), if not, let's try and
>>>>>>>>> move the discussion forward and add them to the roadmap overview then.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Stephan for the great proposal.
>>>>>>>>>>
>>>>>>>>>> This would not only be beneficial for new users but also for
>>>>>>>>>> contributors to keep track on all upcoming features.
>>>>>>>>>>
>>>>>>>>>> I think that better window operator support can also be
>>>>>>>>>> separately group into its own category, as they affects both future
>>>>>>>>>> DataStream API and batch stream unification.
>>>>>>>>>> can we also include:
>>>>>>>>>> - OVER aggregate for DataStream API separately as @jincheng
>>>>>>>>>> suggested.
>>>>>>>>>> - Improving sliding window operator [1]
>>>>>>>>>>
>>>>>>>>>> One more additional suggestion, can we also include a more
>>>>>>>>>> extendable security module [2,3] @shuyi and I are currently working on?
>>>>>>>>>> This will significantly improve the usability for Flink in
>>>>>>>>>> corporate environments where proprietary or 3rd-party security integration
>>>>>>>>>> is needed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rong
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>>>>>>> [2]
>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>>>>>>> [3]
>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <
>>>>>>>>>> sunjincheng121@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Very excited and thank you for launching such a great
>>>>>>>>>>> discussion, Stephan !
>>>>>>>>>>>
>>>>>>>>>>> Here only a little suggestion that in the Batch Streaming
>>>>>>>>>>> Unification section, do we need to add an item:
>>>>>>>>>>>
>>>>>>>>>>> - Same window operators on bounded/unbounded Table API and
>>>>>>>>>>> DataStream API
>>>>>>>>>>> (currently OVER window only exists in SQL/TableAPI, DataStream
>>>>>>>>>>> API does not yet support)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>>
>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>
>>>>>>>>>>>> Recently several contributors, committers, and users asked
>>>>>>>>>>>> about making it more visible in which way the project is currently going.
>>>>>>>>>>>>
>>>>>>>>>>>> Users and developers can track the direction by following the
>>>>>>>>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>>>>>>>>> issues, it is very hard to get a good overall picture.
>>>>>>>>>>>> Especially for new users and contributors, is is very hard to
>>>>>>>>>>>> get a quick overview of the project direction.
>>>>>>>>>>>>
>>>>>>>>>>>> To fix this, I suggest to add a brief roadmap summary to the
>>>>>>>>>>>> homepage. It is a bit of a commitment to keep that roadmap up to date, but
>>>>>>>>>>>> I think the benefit for users justifies that.
>>>>>>>>>>>> The Apache Beam project has added such a roadmap [1]
>>>>>>>>>>>> <https://beam.apache.org/roadmap/>, which was received very
>>>>>>>>>>>> well by the community, I would suggest to follow a similar structure here.
>>>>>>>>>>>>
>>>>>>>>>>>> If the community is in favor of this, I would volunteer to
>>>>>>>>>>>> write a first version of such a roadmap. The points I would include are
>>>>>>>>>>>> below.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://beam.apache.org/roadmap/
>>>>>>>>>>>>
>>>>>>>>>>>> ========================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Disclaimer: Apache Flink is not governed or steered by any one
>>>>>>>>>>>> single entity, but by its community and Project Management Committee (PMC).
>>>>>>>>>>>> This is not a authoritative roadmap in the sense of a plan with a specific
>>>>>>>>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>>>>>>>>> that are receiving attention and give users and contributors an
>>>>>>>>>>>> understanding what they can look forward to.
>>>>>>>>>>>>
>>>>>>>>>>>> *Future Role of Table API and DataStream API*
>>>>>>>>>>>>   - Table API becomes first class citizen
>>>>>>>>>>>>   - Table API becomes primary API for analytics use cases
>>>>>>>>>>>>       * Declarative, automatic optimizations
>>>>>>>>>>>>       * No manual control over state and timers
>>>>>>>>>>>>   - DataStream API becomes primary API for applications and
>>>>>>>>>>>> data pipeline use cases
>>>>>>>>>>>>       * Physical, user controls data types, no magic or
>>>>>>>>>>>> optimizer
>>>>>>>>>>>>       * Explicit control over state and time
>>>>>>>>>>>>
>>>>>>>>>>>> *Batch Streaming Unification*
>>>>>>>>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>>>>>>>>   - New unified source interface (FLIP-27)
>>>>>>>>>>>>   - Runtime operator unification & code reuse between
>>>>>>>>>>>> DataStream / Table
>>>>>>>>>>>>   - Extending Table API to make it convenient API for all
>>>>>>>>>>>> analytical use cases (easier mix in of UDFs)
>>>>>>>>>>>>   - Same join operators on bounded/unbounded Table API and
>>>>>>>>>>>> DataStream API
>>>>>>>>>>>>
>>>>>>>>>>>> *Faster Batch (Bounded Streams)*
>>>>>>>>>>>>   - Much of this comes via Blink contribution/merging
>>>>>>>>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>>>>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>>>>>>>>   - External Shuffle Services Support on bounded streams
>>>>>>>>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>>>>>>>>   - Extending DataStream API to explicitly model bounded
>>>>>>>>>>>> streams (API breaking)
>>>>>>>>>>>>   - Add fine fault tolerance, scheduling, caching also to
>>>>>>>>>>>> DataStream API
>>>>>>>>>>>>
>>>>>>>>>>>> *Streaming State Evolution*
>>>>>>>>>>>>   - Let all built-in serializers support stable evolution
>>>>>>>>>>>>   - First class support for other evolvable formats (Protobuf,
>>>>>>>>>>>> Thrift)
>>>>>>>>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>>>>>>>>
>>>>>>>>>>>> *Simpler Event Time Handling*
>>>>>>>>>>>>   - Event Time Alignment in Sources
>>>>>>>>>>>>   - Simpler out-of-the box support in sources
>>>>>>>>>>>>
>>>>>>>>>>>> *Checkpointing*
>>>>>>>>>>>>   - Consistency of Side Effects: suspend / end with savepoint
>>>>>>>>>>>> (FLIP-34)
>>>>>>>>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not
>>>>>>>>>>>> only on coordinator)
>>>>>>>>>>>>
>>>>>>>>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>>>>>>>>   - Reactive scaling
>>>>>>>>>>>>   - Active scaling policies
>>>>>>>>>>>>
>>>>>>>>>>>> *Kubernetes Integration*
>>>>>>>>>>>>   - Active Kubernetes Integration (Flink actively manages
>>>>>>>>>>>> containers)
>>>>>>>>>>>>
>>>>>>>>>>>> *SQL Ecosystem*
>>>>>>>>>>>>   - Extended Metadata Stores / Catalog / Schema Registries
>>>>>>>>>>>> support
>>>>>>>>>>>>   - DDL support
>>>>>>>>>>>>   - Integration with Hive Ecosystem
>>>>>>>>>>>>
>>>>>>>>>>>> *Simpler Handling of Dependencies*
>>>>>>>>>>>>   - Scala in the APIs, but not in the core (hide in separate
>>>>>>>>>>>> class loader)
>>>>>>>>>>>>   - Hadoop-free by default
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>>>>
>>>>>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Stephan Ewen <se...@apache.org>.
Here is the pull request with a draft of the roadmap:
https://github.com/apache/flink-web/pull/178

Best,
Stephan

On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng <ch...@gmail.com> wrote:

> Hi Stephan,
>
> Thanks for summarizing the great roadmap! It is very helpful for users and
> developers to track the direction of Flink.
> +1 for putting the roadmap on the website and update it per release.
>
> Besides, would be great if the roadmap can add the UpsertSource
> feature(maybe put it under 'Batch Streaming Unification').
> It has been discussed a long time ago[1,2] and is moving forward step by
> step.
> Currently, Flink can only emit upsert results. With the UpsertSource, we
> can make our system a more complete one.
>
> Best, Hequn
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-How-to-handle-empty-delete-for-UpsertSource-td23856.html#a23874
> [2] https://issues.apache.org/jira/browse/FLINK-8545
> <https://issues.apache.org/jira/browse/FLINK-8545>
>
>
>
> On Fri, Feb 22, 2019 at 3:31 AM Rong Rong <wa...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> Yes. I completely agree. Jincheng & Jark gave some very valuable
>> feedbacks and suggestions and I think we can definitely move the
>> conversation forward to reach a more concrete doc first before we put in to
>> the roadmap. Thanks for reviewing it and driving the roadmap effort!
>>
>> --
>> Rong
>>
>> On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen <se...@apache.org> wrote:
>>
>>> Hi Rong Rong!
>>>
>>> I would add the security / kerberos threads to the roadmap. They seem to
>>> be advanced enough in the discussions so that there is clarity what will
>>> come.
>>>
>>> For the window operator with slicing, I would personally like to see the
>>> discussion advance and have some more clarity and consensus on the feature
>>> before adding it to the roadmap. Not having that in the first version of
>>> the roadmap does not mean there will be no activity. And when the
>>> discussion advances well in the next weeks, we can update the roadmap soon.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Thu, Feb 14, 2019 at 5:46 PM Rong Rong <wa...@gmail.com> wrote:
>>>
>>>> Hi Stephan,
>>>>
>>>> Thanks for the clarification, yes I think these issues has already been
>>>> discussed in previous mailing list threads [1,2,3].
>>>>
>>>> I also agree that updating the "official" roadmap every release is a
>>>> very good idea to avoid frequent update.
>>>> One question I might've been a bit confusion is: are we suggesting to
>>>> keep one roadmap on the documentation site (e.g. [4]) per release, or
>>>> simply just one most up-to-date roadmap in the main website [5] ?
>>>> Just like the release notes in every release, the former will probably
>>>> provide a good tracker for users to look back at previous roadmaps as well
>>>> I am assuming.
>>>>
>>>> Thanks,
>>>> Rong
>>>>
>>>> [1]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>> [2]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>> [3]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>
>>>> [4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
>>>> [5] https://flink.apache.org/
>>>>
>>>> On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>>> I think the website is better as well.
>>>>>
>>>>> I agree with Fabian that the wiki is not so visible, and visibility is
>>>>> the main motivation.
>>>>> This type of roadmap overview would not be updated by everyone -
>>>>> letting committers update the roadmap means the listed threads are actually
>>>>> happening at the moment.
>>>>>
>>>>>
>>>>> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I like the idea of putting the roadmap on the website because it is
>>>>>> much more visible (and IMO more credible, obligatory) there.
>>>>>> However, I share the concerns about frequent updates.
>>>>>>
>>>>>> It think it would be great to update the "official" roadmap on the
>>>>>> website once per release (-bugfix releases), i.e., every three month.
>>>>>> We can use the wiki to collect and draft the roadmap for the next
>>>>>> update.
>>>>>>
>>>>>> Best, Fabian
>>>>>>
>>>>>>
>>>>>> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <
>>>>>> zjffdu@gmail.com>:
>>>>>>
>>>>>>> Hi Stephan,
>>>>>>>
>>>>>>> Thanks for this proposal. It is a good idea to track the roadmap.
>>>>>>> One suggestion is that it might be better to put it into wiki page first.
>>>>>>> Because it is easier to update the roadmap on wiki compared to on flink web
>>>>>>> site. And I guess we may need to update the roadmap very often at the
>>>>>>> beginning as there's so many discussions and proposals in community
>>>>>>> recently. We can move it into flink web site later when we feel it could be
>>>>>>> nailed down.
>>>>>>>
>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>>>>>>
>>>>>>>> Thanks Jincheng and Rong Rong!
>>>>>>>>
>>>>>>>> I am not deciding a roadmap and making a call on what features
>>>>>>>> should be developed or not. I was only collecting broader issues that are
>>>>>>>> already happening or have an active FLIP/design discussion plus committer
>>>>>>>> support.
>>>>>>>>
>>>>>>>> Do we have that for the suggested issues as well? If yes , we can
>>>>>>>> add them (can you point me to the issue/mail-thread), if not, let's try and
>>>>>>>> move the discussion forward and add them to the roadmap overview then.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Stephan for the great proposal.
>>>>>>>>>
>>>>>>>>> This would not only be beneficial for new users but also for
>>>>>>>>> contributors to keep track on all upcoming features.
>>>>>>>>>
>>>>>>>>> I think that better window operator support can also be separately
>>>>>>>>> group into its own category, as they affects both future DataStream API and
>>>>>>>>> batch stream unification.
>>>>>>>>> can we also include:
>>>>>>>>> - OVER aggregate for DataStream API separately as @jincheng
>>>>>>>>> suggested.
>>>>>>>>> - Improving sliding window operator [1]
>>>>>>>>>
>>>>>>>>> One more additional suggestion, can we also include a more
>>>>>>>>> extendable security module [2,3] @shuyi and I are currently working on?
>>>>>>>>> This will significantly improve the usability for Flink in
>>>>>>>>> corporate environments where proprietary or 3rd-party security integration
>>>>>>>>> is needed.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rong
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>>>>>> [2]
>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>>>>>> [3]
>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <
>>>>>>>>> sunjincheng121@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Very excited and thank you for launching such a great discussion,
>>>>>>>>>> Stephan !
>>>>>>>>>>
>>>>>>>>>> Here only a little suggestion that in the Batch Streaming
>>>>>>>>>> Unification section, do we need to add an item:
>>>>>>>>>>
>>>>>>>>>> - Same window operators on bounded/unbounded Table API and
>>>>>>>>>> DataStream API
>>>>>>>>>> (currently OVER window only exists in SQL/TableAPI, DataStream
>>>>>>>>>> API does not yet support)
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jincheng
>>>>>>>>>>
>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>>>>>>>
>>>>>>>>>>> Hi all!
>>>>>>>>>>>
>>>>>>>>>>> Recently several contributors, committers, and users asked about
>>>>>>>>>>> making it more visible in which way the project is currently going.
>>>>>>>>>>>
>>>>>>>>>>> Users and developers can track the direction by following the
>>>>>>>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>>>>>>>> issues, it is very hard to get a good overall picture.
>>>>>>>>>>> Especially for new users and contributors, is is very hard to
>>>>>>>>>>> get a quick overview of the project direction.
>>>>>>>>>>>
>>>>>>>>>>> To fix this, I suggest to add a brief roadmap summary to the
>>>>>>>>>>> homepage. It is a bit of a commitment to keep that roadmap up to date, but
>>>>>>>>>>> I think the benefit for users justifies that.
>>>>>>>>>>> The Apache Beam project has added such a roadmap [1]
>>>>>>>>>>> <https://beam.apache.org/roadmap/>, which was received very
>>>>>>>>>>> well by the community, I would suggest to follow a similar structure here.
>>>>>>>>>>>
>>>>>>>>>>> If the community is in favor of this, I would volunteer to write
>>>>>>>>>>> a first version of such a roadmap. The points I would include are below.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>> [1] https://beam.apache.org/roadmap/
>>>>>>>>>>>
>>>>>>>>>>> ========================================================
>>>>>>>>>>>
>>>>>>>>>>> Disclaimer: Apache Flink is not governed or steered by any one
>>>>>>>>>>> single entity, but by its community and Project Management Committee (PMC).
>>>>>>>>>>> This is not a authoritative roadmap in the sense of a plan with a specific
>>>>>>>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>>>>>>>> that are receiving attention and give users and contributors an
>>>>>>>>>>> understanding what they can look forward to.
>>>>>>>>>>>
>>>>>>>>>>> *Future Role of Table API and DataStream API*
>>>>>>>>>>>   - Table API becomes first class citizen
>>>>>>>>>>>   - Table API becomes primary API for analytics use cases
>>>>>>>>>>>       * Declarative, automatic optimizations
>>>>>>>>>>>       * No manual control over state and timers
>>>>>>>>>>>   - DataStream API becomes primary API for applications and data
>>>>>>>>>>> pipeline use cases
>>>>>>>>>>>       * Physical, user controls data types, no magic or optimizer
>>>>>>>>>>>       * Explicit control over state and time
>>>>>>>>>>>
>>>>>>>>>>> *Batch Streaming Unification*
>>>>>>>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>>>>>>>   - New unified source interface (FLIP-27)
>>>>>>>>>>>   - Runtime operator unification & code reuse between DataStream
>>>>>>>>>>> / Table
>>>>>>>>>>>   - Extending Table API to make it convenient API for all
>>>>>>>>>>> analytical use cases (easier mix in of UDFs)
>>>>>>>>>>>   - Same join operators on bounded/unbounded Table API and
>>>>>>>>>>> DataStream API
>>>>>>>>>>>
>>>>>>>>>>> *Faster Batch (Bounded Streams)*
>>>>>>>>>>>   - Much of this comes via Blink contribution/merging
>>>>>>>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>>>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>>>>>>>   - External Shuffle Services Support on bounded streams
>>>>>>>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>>>>>>>   - Extending DataStream API to explicitly model bounded streams
>>>>>>>>>>> (API breaking)
>>>>>>>>>>>   - Add fine fault tolerance, scheduling, caching also to
>>>>>>>>>>> DataStream API
>>>>>>>>>>>
>>>>>>>>>>> *Streaming State Evolution*
>>>>>>>>>>>   - Let all built-in serializers support stable evolution
>>>>>>>>>>>   - First class support for other evolvable formats (Protobuf,
>>>>>>>>>>> Thrift)
>>>>>>>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>>>>>>>
>>>>>>>>>>> *Simpler Event Time Handling*
>>>>>>>>>>>   - Event Time Alignment in Sources
>>>>>>>>>>>   - Simpler out-of-the box support in sources
>>>>>>>>>>>
>>>>>>>>>>> *Checkpointing*
>>>>>>>>>>>   - Consistency of Side Effects: suspend / end with savepoint
>>>>>>>>>>> (FLIP-34)
>>>>>>>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not
>>>>>>>>>>> only on coordinator)
>>>>>>>>>>>
>>>>>>>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>>>>>>>   - Reactive scaling
>>>>>>>>>>>   - Active scaling policies
>>>>>>>>>>>
>>>>>>>>>>> *Kubernetes Integration*
>>>>>>>>>>>   - Active Kubernetes Integration (Flink actively manages
>>>>>>>>>>> containers)
>>>>>>>>>>>
>>>>>>>>>>> *SQL Ecosystem*
>>>>>>>>>>>   - Extended Metadata Stores / Catalog / Schema Registries
>>>>>>>>>>> support
>>>>>>>>>>>   - DDL support
>>>>>>>>>>>   - Integration with Hive Ecosystem
>>>>>>>>>>>
>>>>>>>>>>> *Simpler Handling of Dependencies*
>>>>>>>>>>>   - Scala in the APIs, but not in the core (hide in separate
>>>>>>>>>>> class loader)
>>>>>>>>>>>   - Hadoop-free by default
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Stephan,

Thanks for summarizing the great roadmap! It is very helpful for users and
developers to track the direction of Flink.
+1 for putting the roadmap on the website and updating it per release.

Besides, it would be great if the roadmap could include the UpsertSource
feature (maybe put it under 'Batch Streaming Unification').
It has been under discussion for quite a while [1,2] and is moving forward
step by step.
Currently, Flink can only emit upsert results. With an UpsertSource, it
could also ingest them, which would make the system more complete.
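
To make this concrete, below is a minimal, purely illustrative sketch of what
a source-side counterpart to the existing UpsertStreamTableSink could look
like. None of these names are actual Flink API (they are assumptions for the
sake of the example), and the plain-Java types are only there to keep the
snippet self-contained:

    // Hypothetical sketch only: these interfaces/classes do not exist in
    // Flink; they mirror the sink-side UpsertStreamTableSink to illustrate
    // the idea of reading a keyed changelog as a continuously updated table.
    import java.util.Arrays;
    import java.util.List;

    interface UpsertStreamTableSource<T> {

        // Key fields that identify which logical row an incoming record
        // upserts or deletes.
        List<String> getKeyFields();

        // Whether the stream may also contain deletions (handling of
        // "empty deletes" is one of the open questions in [1]).
        boolean producesDeletes();

        // The type of the records carried by the changelog stream.
        Class<T> getRecordClass();
    }

    // A tiny example record type and source for an "orders" changelog
    // keyed by orderId.
    class Order {
        String orderId;
        double amount;
    }

    class OrdersUpsertSource implements UpsertStreamTableSource<Order> {

        @Override
        public List<String> getKeyFields() {
            return Arrays.asList("orderId");
        }

        @Override
        public boolean producesDeletes() {
            return true;
        }

        @Override
        public Class<Order> getRecordClass() {
            return Order.class;
        }
    }

With something along these lines, a keyed changelog (for example a compacted
Kafka topic) could be interpreted directly as a continuously updated table
instead of an append-only one.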

Best, Hequn

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-How-to-handle-empty-delete-for-UpsertSource-td23856.html#a23874
[2] https://issues.apache.org/jira/browse/FLINK-8545
<https://issues.apache.org/jira/browse/FLINK-8545>



On Fri, Feb 22, 2019 at 3:31 AM Rong Rong <wa...@gmail.com> wrote:

> Hi Stephan,
>
> Yes. I completely agree. Jincheng & Jark gave some very valuable feedbacks
> and suggestions and I think we can definitely move the conversation forward
> to reach a more concrete doc first before we put in to the roadmap. Thanks
> for reviewing it and driving the roadmap effort!
>
> --
> Rong
>
> On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen <se...@apache.org> wrote:
>
>> Hi Rong Rong!
>>
>> I would add the security / kerberos threads to the roadmap. They seem to
>> be advanced enough in the discussions so that there is clarity what will
>> come.
>>
>> For the window operator with slicing, I would personally like to see the
>> discussion advance and have some more clarity and consensus on the feature
>> before adding it to the roadmap. Not having that in the first version of
>> the roadmap does not mean there will be no activity. And when the
>> discussion advances well in the next weeks, we can update the roadmap soon.
>>
>> What do you think?
>>
>> Best,
>> Stephan
>>
>>
>> On Thu, Feb 14, 2019 at 5:46 PM Rong Rong <wa...@gmail.com> wrote:
>>
>>> Hi Stephan,
>>>
>>> Thanks for the clarification, yes I think these issues has already been
>>> discussed in previous mailing list threads [1,2,3].
>>>
>>> I also agree that updating the "official" roadmap every release is a
>>> very good idea to avoid frequent update.
>>> One question I might've been a bit confusion is: are we suggesting to
>>> keep one roadmap on the documentation site (e.g. [4]) per release, or
>>> simply just one most up-to-date roadmap in the main website [5] ?
>>> Just like the release notes in every release, the former will probably
>>> provide a good tracker for users to look back at previous roadmaps as well
>>> I am assuming.
>>>
>>> Thanks,
>>> Rong
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>> [3]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>
>>> [4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
>>> [5] https://flink.apache.org/
>>>
>>> On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> I think the website is better as well.
>>>>
>>>> I agree with Fabian that the wiki is not so visible, and visibility is
>>>> the main motivation.
>>>> This type of roadmap overview would not be updated by everyone -
>>>> letting committers update the roadmap means the listed threads are actually
>>>> happening at the moment.
>>>>
>>>>
>>>> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I like the idea of putting the roadmap on the website because it is
>>>>> much more visible (and IMO more credible, obligatory) there.
>>>>> However, I share the concerns about frequent updates.
>>>>>
>>>>> It think it would be great to update the "official" roadmap on the
>>>>> website once per release (-bugfix releases), i.e., every three month.
>>>>> We can use the wiki to collect and draft the roadmap for the next
>>>>> update.
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>>
>>>>> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <
>>>>> zjffdu@gmail.com>:
>>>>>
>>>>>> Hi Stephan,
>>>>>>
>>>>>> Thanks for this proposal. It is a good idea to track the roadmap. One
>>>>>> suggestion is that it might be better to put it into wiki page first.
>>>>>> Because it is easier to update the roadmap on wiki compared to on flink web
>>>>>> site. And I guess we may need to update the roadmap very often at the
>>>>>> beginning as there's so many discussions and proposals in community
>>>>>> recently. We can move it into flink web site later when we feel it could be
>>>>>> nailed down.
>>>>>>
>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>>>>>
>>>>>>> Thanks Jincheng and Rong Rong!
>>>>>>>
>>>>>>> I am not deciding a roadmap and making a call on what features
>>>>>>> should be developed or not. I was only collecting broader issues that are
>>>>>>> already happening or have an active FLIP/design discussion plus committer
>>>>>>> support.
>>>>>>>
>>>>>>> Do we have that for the suggested issues as well? If yes , we can
>>>>>>> add them (can you point me to the issue/mail-thread), if not, let's try and
>>>>>>> move the discussion forward and add them to the roadmap overview then.
>>>>>>>
>>>>>>> Best,
>>>>>>> Stephan
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Stephan for the great proposal.
>>>>>>>>
>>>>>>>> This would not only be beneficial for new users but also for
>>>>>>>> contributors to keep track on all upcoming features.
>>>>>>>>
>>>>>>>> I think that better window operator support can also be separately
>>>>>>>> group into its own category, as they affects both future DataStream API and
>>>>>>>> batch stream unification.
>>>>>>>> can we also include:
>>>>>>>> - OVER aggregate for DataStream API separately as @jincheng
>>>>>>>> suggested.
>>>>>>>> - Improving sliding window operator [1]
>>>>>>>>
>>>>>>>> One more additional suggestion, can we also include a more
>>>>>>>> extendable security module [2,3] @shuyi and I are currently working on?
>>>>>>>> This will significantly improve the usability for Flink in
>>>>>>>> corporate environments where proprietary or 3rd-party security integration
>>>>>>>> is needed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rong
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>>>>> [2]
>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>>>>> [3]
>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Stephan,

Thanks for summarizing the great roadmap! It is very helpful for users and
developers to track the direction of Flink.
+1 for putting the roadmap on the website and updating it per release.

Besides, it would be great if the roadmap could include the UpsertSource
feature (maybe under 'Batch Streaming Unification').
It has been under discussion for a long time [1,2] and is moving forward
step by step.
Currently, Flink can only emit upsert results. With an UpsertSource, we
can make the system more complete.
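
To make that concrete, here is a minimal, purely illustrative sketch of what
an upsert-capable table source could look like. The interface name and method
signatures below are assumptions for illustration only, not existing Flink
API; they simply mirror the existing UpsertStreamTableSink, which represents
output records as (flag, row) pairs where the flag distinguishes upserts from
deletes:

// Hypothetical sketch only -- not an actual Flink interface.
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public interface UpsertStreamTableSource {

    // Fields that form the unique key of the changelog stream.
    String[] getKeyFields();

    // Each record is (flag, row): flag == true means insert or update by key,
    // flag == false means delete by key -- the mirror image of what
    // UpsertStreamTableSink emits today.
    DataStream<Tuple2<Boolean, Row>> getUpsertStream(StreamExecutionEnvironment env);
}

A planner that understands such a source could then interpret the stream as a
continuously updating table keyed by getKeyFields().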

Best, Hequn

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-How-to-handle-empty-delete-for-UpsertSource-td23856.html#a23874
[2] https://issues.apache.org/jira/browse/FLINK-8545

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Rong Rong <wa...@gmail.com>.
Hi Stephan,

Yes, I completely agree. Jincheng & Jark gave some very valuable feedback
and suggestions, and I think we can definitely move the conversation forward
to reach a more concrete doc first before we put it into the roadmap. Thanks
for reviewing it and driving the roadmap effort!

--
Rong


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Stephan Ewen <se...@apache.org>.
Hi Rong Rong!

I would add the security / Kerberos threads to the roadmap. They seem to be
far enough along in the discussions that there is clarity about what will come.

For the window operator with slicing, I would personally like to see the
discussion advance and reach more clarity and consensus on the feature
before adding it to the roadmap. Not having it in the first version of
the roadmap does not mean there will be no activity. And when the
discussion advances well in the coming weeks, we can update the roadmap soon.

What do you think?

Best,
Stephan



Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Rong Rong <wa...@gmail.com>.
Hi Stephan,

Thanks for the clarification. Yes, I think these issues have already been
discussed in previous mailing list threads [1,2,3].

I also agree that updating the "official" roadmap once per release is a
good way to avoid overly frequent updates.
One question I am a bit confused about: are we suggesting to keep one
roadmap on the documentation site (e.g. [4]) per release, or simply just
one most up-to-date roadmap on the main website [5]?
Much like the release notes for each release, I assume the former would
also give users a good way to look back at previous roadmaps.

Thanks,
Rong

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html

[4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
[5] https://flink.apache.org/

On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen <se...@apache.org> wrote:

> I think the website is better as well.
>
> I agree with Fabian that the wiki is not so visible, and visibility is the
> main motivation.
> This type of roadmap overview would not be updated by everyone - letting
> committers update the roadmap means the listed threads are actually
> happening at the moment.
>
>
> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com> wrote:
>
>> Hi,
>>
>> I like the idea of putting the roadmap on the website because it is much
>> more visible (and IMO more credible, obligatory) there.
>> However, I share the concerns about frequent updates.
>>
>> It think it would be great to update the "official" roadmap on the
>> website once per release (-bugfix releases), i.e., every three month.
>> We can use the wiki to collect and draft the roadmap for the next update.
>>
>> Best, Fabian
>>
>>
>> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <zj...@gmail.com>:
>>
>>> Hi Stephan,
>>>
>>> Thanks for this proposal. It is a good idea to track the roadmap. One
>>> suggestion is that it might be better to put it into wiki page first.
>>> Because it is easier to update the roadmap on wiki compared to on flink web
>>> site. And I guess we may need to update the roadmap very often at the
>>> beginning as there's so many discussions and proposals in community
>>> recently. We can move it into flink web site later when we feel it could be
>>> nailed down.
>>>
>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>>
>>>> Thanks Jincheng and Rong Rong!
>>>>
>>>> I am not deciding a roadmap and making a call on what features should
>>>> be developed or not. I was only collecting broader issues that are already
>>>> happening or have an active FLIP/design discussion plus committer support.
>>>>
>>>> Do we have that for the suggested issues as well? If yes , we can add
>>>> them (can you point me to the issue/mail-thread), if not, let's try and
>>>> move the discussion forward and add them to the roadmap overview then.
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>>
>>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>>>>
>>>>> Thanks Stephan for the great proposal.
>>>>>
>>>>> This would not only be beneficial for new users but also for
>>>>> contributors to keep track on all upcoming features.
>>>>>
>>>>> I think that better window operator support can also be separately
>>>>> group into its own category, as they affects both future DataStream API and
>>>>> batch stream unification.
>>>>> can we also include:
>>>>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>>>>> - Improving sliding window operator [1]
>>>>>
>>>>> One more additional suggestion, can we also include a more extendable
>>>>> security module [2,3] @shuyi and I are currently working on?
>>>>> This will significantly improve the usability for Flink in corporate
>>>>> environments where proprietary or 3rd-party security integration is needed.
>>>>>
>>>>> Thanks,
>>>>> Rong
>>>>>
>>>>>
>>>>> [1]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>>> [2]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>>> [3]
>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Very excited and thank you for launching such a great discussion,
>>>>>> Stephan !
>>>>>>
>>>>>> Here only a little suggestion that in the Batch Streaming Unification
>>>>>> section, do we need to add an item:
>>>>>>
>>>>>> - Same window operators on bounded/unbounded Table API and DataStream
>>>>>> API
>>>>>> (currently OVER window only exists in SQL/TableAPI, DataStream API
>>>>>> does not yet support)
>>>>>>
>>>>>> Best,
>>>>>> Jincheng
>>>>>>
>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>>>
>>>>>>> Hi all!
>>>>>>>
>>>>>>> Recently several contributors, committers, and users asked about
>>>>>>> making it more visible in which way the project is currently going.
>>>>>>>
>>>>>>> Users and developers can track the direction by following the
>>>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>>>> issues, it is very hard to get a good overall picture.
>>>>>>> Especially for new users and contributors, is is very hard to get a
>>>>>>> quick overview of the project direction.
>>>>>>>
>>>>>>> To fix this, I suggest to add a brief roadmap summary to the
>>>>>>> homepage. It is a bit of a commitment to keep that roadmap up to date, but
>>>>>>> I think the benefit for users justifies that.
>>>>>>> The Apache Beam project has added such a roadmap [1]
>>>>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>>>>> the community, I would suggest to follow a similar structure here.
>>>>>>>
>>>>>>> If the community is in favor of this, I would volunteer to write a
>>>>>>> first version of such a roadmap. The points I would include are below.
>>>>>>>
>>>>>>> Best,
>>>>>>> Stephan
>>>>>>>
>>>>>>> [1] https://beam.apache.org/roadmap/
>>>>>>>
>>>>>>> ========================================================
>>>>>>>
>>>>>>> Disclaimer: Apache Flink is not governed or steered by any one
>>>>>>> single entity, but by its community and Project Management Committee (PMC).
>>>>>>> This is not a authoritative roadmap in the sense of a plan with a specific
>>>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>>>> that are receiving attention and give users and contributors an
>>>>>>> understanding what they can look forward to.
>>>>>>>
>>>>>>> *Future Role of Table API and DataStream API*
>>>>>>>   - Table API becomes first class citizen
>>>>>>>   - Table API becomes primary API for analytics use cases
>>>>>>>       * Declarative, automatic optimizations
>>>>>>>       * No manual control over state and timers
>>>>>>>   - DataStream API becomes primary API for applications and data
>>>>>>> pipeline use cases
>>>>>>>       * Physical, user controls data types, no magic or optimizer
>>>>>>>       * Explicit control over state and time
>>>>>>>
>>>>>>> *Batch Streaming Unification*
>>>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>>>   - New unified source interface (FLIP-27)
>>>>>>>   - Runtime operator unification & code reuse between DataStream /
>>>>>>> Table
>>>>>>>   - Extending Table API to make it convenient API for all analytical
>>>>>>> use cases (easier mix in of UDFs)
>>>>>>>   - Same join operators on bounded/unbounded Table API and
>>>>>>> DataStream API
>>>>>>>
>>>>>>> *Faster Batch (Bounded Streams)*
>>>>>>>   - Much of this comes via Blink contribution/merging
>>>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>>>   - External Shuffle Services Support on bounded streams
>>>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>>>   - Extending DataStream API to explicitly model bounded streams
>>>>>>> (API breaking)
>>>>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream
>>>>>>> API
>>>>>>>
>>>>>>> *Streaming State Evolution*
>>>>>>>   - Let all built-in serializers support stable evolution
>>>>>>>   - First class support for other evolvable formats (Protobuf,
>>>>>>> Thrift)
>>>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>>>
>>>>>>> *Simpler Event Time Handling*
>>>>>>>   - Event Time Alignment in Sources
>>>>>>>   - Simpler out-of-the box support in sources
>>>>>>>
>>>>>>> *Checkpointing*
>>>>>>>   - Consistency of Side Effects: suspend / end with savepoint
>>>>>>> (FLIP-34)
>>>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only
>>>>>>> on coordinator)
>>>>>>>
>>>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>>>   - Reactive scaling
>>>>>>>   - Active scaling policies
>>>>>>>
>>>>>>> *Kubernetes Integration*
>>>>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>>>>
>>>>>>> *SQL Ecosystem*
>>>>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>>>>   - DDL support
>>>>>>>   - Integration with Hive Ecosystem
>>>>>>>
>>>>>>> *Simpler Handling of Dependencies*
>>>>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>>>>> loader)
>>>>>>>   - Hadoop-free by default
>>>>>>>
>>>>>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Stephan Ewen <se...@apache.org>.
I think the website is better as well.

I agree with Fabian that the wiki is not so visible, and visibility is the
main motivation.
This type of roadmap overview would not be updated by everyone; having
committers update the roadmap means the listed threads are actually
happening at the moment.


On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske <fh...@gmail.com> wrote:

> Hi,
>
> I like the idea of putting the roadmap on the website because it is much
> more visible (and IMO more credible, obligatory) there.
> However, I share the concerns about frequent updates.
>
> It think it would be great to update the "official" roadmap on the website
> once per release (-bugfix releases), i.e., every three month.
> We can use the wiki to collect and draft the roadmap for the next update.
>
> Best, Fabian
>
>
> Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <zj...@gmail.com>:
>
>> Hi Stephan,
>>
>> Thanks for this proposal. It is a good idea to track the roadmap. One
>> suggestion is that it might be better to put it into wiki page first.
>> Because it is easier to update the roadmap on wiki compared to on flink web
>> site. And I guess we may need to update the roadmap very often at the
>> beginning as there's so many discussions and proposals in community
>> recently. We can move it into flink web site later when we feel it could be
>> nailed down.
>>
>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>>
>>> Thanks Jincheng and Rong Rong!
>>>
>>> I am not deciding a roadmap and making a call on what features should be
>>> developed or not. I was only collecting broader issues that are already
>>> happening or have an active FLIP/design discussion plus committer support.
>>>
>>> Do we have that for the suggested issues as well? If yes , we can add
>>> them (can you point me to the issue/mail-thread), if not, let's try and
>>> move the discussion forward and add them to the roadmap overview then.
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>>>
>>>> Thanks Stephan for the great proposal.
>>>>
>>>> This would not only be beneficial for new users but also for
>>>> contributors to keep track on all upcoming features.
>>>>
>>>> I think that better window operator support can also be separately
>>>> group into its own category, as they affects both future DataStream API and
>>>> batch stream unification.
>>>> can we also include:
>>>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>>>> - Improving sliding window operator [1]
>>>>
>>>> One more additional suggestion, can we also include a more extendable
>>>> security module [2,3] @shuyi and I are currently working on?
>>>> This will significantly improve the usability for Flink in corporate
>>>> environments where proprietary or 3rd-party security integration is needed.
>>>>
>>>> Thanks,
>>>> Rong
>>>>
>>>>
>>>> [1]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>>> [2]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>>> [3]
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>>>> wrote:
>>>>
>>>>> Very excited and thank you for launching such a great discussion,
>>>>> Stephan !
>>>>>
>>>>> Here only a little suggestion that in the Batch Streaming Unification
>>>>> section, do we need to add an item:
>>>>>
>>>>> - Same window operators on bounded/unbounded Table API and DataStream
>>>>> API
>>>>> (currently OVER window only exists in SQL/TableAPI, DataStream API
>>>>> does not yet support)
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>>
>>>>>> Hi all!
>>>>>>
>>>>>> Recently several contributors, committers, and users asked about
>>>>>> making it more visible in which way the project is currently going.
>>>>>>
>>>>>> Users and developers can track the direction by following the
>>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>>> issues, it is very hard to get a good overall picture.
>>>>>> Especially for new users and contributors, is is very hard to get a
>>>>>> quick overview of the project direction.
>>>>>>
>>>>>> To fix this, I suggest to add a brief roadmap summary to the
>>>>>> homepage. It is a bit of a commitment to keep that roadmap up to date, but
>>>>>> I think the benefit for users justifies that.
>>>>>> The Apache Beam project has added such a roadmap [1]
>>>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>>>> the community, I would suggest to follow a similar structure here.
>>>>>>
>>>>>> If the community is in favor of this, I would volunteer to write a
>>>>>> first version of such a roadmap. The points I would include are below.
>>>>>>
>>>>>> Best,
>>>>>> Stephan
>>>>>>
>>>>>> [1] https://beam.apache.org/roadmap/
>>>>>>
>>>>>> ========================================================
>>>>>>
>>>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>>>> entity, but by its community and Project Management Committee (PMC). This
>>>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>>> that are receiving attention and give users and contributors an
>>>>>> understanding what they can look forward to.
>>>>>>
>>>>>> *Future Role of Table API and DataStream API*
>>>>>>   - Table API becomes first class citizen
>>>>>>   - Table API becomes primary API for analytics use cases
>>>>>>       * Declarative, automatic optimizations
>>>>>>       * No manual control over state and timers
>>>>>>   - DataStream API becomes primary API for applications and data
>>>>>> pipeline use cases
>>>>>>       * Physical, user controls data types, no magic or optimizer
>>>>>>       * Explicit control over state and time
>>>>>>
>>>>>> *Batch Streaming Unification*
>>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>>   - New unified source interface (FLIP-27)
>>>>>>   - Runtime operator unification & code reuse between DataStream /
>>>>>> Table
>>>>>>   - Extending Table API to make it convenient API for all analytical
>>>>>> use cases (easier mix in of UDFs)
>>>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>>>> API
>>>>>>
>>>>>> *Faster Batch (Bounded Streams)*
>>>>>>   - Much of this comes via Blink contribution/merging
>>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>>   - External Shuffle Services Support on bounded streams
>>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>>>> breaking)
>>>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream
>>>>>> API
>>>>>>
>>>>>> *Streaming State Evolution*
>>>>>>   - Let all built-in serializers support stable evolution
>>>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>>
>>>>>> *Simpler Event Time Handling*
>>>>>>   - Event Time Alignment in Sources
>>>>>>   - Simpler out-of-the box support in sources
>>>>>>
>>>>>> *Checkpointing*
>>>>>>   - Consistency of Side Effects: suspend / end with savepoint
>>>>>> (FLIP-34)
>>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only
>>>>>> on coordinator)
>>>>>>
>>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>>   - Reactive scaling
>>>>>>   - Active scaling policies
>>>>>>
>>>>>> *Kubernetes Integration*
>>>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>>>
>>>>>> *SQL Ecosystem*
>>>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>>>   - DDL support
>>>>>>   - Integration with Hive Ecosystem
>>>>>>
>>>>>> *Simpler Handling of Dependencies*
>>>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>>>> loader)
>>>>>>   - Hadoop-free by default
>>>>>>
>>>>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

I like the idea of putting the roadmap on the website because it is much
more visible (and, IMO, more credible and binding) there.
However, I share the concerns about frequent updates.

I think it would be great to update the "official" roadmap on the website
once per release (excluding bugfix releases), i.e., every three months.
We can use the wiki to collect and draft the roadmap for the next update.

Best, Fabian


On Thu, Feb 14, 2019 at 11:03 AM Jeff Zhang <zj...@gmail.com> wrote:

> Hi Stephan,
>
> Thanks for this proposal. It is a good idea to track the roadmap. One
> suggestion is that it might be better to put it into wiki page first.
> Because it is easier to update the roadmap on wiki compared to on flink web
> site. And I guess we may need to update the roadmap very often at the
> beginning as there's so many discussions and proposals in community
> recently. We can move it into flink web site later when we feel it could be
> nailed down.
>
> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>
>> Thanks Jincheng and Rong Rong!
>>
>> I am not deciding a roadmap and making a call on what features should be
>> developed or not. I was only collecting broader issues that are already
>> happening or have an active FLIP/design discussion plus committer support.
>>
>> Do we have that for the suggested issues as well? If yes , we can add
>> them (can you point me to the issue/mail-thread), if not, let's try and
>> move the discussion forward and add them to the roadmap overview then.
>>
>> Best,
>> Stephan
>>
>>
>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>>
>>> Thanks Stephan for the great proposal.
>>>
>>> This would not only be beneficial for new users but also for
>>> contributors to keep track on all upcoming features.
>>>
>>> I think that better window operator support can also be separately group
>>> into its own category, as they affects both future DataStream API and batch
>>> stream unification.
>>> can we also include:
>>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>>> - Improving sliding window operator [1]
>>>
>>> One more additional suggestion, can we also include a more extendable
>>> security module [2,3] @shuyi and I are currently working on?
>>> This will significantly improve the usability for Flink in corporate
>>> environments where proprietary or 3rd-party security integration is needed.
>>>
>>> Thanks,
>>> Rong
>>>
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>> [3]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>
>>>
>>>
>>>
>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>>> wrote:
>>>
>>>> Very excited and thank you for launching such a great discussion,
>>>> Stephan !
>>>>
>>>> Here only a little suggestion that in the Batch Streaming Unification
>>>> section, do we need to add an item:
>>>>
>>>> - Same window operators on bounded/unbounded Table API and DataStream
>>>> API
>>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>>> not yet support)
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>
>>>>> Hi all!
>>>>>
>>>>> Recently several contributors, committers, and users asked about
>>>>> making it more visible in which way the project is currently going.
>>>>>
>>>>> Users and developers can track the direction by following the
>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>> issues, it is very hard to get a good overall picture.
>>>>> Especially for new users and contributors, is is very hard to get a
>>>>> quick overview of the project direction.
>>>>>
>>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>>> the benefit for users justifies that.
>>>>> The Apache Beam project has added such a roadmap [1]
>>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>>> the community, I would suggest to follow a similar structure here.
>>>>>
>>>>> If the community is in favor of this, I would volunteer to write a
>>>>> first version of such a roadmap. The points I would include are below.
>>>>>
>>>>> Best,
>>>>> Stephan
>>>>>
>>>>> [1] https://beam.apache.org/roadmap/
>>>>>
>>>>> ========================================================
>>>>>
>>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>>> entity, but by its community and Project Management Committee (PMC). This
>>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>> that are receiving attention and give users and contributors an
>>>>> understanding what they can look forward to.
>>>>>
>>>>> *Future Role of Table API and DataStream API*
>>>>>   - Table API becomes first class citizen
>>>>>   - Table API becomes primary API for analytics use cases
>>>>>       * Declarative, automatic optimizations
>>>>>       * No manual control over state and timers
>>>>>   - DataStream API becomes primary API for applications and data
>>>>> pipeline use cases
>>>>>       * Physical, user controls data types, no magic or optimizer
>>>>>       * Explicit control over state and time
>>>>>
>>>>> *Batch Streaming Unification*
>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>   - New unified source interface (FLIP-27)
>>>>>   - Runtime operator unification & code reuse between DataStream /
>>>>> Table
>>>>>   - Extending Table API to make it convenient API for all analytical
>>>>> use cases (easier mix in of UDFs)
>>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>>> API
>>>>>
>>>>> *Faster Batch (Bounded Streams)*
>>>>>   - Much of this comes via Blink contribution/merging
>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>   - External Shuffle Services Support on bounded streams
>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>>> breaking)
>>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream
>>>>> API
>>>>>
>>>>> *Streaming State Evolution*
>>>>>   - Let all built-in serializers support stable evolution
>>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>
>>>>> *Simpler Event Time Handling*
>>>>>   - Event Time Alignment in Sources
>>>>>   - Simpler out-of-the box support in sources
>>>>>
>>>>> *Checkpointing*
>>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>>> coordinator)
>>>>>
>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>   - Reactive scaling
>>>>>   - Active scaling policies
>>>>>
>>>>> *Kubernetes Integration*
>>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>>
>>>>> *SQL Ecosystem*
>>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>>   - DDL support
>>>>>   - Integration with Hive Ecosystem
>>>>>
>>>>> *Simpler Handling of Dependencies*
>>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>>> loader)
>>>>>   - Hadoop-free by default
>>>>>
>>>>>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by zhijiang <wa...@aliyun.com>.
Thanks Stephan for this proposal, I totally agree with it.

It is very necessary to summarize the overall features/directions the community is pursuing or planning to pursue. Although I check the mailing list almost every day, it still seems difficult to trace everything. In addition, I think this overall roadmap picture can also help expose the relationships among different items and even avoid similar/duplicated efforts.

Just one small suggestion: if we could add an existing link (JIRA/discussion/FLIP/Google doc) for each listed item, it would be easy to keep track of the items one is interested in and follow their progress.

Best,
Zhijiang
------------------------------------------------------------------
From:Jeff Zhang <zj...@gmail.com>
Send Time:Thu, Feb 14, 2019, 18:03
To:Stephan Ewen <se...@apache.org>
Cc:dev <de...@flink.apache.org>; user <us...@flink.apache.org>; jincheng sun <su...@gmail.com>; Shuyi Chen <su...@gmail.com>; Rong Rong <wa...@gmail.com>
Subject:Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Hi Stephan,

Thanks for this proposal. It is a good idea to track the roadmap. One
suggestion is that it might be better to put it on a wiki page first,
because it is easier to update the roadmap on the wiki than on the Flink
website. And I guess we may need to update the roadmap very often in the
beginning, as there are so many discussions and proposals in the community
recently. We can move it to the Flink website later, once we feel it has
been nailed down.

Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 5:44 PM:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes , we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track on all upcoming features.
>>
>> I think that better window operator support can also be separately group
>> into its own category, as they affects both future DataStream API and batch
>> stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One more additional suggestion, can we also include a more extendable
>> security module [2,3] @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Here only a little suggestion that in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>
>>>> Hi all!
>>>>
>>>> Recently several contributors, committers, and users asked about making
>>>> it more visible in which way the project is currently going.
>>>>
>>>> Users and developers can track the direction by following the
>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>> issues, it is very hard to get a good overall picture.
>>>> Especially for new users and contributors, is is very hard to get a
>>>> quick overview of the project direction.
>>>>
>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>> the benefit for users justifies that.
>>>> The Apache Beam project has added such a roadmap [1]
>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>> the community, I would suggest to follow a similar structure here.
>>>>
>>>> If the community is in favor of this, I would volunteer to write a
>>>> first version of such a roadmap. The points I would include are below.
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>> [1] https://beam.apache.org/roadmap/
>>>>
>>>> ========================================================
>>>>
>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>> entity, but by its community and Project Management Committee (PMC). This
>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>> that are receiving attention and give users and contributors an
>>>> understanding what they can look forward to.
>>>>
>>>> *Future Role of Table API and DataStream API*
>>>>   - Table API becomes first class citizen
>>>>   - Table API becomes primary API for analytics use cases
>>>>       * Declarative, automatic optimizations
>>>>       * No manual control over state and timers
>>>>   - DataStream API becomes primary API for applications and data
>>>> pipeline use cases
>>>>       * Physical, user controls data types, no magic or optimizer
>>>>       * Explicit control over state and time
>>>>
>>>> *Batch Streaming Unification*
>>>>   - Table API unification (environments) (FLIP-32)
>>>>   - New unified source interface (FLIP-27)
>>>>   - Runtime operator unification & code reuse between DataStream / Table
>>>>   - Extending Table API to make it convenient API for all analytical
>>>> use cases (easier mix in of UDFs)
>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>> API
>>>>
>>>> *Faster Batch (Bounded Streams)*
>>>>   - Much of this comes via Blink contribution/merging
>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>   - Batch Scheduling on bounded data (Table API)
>>>>   - External Shuffle Services Support on bounded streams
>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>> breaking)
>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>>>
>>>> *Streaming State Evolution*
>>>>   - Let all built-in serializers support stable evolution
>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>
>>>> *Simpler Event Time Handling*
>>>>   - Event Time Alignment in Sources
>>>>   - Simpler out-of-the box support in sources
>>>>
>>>> *Checkpointing*
>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>> coordinator)
>>>>
>>>> *Automatic scaling (adjusting parallelism)*
>>>>   - Reactive scaling
>>>>   - Active scaling policies
>>>>
>>>> *Kubernetes Integration*
>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>
>>>> *SQL Ecosystem*
>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>   - DDL support
>>>>   - Integration with Hive Ecosystem
>>>>
>>>> *Simpler Handling of Dependencies*
>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>> loader)
>>>>   - Hadoop-free by default
>>>>
>>>>

-- 
Best Regards

Jeff Zhang


>>>> entity, but by its community and Project Management Committee (PMC). This
>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>> that are receiving attention and give users and contributors an
>>>> understanding what they can look forward to.
>>>>
>>>> *Future Role of Table API and DataStream API*
>>>>   - Table API becomes first class citizen
>>>>   - Table API becomes primary API for analytics use cases
>>>>       * Declarative, automatic optimizations
>>>>       * No manual control over state and timers
>>>>   - DataStream API becomes primary API for applications and data
>>>> pipeline use cases
>>>>       * Physical, user controls data types, no magic or optimizer
>>>>       * Explicit control over state and time
>>>>
>>>> *Batch Streaming Unification*
>>>>   - Table API unification (environments) (FLIP-32)
>>>>   - New unified source interface (FLIP-27)
>>>>   - Runtime operator unification & code reuse between DataStream / Table
>>>>   - Extending Table API to make it convenient API for all analytical
>>>> use cases (easier mix in of UDFs)
>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>> API
>>>>
>>>> *Faster Batch (Bounded Streams)*
>>>>   - Much of this comes via Blink contribution/merging
>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>   - Batch Scheduling on bounded data (Table API)
>>>>   - External Shuffle Services Support on bounded streams
>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>> breaking)
>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>>>
>>>> *Streaming State Evolution*
>>>>   - Let all built-in serializers support stable evolution
>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>
>>>> *Simpler Event Time Handling*
>>>>   - Event Time Alignment in Sources
>>>>   - Simpler out-of-the box support in sources
>>>>
>>>> *Checkpointing*
>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>> coordinator)
>>>>
>>>> *Automatic scaling (adjusting parallelism)*
>>>>   - Reactive scaling
>>>>   - Active scaling policies
>>>>
>>>> *Kubernetes Integration*
>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>
>>>> *SQL Ecosystem*
>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>   - DDL support
>>>>   - Integration with Hive Ecosystem
>>>>
>>>> *Simpler Handling of Dependencies*
>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>> loader)
>>>>   - Hadoop-free by default
>>>>
>>>>

-- 
Best Regards

Jeff Zhang


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

I like the idea of putting the roadmap on the website because it is much
more visible (and, IMO, more credible and binding) there.
However, I share the concerns about frequent updates.

I think it would be great to update the "official" roadmap on the website
once per release (excluding bugfix releases), i.e., every three months.
We can use the wiki to collect and draft the roadmap for the next update.

Best, Fabian


Am Do., 14. Feb. 2019 um 11:03 Uhr schrieb Jeff Zhang <zj...@gmail.com>:

> Hi Stephan,
>
> Thanks for this proposal. It is a good idea to track the roadmap. One
> suggestion is that it might be better to put it into wiki page first.
> Because it is easier to update the roadmap on wiki compared to on flink web
> site. And I guess we may need to update the roadmap very often at the
> beginning as there's so many discussions and proposals in community
> recently. We can move it into flink web site later when we feel it could be
> nailed down.
>
> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>
>> Thanks Jincheng and Rong Rong!
>>
>> I am not deciding a roadmap and making a call on what features should be
>> developed or not. I was only collecting broader issues that are already
>> happening or have an active FLIP/design discussion plus committer support.
>>
>> Do we have that for the suggested issues as well? If yes , we can add
>> them (can you point me to the issue/mail-thread), if not, let's try and
>> move the discussion forward and add them to the roadmap overview then.
>>
>> Best,
>> Stephan
>>
>>
>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>>
>>> Thanks Stephan for the great proposal.
>>>
>>> This would not only be beneficial for new users but also for
>>> contributors to keep track on all upcoming features.
>>>
>>> I think that better window operator support can also be separately group
>>> into its own category, as they affects both future DataStream API and batch
>>> stream unification.
>>> can we also include:
>>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>>> - Improving sliding window operator [1]
>>>
>>> One more additional suggestion, can we also include a more extendable
>>> security module [2,3] @shuyi and I are currently working on?
>>> This will significantly improve the usability for Flink in corporate
>>> environments where proprietary or 3rd-party security integration is needed.
>>>
>>> Thanks,
>>> Rong
>>>
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>> [3]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>
>>>
>>>
>>>
>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>>> wrote:
>>>
>>>> Very excited and thank you for launching such a great discussion,
>>>> Stephan !
>>>>
>>>> Here only a little suggestion that in the Batch Streaming Unification
>>>> section, do we need to add an item:
>>>>
>>>> - Same window operators on bounded/unbounded Table API and DataStream
>>>> API
>>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>>> not yet support)
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>
>>>>> Hi all!
>>>>>
>>>>> Recently several contributors, committers, and users asked about
>>>>> making it more visible in which way the project is currently going.
>>>>>
>>>>> Users and developers can track the direction by following the
>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>> issues, it is very hard to get a good overall picture.
>>>>> Especially for new users and contributors, is is very hard to get a
>>>>> quick overview of the project direction.
>>>>>
>>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>>> the benefit for users justifies that.
>>>>> The Apache Beam project has added such a roadmap [1]
>>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>>> the community, I would suggest to follow a similar structure here.
>>>>>
>>>>> If the community is in favor of this, I would volunteer to write a
>>>>> first version of such a roadmap. The points I would include are below.
>>>>>
>>>>> Best,
>>>>> Stephan
>>>>>
>>>>> [1] https://beam.apache.org/roadmap/
>>>>>
>>>>> ========================================================
>>>>>
>>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>>> entity, but by its community and Project Management Committee (PMC). This
>>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>> that are receiving attention and give users and contributors an
>>>>> understanding what they can look forward to.
>>>>>
>>>>> *Future Role of Table API and DataStream API*
>>>>>   - Table API becomes first class citizen
>>>>>   - Table API becomes primary API for analytics use cases
>>>>>       * Declarative, automatic optimizations
>>>>>       * No manual control over state and timers
>>>>>   - DataStream API becomes primary API for applications and data
>>>>> pipeline use cases
>>>>>       * Physical, user controls data types, no magic or optimizer
>>>>>       * Explicit control over state and time
>>>>>
>>>>> *Batch Streaming Unification*
>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>   - New unified source interface (FLIP-27)
>>>>>   - Runtime operator unification & code reuse between DataStream /
>>>>> Table
>>>>>   - Extending Table API to make it convenient API for all analytical
>>>>> use cases (easier mix in of UDFs)
>>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>>> API
>>>>>
>>>>> *Faster Batch (Bounded Streams)*
>>>>>   - Much of this comes via Blink contribution/merging
>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>   - External Shuffle Services Support on bounded streams
>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>>> breaking)
>>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream
>>>>> API
>>>>>
>>>>> *Streaming State Evolution*
>>>>>   - Let all built-in serializers support stable evolution
>>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>
>>>>> *Simpler Event Time Handling*
>>>>>   - Event Time Alignment in Sources
>>>>>   - Simpler out-of-the box support in sources
>>>>>
>>>>> *Checkpointing*
>>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>>> coordinator)
>>>>>
>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>   - Reactive scaling
>>>>>   - Active scaling policies
>>>>>
>>>>> *Kubernetes Integration*
>>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>>
>>>>> *SQL Ecosystem*
>>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>>   - DDL support
>>>>>   - Integration with Hive Ecosystem
>>>>>
>>>>> *Simpler Handling of Dependencies*
>>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>>> loader)
>>>>>   - Hadoop-free by default
>>>>>
>>>>>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Stephan,

Thanks for this proposal. It is a good idea to track the roadmap. One
suggestion is that it might be better to put it into a wiki page first,
because it is easier to update the roadmap on the wiki than on the Flink
website. And I guess we may need to update the roadmap very often at the
beginning, as there are so many discussions and proposals in the community
recently. We can move it to the Flink website later when we feel it has
been nailed down.

Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes , we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track on all upcoming features.
>>
>> I think that better window operator support can also be separately group
>> into its own category, as they affects both future DataStream API and batch
>> stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One more additional suggestion, can we also include a more extendable
>> security module [2,3] @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Here only a little suggestion that in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>
>>>> Hi all!
>>>>
>>>> Recently several contributors, committers, and users asked about making
>>>> it more visible in which way the project is currently going.
>>>>
>>>> Users and developers can track the direction by following the
>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>> issues, it is very hard to get a good overall picture.
>>>> Especially for new users and contributors, is is very hard to get a
>>>> quick overview of the project direction.
>>>>
>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>> the benefit for users justifies that.
>>>> The Apache Beam project has added such a roadmap [1]
>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>> the community, I would suggest to follow a similar structure here.
>>>>
>>>> If the community is in favor of this, I would volunteer to write a
>>>> first version of such a roadmap. The points I would include are below.
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>> [1] https://beam.apache.org/roadmap/
>>>>
>>>> ========================================================
>>>>
>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>> entity, but by its community and Project Management Committee (PMC). This
>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>> that are receiving attention and give users and contributors an
>>>> understanding what they can look forward to.
>>>>
>>>> *Future Role of Table API and DataStream API*
>>>>   - Table API becomes first class citizen
>>>>   - Table API becomes primary API for analytics use cases
>>>>       * Declarative, automatic optimizations
>>>>       * No manual control over state and timers
>>>>   - DataStream API becomes primary API for applications and data
>>>> pipeline use cases
>>>>       * Physical, user controls data types, no magic or optimizer
>>>>       * Explicit control over state and time
>>>>
>>>> *Batch Streaming Unification*
>>>>   - Table API unification (environments) (FLIP-32)
>>>>   - New unified source interface (FLIP-27)
>>>>   - Runtime operator unification & code reuse between DataStream / Table
>>>>   - Extending Table API to make it convenient API for all analytical
>>>> use cases (easier mix in of UDFs)
>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>> API
>>>>
>>>> *Faster Batch (Bounded Streams)*
>>>>   - Much of this comes via Blink contribution/merging
>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>   - Batch Scheduling on bounded data (Table API)
>>>>   - External Shuffle Services Support on bounded streams
>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>> breaking)
>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>>>
>>>> *Streaming State Evolution*
>>>>   - Let all built-in serializers support stable evolution
>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>
>>>> *Simpler Event Time Handling*
>>>>   - Event Time Alignment in Sources
>>>>   - Simpler out-of-the box support in sources
>>>>
>>>> *Checkpointing*
>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>> coordinator)
>>>>
>>>> *Automatic scaling (adjusting parallelism)*
>>>>   - Reactive scaling
>>>>   - Active scaling policies
>>>>
>>>> *Kubernetes Integration*
>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>
>>>> *SQL Ecosystem*
>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>   - DDL support
>>>>   - Integration with Hive Ecosystem
>>>>
>>>> *Simpler Handling of Dependencies*
>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>> loader)
>>>>   - Hadoop-free by default
>>>>
>>>>

-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Jark Wu <im...@gmail.com>.
Thanks Stephan for the proposal and a big +1 to this!

I also think it's a good idea to add a link to the discussion/FLIP/JIRA for
each item, as Zhijiang mentioned above.
This would be a great help for keeping track of progress and for joining the
discussions easily.

Best,
Jark

On Fri, 15 Feb 2019 at 11:34, jincheng sun <su...@gmail.com> wrote:

> Hi Stephan,
>
> Thanks for the clarification! You are right, we have never initiated a
> discussion about supporting OVER Window on DataStream, we can discuss it in
> a separate thread. I agree with you add the item after move the discussion
> forward.
>
> +1 for putting the roadmap on the website.
> +1 for periodically update the roadmap, as mentioned by Fabian, we can
> update it at every feature version release.
>
> Thanks,
> Jincheng
>
> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:
>
>> Thanks Jincheng and Rong Rong!
>>
>> I am not deciding a roadmap and making a call on what features should be
>> developed or not. I was only collecting broader issues that are already
>> happening or have an active FLIP/design discussion plus committer support.
>>
>> Do we have that for the suggested issues as well? If yes , we can add
>> them (can you point me to the issue/mail-thread), if not, let's try and
>> move the discussion forward and add them to the roadmap overview then.
>>
>> Best,
>> Stephan
>>
>>
>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>>
>>> Thanks Stephan for the great proposal.
>>>
>>> This would not only be beneficial for new users but also for
>>> contributors to keep track on all upcoming features.
>>>
>>> I think that better window operator support can also be separately group
>>> into its own category, as they affects both future DataStream API and batch
>>> stream unification.
>>> can we also include:
>>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>>> - Improving sliding window operator [1]
>>>
>>> One more additional suggestion, can we also include a more extendable
>>> security module [2,3] @shuyi and I are currently working on?
>>> This will significantly improve the usability for Flink in corporate
>>> environments where proprietary or 3rd-party security integration is needed.
>>>
>>> Thanks,
>>> Rong
>>>
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>> [3]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>
>>>
>>>
>>>
>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>>> wrote:
>>>
>>>> Very excited and thank you for launching such a great discussion,
>>>> Stephan !
>>>>
>>>> Here only a little suggestion that in the Batch Streaming Unification
>>>> section, do we need to add an item:
>>>>
>>>> - Same window operators on bounded/unbounded Table API and DataStream
>>>> API
>>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>>> not yet support)
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>>
>>>>> Hi all!
>>>>>
>>>>> Recently several contributors, committers, and users asked about
>>>>> making it more visible in which way the project is currently going.
>>>>>
>>>>> Users and developers can track the direction by following the
>>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>>> issues, it is very hard to get a good overall picture.
>>>>> Especially for new users and contributors, is is very hard to get a
>>>>> quick overview of the project direction.
>>>>>
>>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>>> the benefit for users justifies that.
>>>>> The Apache Beam project has added such a roadmap [1]
>>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>>> the community, I would suggest to follow a similar structure here.
>>>>>
>>>>> If the community is in favor of this, I would volunteer to write a
>>>>> first version of such a roadmap. The points I would include are below.
>>>>>
>>>>> Best,
>>>>> Stephan
>>>>>
>>>>> [1] https://beam.apache.org/roadmap/
>>>>>
>>>>> ========================================================
>>>>>
>>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>>> entity, but by its community and Project Management Committee (PMC). This
>>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>>> that are receiving attention and give users and contributors an
>>>>> understanding what they can look forward to.
>>>>>
>>>>> *Future Role of Table API and DataStream API*
>>>>>   - Table API becomes first class citizen
>>>>>   - Table API becomes primary API for analytics use cases
>>>>>       * Declarative, automatic optimizations
>>>>>       * No manual control over state and timers
>>>>>   - DataStream API becomes primary API for applications and data
>>>>> pipeline use cases
>>>>>       * Physical, user controls data types, no magic or optimizer
>>>>>       * Explicit control over state and time
>>>>>
>>>>> *Batch Streaming Unification*
>>>>>   - Table API unification (environments) (FLIP-32)
>>>>>   - New unified source interface (FLIP-27)
>>>>>   - Runtime operator unification & code reuse between DataStream /
>>>>> Table
>>>>>   - Extending Table API to make it convenient API for all analytical
>>>>> use cases (easier mix in of UDFs)
>>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>>> API
>>>>>
>>>>> *Faster Batch (Bounded Streams)*
>>>>>   - Much of this comes via Blink contribution/merging
>>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>>   - Batch Scheduling on bounded data (Table API)
>>>>>   - External Shuffle Services Support on bounded streams
>>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>>> breaking)
>>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream
>>>>> API
>>>>>
>>>>> *Streaming State Evolution*
>>>>>   - Let all built-in serializers support stable evolution
>>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>>
>>>>> *Simpler Event Time Handling*
>>>>>   - Event Time Alignment in Sources
>>>>>   - Simpler out-of-the box support in sources
>>>>>
>>>>> *Checkpointing*
>>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>>> coordinator)
>>>>>
>>>>> *Automatic scaling (adjusting parallelism)*
>>>>>   - Reactive scaling
>>>>>   - Active scaling policies
>>>>>
>>>>> *Kubernetes Integration*
>>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>>
>>>>> *SQL Ecosystem*
>>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>>   - DDL support
>>>>>   - Integration with Hive Ecosystem
>>>>>
>>>>> *Simpler Handling of Dependencies*
>>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>>> loader)
>>>>>   - Hadoop-free by default
>>>>>
>>>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by jincheng sun <su...@gmail.com>.
Hi Stephan,

Thanks for the clarification! You are right, we have never initiated a
discussion about supporting OVER windows on the DataStream API; we can
discuss it in a separate thread. I agree with you that we can add the item
once that discussion has moved forward.

+1 for putting the roadmap on the website.
+1 for periodically updating the roadmap; as mentioned by Fabian, we can
update it with every feature release.

Thanks,
Jincheng

Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午5:44写道:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes , we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track on all upcoming features.
>>
>> I think that better window operator support can also be separately group
>> into its own category, as they affects both future DataStream API and batch
>> stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One more additional suggestion, can we also include a more extendable
>> security module [2,3] @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Here only a little suggestion that in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>>
>>>> Hi all!
>>>>
>>>> Recently several contributors, committers, and users asked about making
>>>> it more visible in which way the project is currently going.
>>>>
>>>> Users and developers can track the direction by following the
>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>> issues, it is very hard to get a good overall picture.
>>>> Especially for new users and contributors, is is very hard to get a
>>>> quick overview of the project direction.
>>>>
>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>> the benefit for users justifies that.
>>>> The Apache Beam project has added such a roadmap [1]
>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>> the community, I would suggest to follow a similar structure here.
>>>>
>>>> If the community is in favor of this, I would volunteer to write a
>>>> first version of such a roadmap. The points I would include are below.
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>> [1] https://beam.apache.org/roadmap/
>>>>
>>>> ========================================================
>>>>
>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>> entity, but by its community and Project Management Committee (PMC). This
>>>> is not a authoritative roadmap in the sense of a plan with a specific
>>>> timeline. Instead, we share our vision for the future and major initiatives
>>>> that are receiving attention and give users and contributors an
>>>> understanding what they can look forward to.
>>>>
>>>> *Future Role of Table API and DataStream API*
>>>>   - Table API becomes first class citizen
>>>>   - Table API becomes primary API for analytics use cases
>>>>       * Declarative, automatic optimizations
>>>>       * No manual control over state and timers
>>>>   - DataStream API becomes primary API for applications and data
>>>> pipeline use cases
>>>>       * Physical, user controls data types, no magic or optimizer
>>>>       * Explicit control over state and time
>>>>
>>>> *Batch Streaming Unification*
>>>>   - Table API unification (environments) (FLIP-32)
>>>>   - New unified source interface (FLIP-27)
>>>>   - Runtime operator unification & code reuse between DataStream / Table
>>>>   - Extending Table API to make it convenient API for all analytical
>>>> use cases (easier mix in of UDFs)
>>>>   - Same join operators on bounded/unbounded Table API and DataStream
>>>> API
>>>>
>>>> *Faster Batch (Bounded Streams)*
>>>>   - Much of this comes via Blink contribution/merging
>>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>>   - Batch Scheduling on bounded data (Table API)
>>>>   - External Shuffle Services Support on bounded streams
>>>>   - Caching of intermediate results on bounded data (Table API)
>>>>   - Extending DataStream API to explicitly model bounded streams (API
>>>> breaking)
>>>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>>>
>>>> *Streaming State Evolution*
>>>>   - Let all built-in serializers support stable evolution
>>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>>   - Savepoint input/output format to modify / adjust savepoints
>>>>
>>>> *Simpler Event Time Handling*
>>>>   - Event Time Alignment in Sources
>>>>   - Simpler out-of-the box support in sources
>>>>
>>>> *Checkpointing*
>>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>>> coordinator)
>>>>
>>>> *Automatic scaling (adjusting parallelism)*
>>>>   - Reactive scaling
>>>>   - Active scaling policies
>>>>
>>>> *Kubernetes Integration*
>>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>>
>>>> *SQL Ecosystem*
>>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>>   - DDL support
>>>>   - Integration with Hive Ecosystem
>>>>
>>>> *Simpler Handling of Dependencies*
>>>>   - Scala in the APIs, but not in the core (hide in separate class
>>>> loader)
>>>>   - Hadoop-free by default
>>>>
>>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Stephan Ewen <se...@apache.org>.
Thanks Jincheng and Rong Rong!

I am not deciding on a roadmap or making a call on which features should be
developed. I was only collecting broader issues that are already happening or
that have an active FLIP/design discussion plus committer support.

Do we have that for the suggested issues as well? If yes, we can add them
(can you point me to the issue/mail-thread?); if not, let's try to move the
discussion forward and then add them to the roadmap overview.

Best,
Stephan


On Wed, Feb 13, 2019 at 6:47 PM Rong Rong <wa...@gmail.com> wrote:

> Thanks Stephan for the great proposal.
>
> This would not only be beneficial for new users but also for contributors
> to keep track on all upcoming features.
>
> I think that better window operator support can also be separately group
> into its own category, as they affects both future DataStream API and batch
> stream unification.
> can we also include:
> - OVER aggregate for DataStream API separately as @jincheng suggested.
> - Improving sliding window operator [1]
>
> One more additional suggestion, can we also include a more extendable
> security module [2,3] @shuyi and I are currently working on?
> This will significantly improve the usability for Flink in corporate
> environments where proprietary or 3rd-party security integration is needed.
>
> Thanks,
> Rong
>
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
> [3]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>
>
>
>
> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
> wrote:
>
>> Very excited and thank you for launching such a great discussion, Stephan
>> !
>>
>> Here only a little suggestion that in the Batch Streaming Unification
>> section, do we need to add an item:
>>
>> - Same window operators on bounded/unbounded Table API and DataStream API
>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>> not yet support)
>>
>> Best,
>> Jincheng
>>
>> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>>
>>> Hi all!
>>>
>>> Recently several contributors, committers, and users asked about making
>>> it more visible in which way the project is currently going.
>>>
>>> Users and developers can track the direction by following the discussion
>>> threads and JIRA, but due to the mass of discussions and open issues, it is
>>> very hard to get a good overall picture.
>>> Especially for new users and contributors, is is very hard to get a
>>> quick overview of the project direction.
>>>
>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>> the benefit for users justifies that.
>>> The Apache Beam project has added such a roadmap [1]
>>> <https://beam.apache.org/roadmap/>, which was received very well by the
>>> community, I would suggest to follow a similar structure here.
>>>
>>> If the community is in favor of this, I would volunteer to write a first
>>> version of such a roadmap. The points I would include are below.
>>>
>>> Best,
>>> Stephan
>>>
>>> [1] https://beam.apache.org/roadmap/
>>>
>>> ========================================================
>>>
>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>> entity, but by its community and Project Management Committee (PMC). This
>>> is not a authoritative roadmap in the sense of a plan with a specific
>>> timeline. Instead, we share our vision for the future and major initiatives
>>> that are receiving attention and give users and contributors an
>>> understanding what they can look forward to.
>>>
>>> *Future Role of Table API and DataStream API*
>>>   - Table API becomes first class citizen
>>>   - Table API becomes primary API for analytics use cases
>>>       * Declarative, automatic optimizations
>>>       * No manual control over state and timers
>>>   - DataStream API becomes primary API for applications and data
>>> pipeline use cases
>>>       * Physical, user controls data types, no magic or optimizer
>>>       * Explicit control over state and time
>>>
>>> *Batch Streaming Unification*
>>>   - Table API unification (environments) (FLIP-32)
>>>   - New unified source interface (FLIP-27)
>>>   - Runtime operator unification & code reuse between DataStream / Table
>>>   - Extending Table API to make it convenient API for all analytical use
>>> cases (easier mix in of UDFs)
>>>   - Same join operators on bounded/unbounded Table API and DataStream API
>>>
>>> *Faster Batch (Bounded Streams)*
>>>   - Much of this comes via Blink contribution/merging
>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>   - Batch Scheduling on bounded data (Table API)
>>>   - External Shuffle Services Support on bounded streams
>>>   - Caching of intermediate results on bounded data (Table API)
>>>   - Extending DataStream API to explicitly model bounded streams (API
>>> breaking)
>>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>>
>>> *Streaming State Evolution*
>>>   - Let all built-in serializers support stable evolution
>>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>>   - Savepoint input/output format to modify / adjust savepoints
>>>
>>> *Simpler Event Time Handling*
>>>   - Event Time Alignment in Sources
>>>   - Simpler out-of-the box support in sources
>>>
>>> *Checkpointing*
>>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>>> coordinator)
>>>
>>> *Automatic scaling (adjusting parallelism)*
>>>   - Reactive scaling
>>>   - Active scaling policies
>>>
>>> *Kubernetes Integration*
>>>   - Active Kubernetes Integration (Flink actively manages containers)
>>>
>>> *SQL Ecosystem*
>>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>>   - DDL support
>>>   - Integration with Hive Ecosystem
>>>
>>> *Simpler Handling of Dependencies*
>>>   - Scala in the APIs, but not in the core (hide in separate class
>>> loader)
>>>   - Hadoop-free by default
>>>
>>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Rong Rong <wa...@gmail.com>.
Thanks Stephan for the great proposal.

This would not only be beneficial for new users but would also help
contributors keep track of all upcoming features.

I think that better window operator support could also be grouped separately
into its own category, as it affects both the future DataStream API and
batch/stream unification. Can we also include:
- OVER aggregates for the DataStream API separately, as @jincheng suggested
- Improving the sliding window operator [1] (see the sketch below)
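
For context, here is a minimal DataStream sketch (Java) of the kind of sliding
window the slicing proposal targets; the 'events' stream and its fields are
assumed purely for illustration:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // A 10-minute window sliding by 1 minute: every element logically belongs
    // to ~10 overlapping windows, so its aggregate is recomputed ~10 times.
    // Slicing would aggregate each non-overlapping 1-minute pane once and
    // combine the panes for every emitted window.
    // 'events' is an assumed DataStream<Tuple2<String, Long>> with event-time
    // timestamps and watermarks already assigned.
    DataStream<Tuple2<String, Long>> counts = events
        .keyBy(value -> value.f0)
        .window(SlidingEventTimeWindows.of(Time.minutes(10), Time.minutes(1)))
        .sum(1);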

One additional suggestion: can we also include the more extensible security
module [2,3] that @shuyi and I are currently working on?
This would significantly improve the usability of Flink in corporate
environments where proprietary or third-party security integration is needed.
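
For reference, today's Kerberos integration is configured through a few static
options in flink-conf.yaml (the values below are placeholders); the proposal is
about making this kind of security integration pluggable for other providers:

    # Placeholder keytab and principal; "Client" covers ZooKeeper,
    # "KafkaClient" covers the Kafka connector.
    security.kerberos.login.use-ticket-cache: false
    security.kerberos.login.keytab: /path/to/flink.keytab
    security.kerberos.login.principal: flink-user@EXAMPLE.COM
    security.kerberos.login.contexts: Client,KafkaClient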

Thanks,
Rong


[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html




On Wed, Feb 13, 2019 at 3:39 AM jincheng sun <su...@gmail.com>
wrote:

> Very excited and thank you for launching such a great discussion, Stephan !
>
> Here only a little suggestion that in the Batch Streaming Unification
> section, do we need to add an item:
>
> - Same window operators on bounded/unbounded Table API and DataStream API
> (currently OVER window only exists in SQL/TableAPI, DataStream API does
> not yet support)
>
> Best,
> Jincheng
>
> Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:
>
>> Hi all!
>>
>> Recently several contributors, committers, and users asked about making
>> it more visible in which way the project is currently going.
>>
>> Users and developers can track the direction by following the discussion
>> threads and JIRA, but due to the mass of discussions and open issues, it is
>> very hard to get a good overall picture.
>> Especially for new users and contributors, is is very hard to get a quick
>> overview of the project direction.
>>
>> To fix this, I suggest to add a brief roadmap summary to the homepage. It
>> is a bit of a commitment to keep that roadmap up to date, but I think the
>> benefit for users justifies that.
>> The Apache Beam project has added such a roadmap [1]
>> <https://beam.apache.org/roadmap/>, which was received very well by the
>> community, I would suggest to follow a similar structure here.
>>
>> If the community is in favor of this, I would volunteer to write a first
>> version of such a roadmap. The points I would include are below.
>>
>> Best,
>> Stephan
>>
>> [1] https://beam.apache.org/roadmap/
>>
>> ========================================================
>>
>> Disclaimer: Apache Flink is not governed or steered by any one single
>> entity, but by its community and Project Management Committee (PMC). This
>> is not a authoritative roadmap in the sense of a plan with a specific
>> timeline. Instead, we share our vision for the future and major initiatives
>> that are receiving attention and give users and contributors an
>> understanding what they can look forward to.
>>
>> *Future Role of Table API and DataStream API*
>>   - Table API becomes first class citizen
>>   - Table API becomes primary API for analytics use cases
>>       * Declarative, automatic optimizations
>>       * No manual control over state and timers
>>   - DataStream API becomes primary API for applications and data pipeline
>> use cases
>>       * Physical, user controls data types, no magic or optimizer
>>       * Explicit control over state and time
>>
>> *Batch Streaming Unification*
>>   - Table API unification (environments) (FLIP-32)
>>   - New unified source interface (FLIP-27)
>>   - Runtime operator unification & code reuse between DataStream / Table
>>   - Extending Table API to make it convenient API for all analytical use
>> cases (easier mix in of UDFs)
>>   - Same join operators on bounded/unbounded Table API and DataStream API
>>
>> *Faster Batch (Bounded Streams)*
>>   - Much of this comes via Blink contribution/merging
>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>   - Batch Scheduling on bounded data (Table API)
>>   - External Shuffle Services Support on bounded streams
>>   - Caching of intermediate results on bounded data (Table API)
>>   - Extending DataStream API to explicitly model bounded streams (API
>> breaking)
>>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>>
>> *Streaming State Evolution*
>>   - Let all built-in serializers support stable evolution
>>   - First class support for other evolvable formats (Protobuf, Thrift)
>>   - Savepoint input/output format to modify / adjust savepoints
>>
>> *Simpler Event Time Handling*
>>   - Event Time Alignment in Sources
>>   - Simpler out-of-the box support in sources
>>
>> *Checkpointing*
>>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
>> coordinator)
>>
>> *Automatic scaling (adjusting parallelism)*
>>   - Reactive scaling
>>   - Active scaling policies
>>
>> *Kubernetes Integration*
>>   - Active Kubernetes Integration (Flink actively manages containers)
>>
>> *SQL Ecosystem*
>>   - Extended Metadata Stores / Catalog / Schema Registries support
>>   - DDL support
>>   - Integration with Hive Ecosystem
>>
>> *Simpler Handling of Dependencies*
>>   - Scala in the APIs, but not in the core (hide in separate class loader)
>>   - Hadoop-free by default
>>
>>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by jincheng sun <su...@gmail.com>.
Very excited, and thank you for launching such a great discussion, Stephan!

Just one small suggestion: in the Batch Streaming Unification section, do we
need to add an item:

- Same window operators on bounded/unbounded Table API and DataStream API
(currently the OVER window only exists in SQL/Table API; the DataStream API
does not support it yet; see the example below)
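
As a concrete illustration, a running aggregate like the following can be
expressed today through SQL on the Table API but has no direct DataStream API
counterpart. The table and column names are made up for the example, and the
TableEnvironment setup is omitted because it differs between Flink versions:

    // Assumes a table "Payments" with columns (user_id, amount, rowtime) is
    // already registered in 'tableEnv'; all names here are illustrative.
    Table runningSums = tableEnv.sqlQuery(
        "SELECT user_id, amount, " +
        "  SUM(amount) OVER (" +
        "    PARTITION BY user_id ORDER BY rowtime " +
        "    ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS running_sum " +
        "FROM Payments");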

Best,
Jincheng

Stephan Ewen <se...@apache.org> 于2019年2月13日周三 下午7:21写道:

> Hi all!
>
> Recently several contributors, committers, and users asked about making it
> more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the discussion
> threads and JIRA, but due to the mass of discussions and open issues, it is
> very hard to get a good overall picture.
> Especially for new users and contributors, is is very hard to get a quick
> overview of the project direction.
>
> To fix this, I suggest to add a brief roadmap summary to the homepage. It
> is a bit of a commitment to keep that roadmap up to date, but I think the
> benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1]
> <https://beam.apache.org/roadmap/>, which was received very well by the
> community, I would suggest to follow a similar structure here.
>
> If the community is in favor of this, I would volunteer to write a first
> version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> ========================================================
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC). This
> is not a authoritative roadmap in the sense of a plan with a specific
> timeline. Instead, we share our vision for the future and major initiatives
> that are receiving attention and give users and contributors an
> understanding what they can look forward to.
>
> *Future Role of Table API and DataStream API*
>   - Table API becomes first class citizen
>   - Table API becomes primary API for analytics use cases
>       * Declarative, automatic optimizations
>       * No manual control over state and timers
>   - DataStream API becomes primary API for applications and data pipeline
> use cases
>       * Physical, user controls data types, no magic or optimizer
>       * Explicit control over state and time
>
> *Batch Streaming Unification*
>   - Table API unification (environments) (FLIP-32)
>   - New unified source interface (FLIP-27)
>   - Runtime operator unification & code reuse between DataStream / Table
>   - Extending Table API to make it convenient API for all analytical use
> cases (easier mix in of UDFs)
>   - Same join operators on bounded/unbounded Table API and DataStream API
>
> *Faster Batch (Bounded Streams)*
>   - Much of this comes via Blink contribution/merging
>   - Fine-grained Fault Tolerance on bounded data (Table API)
>   - Batch Scheduling on bounded data (Table API)
>   - External Shuffle Services Support on bounded streams
>   - Caching of intermediate results on bounded data (Table API)
>   - Extending DataStream API to explicitly model bounded streams (API
> breaking)
>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>
> *Streaming State Evolution*
>   - Let all built-in serializers support stable evolution
>   - First class support for other evolvable formats (Protobuf, Thrift)
>   - Savepoint input/output format to modify / adjust savepoints
>
> *Simpler Event Time Handling*
>   - Event Time Alignment in Sources
>   - Simpler out-of-the box support in sources
>
> *Checkpointing*
>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
> coordinator)
>
> *Automatic scaling (adjusting parallelism)*
>   - Reactive scaling
>   - Active scaling policies
>
> *Kubernetes Integration*
>   - Active Kubernetes Integration (Flink actively manages containers)
>
> *SQL Ecosystem*
>   - Extended Metadata Stores / Catalog / Schema Registries support
>   - DDL support
>   - Integration with Hive Ecosystem
>
> *Simpler Handling of Dependencies*
>   - Scala in the APIs, but not in the core (hide in separate class loader)
>   - Hadoop-free by default
>
>

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Posted by Shaoxuan Wang <ws...@gmail.com>.
Hi Stephan,

Thanks for summarizing the work and discussions into a roadmap. It really
helps users understand where Flink is heading. The entire outline looks good
to me. If appropriate, I would recommend adding another two attractive
categories to the roadmap.

*Flink ML Enhancement*
  - Refactor ML pipeline on TableAPI
  - Python support for TableAPI
  - Support streaming training & inference.
  - Seamless integration of DL engines (TensorFlow, PyTorch, etc.)
  - An ML platform with a suite of AI tooling
Some of this work has already been discussed on the dev mailing list; a rough
sketch of the pipeline idea follows after the links.
Related JIRA (FLINK-11095) and discussions:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
;
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Python-and-Non-JVM-Language-Support-in-Flink-td25905.html
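
To make the first item a bit more concrete, here is a purely hypothetical
sketch: none of these interfaces exist in Flink today, and all class and
method names are invented only to illustrate the direction.

    // Hypothetical Estimator/Transformer style pipeline on top of the Table
    // API; nothing here is an existing Flink interface.
    public interface Transformer {
        Table transform(TableEnvironment tEnv, Table input);
    }

    public interface Estimator<M extends Transformer> {
        M fit(TableEnvironment tEnv, Table training);
    }

    // Hypothetical usage: fit a model on a training Table, then score another
    // Table. 'LogisticRegression', 'tableEnv', 'trainingTable' and
    // 'inputTable' are invented/assumed for the sketch.
    Estimator<? extends Transformer> estimator = new LogisticRegression();
    Transformer model = estimator.fit(tableEnv, trainingTable);
    Table predictions = model.transform(tableEnv, inputTable);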


*Flink-Runtime-Web Improvement*
  - Much of this comes via Blink
  - Refactor the entire module to use latest Angular (7.x)
  - Add resource information at three levels: Cluster, TaskManager, and Job
  - Add operator-level topology and data flow tracing
  - Add new metrics to track back pressure, filtering, and data skew
  - Add log association to Jobs, Vertices, and SubTasks
Related JIRA (FLINK-10705) and discussion:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html


What do you think?

Regards,
Shaoxuan



On Wed, Feb 13, 2019 at 7:21 PM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> Recently several contributors, committers, and users asked about making it
> more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the discussion
> threads and JIRA, but due to the mass of discussions and open issues, it is
> very hard to get a good overall picture.
> Especially for new users and contributors, is is very hard to get a quick
> overview of the project direction.
>
> To fix this, I suggest to add a brief roadmap summary to the homepage. It
> is a bit of a commitment to keep that roadmap up to date, but I think the
> benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1]
> <https://beam.apache.org/roadmap/>, which was received very well by the
> community, I would suggest to follow a similar structure here.
>
> If the community is in favor of this, I would volunteer to write a first
> version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> ========================================================
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC). This
> is not a authoritative roadmap in the sense of a plan with a specific
> timeline. Instead, we share our vision for the future and major initiatives
> that are receiving attention and give users and contributors an
> understanding what they can look forward to.
>
> *Future Role of Table API and DataStream API*
>   - Table API becomes first class citizen
>   - Table API becomes primary API for analytics use cases
>       * Declarative, automatic optimizations
>       * No manual control over state and timers
>   - DataStream API becomes primary API for applications and data pipeline
> use cases
>       * Physical, user controls data types, no magic or optimizer
>       * Explicit control over state and time
>
> *Batch Streaming Unification*
>   - Table API unification (environments) (FLIP-32)
>   - New unified source interface (FLIP-27)
>   - Runtime operator unification & code reuse between DataStream / Table
>   - Extending Table API to make it convenient API for all analytical use
> cases (easier mix in of UDFs)
>   - Same join operators on bounded/unbounded Table API and DataStream API
>
> *Faster Batch (Bounded Streams)*
>   - Much of this comes via Blink contribution/merging
>   - Fine-grained Fault Tolerance on bounded data (Table API)
>   - Batch Scheduling on bounded data (Table API)
>   - External Shuffle Services Support on bounded streams
>   - Caching of intermediate results on bounded data (Table API)
>   - Extending DataStream API to explicitly model bounded streams (API
> breaking)
>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>
> *Streaming State Evolution*
>   - Let all built-in serializers support stable evolution
>   - First class support for other evolvable formats (Protobuf, Thrift)
>   - Savepoint input/output format to modify / adjust savepoints
>
> *Simpler Event Time Handling*
>   - Event Time Alignment in Sources
>   - Simpler out-of-the box support in sources
>
> *Checkpointing*
>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
> coordinator)
>
> *Automatic scaling (adjusting parallelism)*
>   - Reactive scaling
>   - Active scaling policies
>
> *Kubernetes Integration*
>   - Active Kubernetes Integration (Flink actively manages containers)
>
> *SQL Ecosystem*
>   - Extended Metadata Stores / Catalog / Schema Registries support
>   - DDL support
>   - Integration with Hive Ecosystem
>
> *Simpler Handling of Dependencies*
>   - Scala in the APIs, but not in the core (hide in separate class loader)
>   - Hadoop-free by default
>
>
