Posted to dev@flink.apache.org by Robert Metzger <rm...@apache.org> on 2019/06/07 12:26:40 UTC

Re: [DISCUSS] Clean up and reorganize the JIRA components

Hi,
Thanks for your explanation.
I've added "Benchmarks" and renamed "Runtime / Operators".

On Mon, May 20, 2019 at 10:59 AM Piotr Nowojski <pi...@ververica.com> wrote:

> Hi,
>
> > Concrete operator implementations will then go into the "API /
> DataStream"?
> > (or "API / DataSet" or Table)
> > Afaik, there were some ideas to share operator implementations between
> > DataStream and Table
>
> Yes & yes. I think for now we could keep the concrete operator
> implementations under API / DataStream and we can split them out once we
> have a true use case for that. Unless this is confusing for someone, in that
> case we could split it now to API / DataStream Operators.
>
> >> 2. I think we should add an additional component for benchmarks and
> >> benchmarking infrastructure. While this is a more complicated topic
> (because
> >> of the setup and how it is running), it should be on the same level as
> >> correctness tests.
> >>
> >
> > I'm not sure if it is a good idea to add a "Benchmarks" component into
> the
> > Flink JIRA. Afaik, the benchmarks are managed from here?
> > https://github.com/dataArtisans/flink/tree/benchmark-request
>
> Not all of them; some of them are in apache/flink. And it might be
> subject to change in the future. Ideally we should have benchmarking code
> in the same repository, if not for some licensing issues. Also if we ever
> implement full cluster benchmarks (not using JMH), they could also reside
> in the Flink repository.
>
> Regardless of that, does it matter where the benchmarks are? In my opinion
> the only thing that matters is that benchmarks are just another form of
> tests/verification: we have unit tests, integration tests, end-to-end
> tests and also benchmarks at various levels. Why should those things be
> treated differently?
>
> > Doesn't it make sense to track issues with GH issues there?
> > Or asking more broadly, what types of issues would you see in that
> > component?
>
> Same kind of issues as for any other type of tests. For example:
> - a release-blocker Jira issue that benchmarks are broken and are not
> testing anything (from time to time we have to fix something in the
> benchmarking setup, and it has also happened a couple of times that
> benchmarks have discovered release-blocker regressions in Flink)
> - Jira issue to fix some benchmark
> - Jira issue to implement a missing benchmark
> - …
>
> Piotrek
>
> > On 17 May 2019, at 14:41, Robert Metzger <rm...@apache.org> wrote:
> >
> > Hi,
> >
> > 1. Renaming “Runtime / Operators” to “Runtime / Task” or something like
> >> “Runtime / Processing”. “Runtime / Operators” was confusing me, since it
> >> sounded like it covers concrete implementations of the operators, like
> >> “WindowOperator” or various join implementations.
> >>
> >
> > I'm fine with this renaming.
> > Concrete operator implementations will then go into the "API /
> DataStream"?
> > (or "API / DataSet" or Table)
> > Afaik, there were some ideas to share operator implementations between
> > DataStream and Table. If that's the case, we would have to find a good
> > component for that as well.
> >
> >
> >>
> >> 2. I think we should add an additional component for benchmarks and
> >> benchmarking infrastructure. While this is a more complicated topic
> (because
> >> of the setup and how it is running), it should be on the same level as
> >> correctness tests.
> >>
> >
> > I'm not sure if it is a good idea to add a "Benchmarks" component into
> the
> > Flink JIRA. Afaik, the benchmarks are managed from here?
> > https://github.com/dataArtisans/flink/tree/benchmark-request
> > Doesn't it make sense to track issues with GH issues there?
> > Or asking more broadly, what types of issues would you see in that
> > component?
> >
> >
> >>
> >> Piotrek
> >>
> >>> On 20 Feb 2019, at 10:53, Robert Metzger <rm...@apache.org> wrote:
> >>>
> >>> Thanks a lot Timo!
> >>>
> >>> I will start a vote Chesnay!
> >>>
> >>> On Wed, Feb 20, 2019 at 10:11 AM Timo Walther <tw...@apache.org>
> >> wrote:
> >>>
> >>>> +1 for the vote. Btw I can help cleaning up the "Table API & SQL"
> >>>> component. It seems to be the biggest with 1229 Issues.
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>> Am 20.02.19 um 10:09 schrieb Chesnay Schepler:
> >>>>> I would prefer if you'd start a vote with a new cleaned up proposal.
> >>>>>
> >>>>> On 18.02.2019 15:23, Robert Metzger wrote:
> >>>>>> I added "Runtime / Configuration" to the proposal:
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Proposal+for+new+JIRA+Components
> >>>>>>
> >>>>>>
> >>>>>> Since this discussion has been open for 10 days, I assume we have
> >>>>>> reached
> >>>>>> consensus here. I will soon start renaming components.
> >>>>>>
> >>>>>> On Wed, Feb 13, 2019 at 10:51 AM Chesnay Schepler <
> chesnay@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The only parent I can think of is "Infrastructure", but I don't
> quite
> >>>>>>> like it :/
> >>>>>>>
> >>>>>>> +1 for "Runtime / Configuration"; this is too general to be placed
> in
> >>>>>>> coordination imo.
> >>>>>>>
> >>>>>>> On 12.02.2019 18:25, Robert Metzger wrote:
> >>>>>>>> Thanks a lot for your feedback Chesnay!
> >>>>>>>>
> >>>>>>>> re build/travis/release: Do you have a good idea for a common
> >>>>>>>> parent for
> >>>>>>>> "Build System", "Travis" and "Release System"?
> >>>>>>>>
> >>>>>>>> re legacy: Okay, I see your point. I will keep the Legacy
> Components
> >>>>>>> prefix.
> >>>>>>>> re library: I think I don't have an argument here. My proposal is
> >>>>>>>> based on
> >>>>>>>> what I felt as being right :) I added the "Library / " prefix to
> the
> >>>>>>>> proposal.
> >>>>>>>>
> >>>>>>>> re core/config: From the proposed components, I see the best match
> >>>>>>>> with
> >>>>>>>> "Runtime / Coordination", but I agree that this example is
> >>>>>>>> difficult to
> >>>>>>>> place into my proposed scheme. Do you think we should introduce
> >>>>>>>> "Runtime
> >>>>>>> /
> >>>>>>>> Configuration" as a component?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I updated the proposal accordingly!
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Feb 12, 2019 at 12:19 PM Chesnay Schepler <
> >> chesnay@apache.org
> >>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> re build/travis/release: No, I'm against merging build system,
> >> travis
> >>>>>>>>> and release system.
> >>>>>>>>>
> >>>>>>>>> re legacy: So going forward you're proposing to move dropped
> >> features
> >>>>>>>>> into the legacy bucket and make it impossible to search for
> >> specific
> >>>>>>>>> issues for that component? There's 0 overhead to having these
> >>>>>>>>> components, so I really don't get the benefit here, but see the
> >>>>>>> overhead.
> >>>>>>>>> I don't buy the argument of "people will not open issues if the
> >>>>>>>>> component doesn't exist", they will just leave the component
> field
> >>>>>>>>> blank
> >>>>>>>>> or add a random one (that would be wrong). In fact, if you had a
> >>>>>>>>> storm/tez component (that users would adhere to) then it would be
> >>>>>>>>> _easier_ to figure out whether an issue can be rejected right
> away.
> >>>>>>>>>
> >>>>>>>>> re library: If you are against a library category, what's your
> >>>>>>>>> argument
> >>>>>>>>> for a connector category?
> >>>>>>>>>
> >>>>>>>>> re tests: I don't mind "tests" being removed from tickets about
> >> test
> >>>>>>>>> instabilities, but you specified the migration as "rename E2E
> >> tests"
> >>>>>>>>> which is not equivalent.
> >>>>>>>>> Under what category would you file modifications to
> >>>>>>> flink-test-utils-junit?
> >>>>>>>>> I would propose to not differentiate between e2e and other
> tests; I
> >>>>>>>>> would go along with "Test infrastructure", and remove the major
> >>>>>>>>> "Tests"
> >>>>>>>>> category.
> >>>>>>>>>
> >>>>>>>>> re core/config: As an example, where (under Runtime) would you
> >>>>>>>>> place the
> >>>>>>>>> introduction of the ConfigOption class?
> >>>>>>>>>
> >>>>>>>>> On 11.02.2019 11:31, Robert Metzger wrote:
> >>>>>>>>>> Thanks a lot for your feedback!
> >>>>>>>>>>
> >>>>>>>>>> @Timo:
> >>>>>>>>>> I've followed your suggestions and updated the proposed names in
> >> the
> >>>>>>>>> wiki.
> >>>>>>>>>> Regarding a new "SQL/Connectors" component: I (with admittedly
> >>>>>>>>>> not much
> >>>>>>>>>> knowledge) would not add this component at the moment, and put
> >>>>>>>>>> the SQL
> >>>>>>>>>> stuff into the respective connector component.
> >>>>>>>>>> It is probably pretty difficult for a user to decide whether a
> bug
> >>>>>>>>> belongs
> >>>>>>>>>> to "SQL/Connector" or to "Connectors/Kafka" when Kafka in SQL does
> >> not
> >>>>>>> work.
> >>>>>>>>>> @Chesnay:
> >>>>>>>>>> - You are suggesting to rename "Build System" to "Maven" and
> still
> >>>>>>> merge
> >>>>>>>>> it
> >>>>>>>>>> with "Travis", "Release System" etc. as in the proposal?
> >>>>>>>>>>
> >>>>>>>>>> - "Runtime / Control Plane" vs "Runtime / Coordination" -- I
> >>>>>>>>>> changed the
> >>>>>>>>>> proposal
> >>>>>>>>>>
> >>>>>>>>>> - Re. "Documentation": Yes, I think that would be better in the
> >> long
> >>>>>>> run.
> >>>>>>>>>> We are already in a situation where there are groups within the
> >>>>>>> community
> >>>>>>>>>> focusing on certain areas of the code (such as SQL, the runtime,
> >>>>>>>>>> connectors). Those groups will monitor their components, but it
> >> will
> >>>>>>> be a
> >>>>>>>>>> lot of overhead for them to monitor the "Documentation"
> component.
> >>>>>>>>>> We can also try to assign documentation components to both
> >>>>>>>>> "Documentation"
> >>>>>>>>>> and the affected component, such as "Runtime / Metrics".
> >>>>>>>>>>
> >>>>>>>>>> - Removed "Misc / " prefix.
> >>>>>>>>>>
> >>>>>>>>>> - "Legacy Components": Legacy components usually have
> >>>>>>>>>> very few
> >>>>>>>>>> tickets. "Flink on Tez" has 13, "Storm Compat" ~30, and JIRA has
> >>>>>>>>>> a bulk
> >>>>>>>>>> edit feature :)
> >>>>>>>>>> The benefit of having it generalized is that people will
> probably
> >>>>>>>>>> not
> >>>>>>> add
> >>>>>>>>>> tickets to it.
> >>>>>>>>>>
> >>>>>>>>>> - "Libraries /" prefix: I don't think that it is necessary. Some
> >>>>>>>>> libraries
> >>>>>>>>>> might grow in the future (like the Table API), then we need to
> >>>>>>>>>> rename.
> >>>>>>>>>> The "flink-libraries" module does contain stuff like the sql
> >>>>>>>>>> client or
> >>>>>>>>> the
> >>>>>>>>>> python api, which are already covered by other components in my
> >>>>>>> proposal
> >>>>>>>>> --
> >>>>>>>>>> so going with the maven module structure is not an argument
> here.
> >>>>>>>>>>
> >>>>>>>>>> - "End to end infrastructure" and "Tests": The same argument as
> >>>>>>>>>> with the
> >>>>>>>>>> "Documentation" applies here. The maintainers of Kafka, Metrics,
> >> ..
> >>>>>>>>> should
> >>>>>>>>>> get visibility into "their" test instabilities through "their"
> >>>>>>>>> components.
> >>>>>>>>>> Not many people will feel responsible for the "Tests" component.
> >>>>>>>>>>
> >>>>>>>>>> For "Core" and "Configuration", I will move the tickets to the
> >>>>>>>>> appropriate
> >>>>>>>>>> components in "Runtime /".
> >>>>>>>>>>
> >>>>>>>>>> For "API / Scala": Good point. I will add that component.
> >>>>>>>>>>
> >>>>>>>>>> How to do it? I will just go through the pain and do it.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Robert
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Feb 8, 2019 at 2:40 PM Chesnay Schepler <
> >> chesnay@apache.org
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>> Some concerns:
> >>>>>>>>>>>
> >>>>>>>>>>> Travis and build system / release system are entirely
> different.
> >> I
> >>>>>>> would
> >>>>>>>>>>> even keep the release system away from the build-system, as it
> >>>>>>>>>>> is more
> >>>>>>>>>>> about the release scripts and documentation, while the latter
> is
> >>>>>>>>>>> about
> >>>>>>>>>>> maven. Actually I'd just rename build-system to maven.
> >>>>>>>>>>>
> >>>>>>>>>>> Control Plane is a term I've never heard before in this
> context;
> >>>>>>>>>>> I'd
> >>>>>>>>>>> replace it with Coordination.
> >>>>>>>>>>>
> >>>>>>>>>>> The "Documentation" description refers to it as a "Fallback
> >>>>>>> component".
> >>>>>>>>>>> In other words, if I make a change to the metrics
> documentation I
> >>>>>>>>>>> shouldn't use this component any more?
> >>>>>>>>>>>
> >>>>>>>>>>> I don't see the benefit of a `Misc` major category. I'd
> attribute
> >>>>>>>>>>> everything that doesn't have a major category implicitly to
> >> "Misc".
> >>>>>>>>>>>
> >>>>>>>>>>> Not a fan of a generalized "Legacy components" category; this
> >> seems
> >>>>>>>>>>> unnecessary. It's also a bit weird going forward as we'd have
> to
> >>>>>>>>>>> touch
> >>>>>>>>>>> every JIRA for a component if we drop it.
> >>>>>>>>>>>
> >>>>>>>>>>> How come gelly/CEP don't have a Major category (libraries?)
> >>>>>>>>>>>
> >>>>>>>>>>> "End to end infrastructure" is not equivalent to "E2E tests".
> >>>>>>>>>>> Infrastructure is not about fixing failing tests, which is what
> >> we
> >>>>>>>>>>> partially used this component for so far.
> >>>>>>>>>>>
> >>>>>>>>>>> I don't believe you can get rid of the generic "Tests"
> component;
> >>>>>>>>>>> consider any changes to the `flink-test-utils-junit` module.
> >>>>>>>>>>>
> >>>>>>>>>>> You propose deleting "Core" and "Configuration" but haven't
> >>>>>>>>>>> listed any
> >>>>>>>>>>> migration paths.
> >>>>>>>>>>>
> >>>>>>>>>>> If there's an API / Python category there should also be an API /
> >>>>>>>>>>> Scala
> >>>>>>>>>>> category. This could also include the scala-shell. Note that
> the
> >>>>>>>>>>> existing Scala API category is not mentioned anywhere in the
> >>>>>>>>>>> document.
> >>>>>>>>>>>
> >>>>>>>>>>> How do you actually want to do the migration?
> >>>>>>>>>>>
> >>>>>>>>>>> On 08.02.2019 13:13, Timo Walther wrote:
> >>>>>>>>>>>> Hi Robert,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for starting this discussion. I was also about to
> suggest
> >>>>>>>>>>>> splitting the `Table API & SQL` component because it contains
> >>>>>>>>>>>> already
> >>>>>>>>>>>> more than 1000 issues.
> >>>>>>>>>>>>
> >>>>>>>>>>>> My comments:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Rename "SQL/Shell" to "SQL/Client" because the long-term
> goal
> >>>>>>>>>>>> might
> >>>>>>>>>>>> not only be a CLI interface. I would keep the generic name
> "SQL
> >>>>>>>>>>>> Client" for now. This is also what is written in FLIPs,
> >>>>>>> presentations,
> >>>>>>>>>>>> and documentation.
> >>>>>>>>>>>> - Rename "SQL/Query Planner" to "SQL/Planner": a query is a
> >> read-only
> >>>>>>>>>>>> operation but we support things like INSERT INTO etc. Planner
> >> is
> >>>>>>> more
> >>>>>>>>>>>> generic.
> >>>>>>>>>>>> - Rename "Gelly" to "Graph Processing". New users don't know
> >> what
> >>>>>>>>>>>> Gelly means. This is the only component that has a "feature
> >>>>>>>>>>>> name". I
> >>>>>>>>>>>> don't know if we want to stick with that in the future.
> >>>>>>>>>>>> - Not sure about this: Introduce a "SQL/Connectors"? Because
> SQL
> >>>>>>>>>>>> connectors are tightly bound to SQL internals but also to the
> >>>>>>>>>>>> connector itself.
> >>>>>>>>>>>> - Rename "Connectors/HCatalog" to "Connectors/Hive". This name
> >> is
> >>>>>>> more
> >>>>>>>>>>>> generic and reflects the efforts about Hive Metastore and
> >> catalog
> >>>>>>>>>>>> integration that is currently taking place.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 08.02.19 um 12:39 schrieb Robert Metzger:
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am currently trying to improve how the Flink community is
> >>>>>>>>>>>>> handling
> >>>>>>>>>>>>> incoming pull requests and JIRA tickets.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've looked at how other big communities are handling such a
> >> high
> >>>>>>>>>>>>> number of
> >>>>>>>>>>>>> contributions, and I found that many are using GitHub labels
> >>>>>>>>>>>>> extensively.
> >>>>>>>>>>>>> An integral part of the label use is to tag PRs with the
> >>>>>>>>>>>>> component /
> >>>>>>>>>>>>> area
> >>>>>>>>>>>>> they belong to. I think the most obvious and logical way of
> >>>>>>>>>>>>> tagging
> >>>>>>>>>>>>> the PRs
> >>>>>>>>>>>>> is by using the JIRA components. This will force us to keep
> >>>>>>>>>>>>> the JIRA
> >>>>>>>>>>>>> tickets well-organized, if we want the PRs to be organized :)
> >>>>>>>>>>>>> I will soon start a separate discussion for the GitHub
> labels.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let's first discuss the JIRA components.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've created the following Wiki page with my proposal for the
> >> new
> >>>>>>>>>>>>> components,
> >>>>>>>>>>>>> and how to migrate from the existing components:
> >>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Proposal+for+new+JIRA+Components
> >>>>>>>
> >>>>>>>>>>>>> Please comment here or directly in the Wiki to let me know
> >>>>>>>>>>>>> what you
> >>>>>>>>>>>>> think.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Robert
> >>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >>
> >>
>
>