Posted to dev@storm.apache.org by Stig Rohde Døssing <st...@gmail.com> on 2018/05/01 20:06:24 UTC

Re: [DISCUSS] Decouple Storm core and connectors

I've put up a PR for discussion: https://github.com/apache/storm/pull/2653.
I think it's helpful to see some code before we decide whether we want to
try releasing storm-kafka-client independently.

2018-04-28 18:49 GMT+02:00 Alexandre Vermeerbergen <avermeerbergen@gmail.com>:

> Hello Stig,
>
> +1 for your proposal (non-binding).
>
> Dropping Java 7 support is no problem from my perspective (the next Java
> Long Term Support release is Java 11, and Java 8 end of support is in
> January 2019, so who cares about Java 7?)
>
> And kudos for your proposed item: "Update the storm-kafka-client docs
> with a compatibility matrix showing which versions of Storm the connector
> is compatible with". I'm still stuck on the Kafka 0.10.2.0 client libs
> for consuming from our Kafka 1.0.1 brokers with Storm Kafka Client 1.2.0;
> I saw very unstable behavior when I tried the Kafka 1.0.0 libs against
> our brokers (which were on Kafka 1.0.0 at the time). I hope to find time
> to resume testing and provide feedback to clarify this.
>
> Best regards,
> Alexandre Vermeerbergen
>
>
> 2018-04-28 15:06 GMT+02:00 Stig Rohde Døssing <st...@gmail.com>:
> > Sorry for reviving an old thread, but I think this is still relevant.
> >
> > Would everyone be okay with trying out the following for storm-kafka-client
> > (to start)?
> > * Detach storm-kafka-client's release process from Storm's release process,
> > so we can release it separately, but keep the code in the Storm repo. We
> > can probably do this by making some changes to the poms, e.g. by skipping
> > release for storm-kafka-client unless a specific profile is set.
> > * Make the code on the master branch the new storm-kafka-client baseline.
> > This would drop support for Java 7. I doubt this will be an issue since
> > Java 7 has been out of date for a while now, but if we're not sure about
> > this we could open a poll on the mailing list to see if there's a need to
> > stay on Java 7.
> > * Update the storm-kafka-client docs with a compatibility matrix showing
> > which versions of Storm the connector is compatible with.
> > * Bump the storm-kafka-client version to e.g. 2.0.0(?) for the first
> > release using this method.
> >
> > I'm not sure if we should create a branch just for storm-kafka-client, or
> > if we can get away with just having development for it happen on master.
> >
> > As Taylor mentioned this should allow us to get some experience with
> > releasing an external component separately, without making it too hard to
> > roll back if we decide that releasing separately doesn't work well.
> >
> > If no one objects, I'll open an issue to make these changes.
> >
> >
> > 2018-02-09 16:51 GMT+01:00 Xin Wang <da...@gmail.com>:
> >
> >> I agree with Jungtaek. The same thing has happened again with RocketMQ
> >> (https://github.com/apache/storm/pull/2518).
> >> Here is my suggestion:
> >>
> >> 1. Storm now has too many connectors, so we can separate the first-class
> >> connectors from the others.
> >> The following is a possible split covering all existing connectors.
> >>
> >> First class:
> >>
> >>    - Kafka
> >>    - HDFS
> >>    - HBase
> >>    - Hive
> >>    - Redis
> >>    - JDBC
> >>    - JMS
> >>
> >>
> >>
> >> Others:
> >>
> >>    - Solr
> >>    - Cassandra
> >>    - Elasticsearch
> >>    - Event Hubs
> >>    - RocketMQ
> >>    - MongoDB
> >>    - OpenTSDB
> >>    - Kinesis
> >>    - Druid
> >>    - MQTT
> >>    - PMML
> >>
> >>
> >> 2. For first-class connectors, we can leave the code where it is but
> >> release them independently; for the other connectors, I'd prefer to move
> >> them to Bahir, the way Spark and Flink have done.
> >> We can reach out to the Bahir community and request the creation of
> >> a https://github.com/apache/bahir-storm.git repo.
> >>
> >>
> >>
> >> 2018-02-01 9:10 GMT+08:00 P. Taylor Goetz <pt...@gmail.com>:
> >>
> >> > I’d start with storm-kafka-client as an experiment, and if that goes
> >> > well, move all connectors to the same model.
> >> >
> >> > Some connectors are bound to a stable protocol (e.g. JMS, MQTT), some
> >> > are bound to frequently changing APIs (e.g. Apache Kafka, Cassandra,
> >> > ES, etc.). The former tend to be stable in terms of usage patterns and
> >> > use cases, the latter not so much. For example, consider the HDFS
> >> > integration: it’s changed a lot in response to different usage
> >> > patterns. Kafka has changed due to new/changing APIs. JMS hasn’t changed
> >> > much at all since it’s tied to a stable API.
> >> >
> >> > There’s also the fact that a high percentage of connectors integrate
> >> > with the most stable Storm APIs (spout, bolt, Trident). The volatile
> >> > (using the term loosely) parts of our API affect projects like Mesos
> >> > and streamparse, but not the connectors we sponsor.
> >> >
> >> > -Taylor
> >> >
> >> > > On Jan 31, 2018, at 7:07 PM, Roshan Naik <ro...@hortonworks.com> wrote:
> >> > >
> >> > > I was thinking that if any connector is released more frequently, its
> >> > > quality would mature faster and it would typically have a lower impact
> >> > > on a Storm release (compared to now) … if we decide to bundle them in
> >> > > Storm as well.
> >> > > -roshan
> >> > >
> >> > >
> >> > > On 1/31/18, 4:02 PM, "P. Taylor Goetz" <pt...@gmail.com> wrote:
> >> > >
> >> > >    I think we all agree that releasing connectors as part of a Storm
> >> > >    release slows down the release cycle for both Storm proper and the
> >> > >    connectors.
> >> > >
> >> > >    If that’s the case, then the question is how to proceed.
> >> > >
> >> > >    -Taylor
> >> > >
> >> > >> On Jan 31, 2018, at 6:46 PM, Roshan Naik <ro...@hortonworks.com> wrote:
> >> > >>
> >> > >> One thought is to …
> >> > >> - do a frequent separate release
> >> > >> - *and also* include the latest stuff along with each Storm release.
> >> > >>
> >> > >> -roshan
> >> > >>
> >> > >>
> >> > >> On 1/31/18, 10:43 AM, "generalbas.srd@gmail.com on behalf of Stig Rohde Døssing" <generalbas.srd@gmail.com on behalf of stigdoessing@gmail.com> wrote:
> >> > >>
> >> > >>   Hugo,
> >> > >>   It's not my impression that anyone is complaining that
> >> > >>   storm-kafka-client has been exceptionally buggy, or that we haven't
> >> > >>   been fixing the issues as they crop up. The problem is that we're
> >> > >>   sitting on the fixes for way longer than is reasonable, and even if
> >> > >>   we release Storm more often, users have to go out of their way to
> >> > >>   know that they should really be using the latest storm-kafka-client
> >> > >>   rather than the one that ships with their Storm installation,
> >> > >>   because the version number of storm-kafka-client happens to not
> >> > >>   mean anything regarding compatibility with Storm.
> >> > >>
> >> > >>   Everyone,
> >> > >>
> >> > >>   Most of what I've written here has already been said, but I've
> >> > >>   already written it so...
> >> > >>
> >> > >>   I really don't see the point in going through the effort of
> >> > >>   separating connectors out to another repository if we're just going
> >> > >>   to make the other repository the second class citizen connector
> >> > >>   graveyard.
> >> > >>
> >> > >>   The point of separating storm-kafka-client out is to give it a
> >> > >>   release cycle independent of Storm's, so that in the future we can
> >> > >>   avoid the situation we're in now. There's obviously a flaw in our
> >> > >>   process when we have to choose between breaking semantic versioning
> >> > >>   and releasing broken software.
> >> > >>
> >> > >>   I agree that it would be good to release Storm a little more often,
> >> > >>   but I don't think that fully addresses my concerns. Are we willing
> >> > >>   to increment Storm's major version number if a connector needs to
> >> > >>   break its API (e.g. as I want to do in
> >> > >>   https://github.com/apache/storm/pull/2300)?
> >> > >>
> >> > >>   I think a key observation is that Storm's core API is extremely
> >> > >>   stable. Storm and the connectors aren't usually tightly coupled in
> >> > >>   the sense that e.g. version 1.0.2 of storm-kafka-client would only
> >> > >>   work with Storm 1.0.2 and not 1.0.0, so in many cases there's no
> >> > >>   reason you wouldn't use the latest connector version instead of the
> >> > >>   one that happens to ship with the version of Storm you're using. I
> >> > >>   think it would be attractive if we could reduce the number of
> >> > >>   branches of connectors we need to maintain, and instead keep a
> >> > >>   compatibility matrix between Storm and the connector in each README,
> >> > >>   for the rare occasions when the Storm core API changes.
> >> > >>
> >> > >>   +1 for trying out storm-kafka-client with its own release cycle and
> >> > >>   branches/subrepo/whichever way we want to separate the code, while
> >> > >>   keeping it part of the main Storm project JIRA and mailing list.
> >> > >>   Worst case, we merge it back in after a while. We may want to think
> >> > >>   about how to do that before we separate it out, just so we don't
> >> > >>   release e.g. storm-kafka-client 2.3.1 and then have to merge it back
> >> > >>   into Storm, which is still on 2.0.0.
> >> > >>
> >> > >>   2018-01-31 3:36 GMT+01:00 Jungtaek Lim <ka...@gmail.com>:
> >> > >>
> >> > >>> Agreed on this topic: it is not related to the current release
> >> > >>> candidates, and verifying the release candidates is a higher
> >> > >>> priority. I haven't started verifying 1.1.2 / 1.0.6 RC2 myself
> >> > >>> because the other topic I initiated could affect the current
> >> > >>> release. I'll post a short notice in that discussion thread.
> >> > >>>
> >> > >>> -Jungtaek Lim (HeartSaVioR)
> >> > >>>
> >> > >>> On Wed, Jan 31, 2018 at 10:58 AM, P. Taylor Goetz <pt...@gmail.com> wrote:
> >> > >>>
> >> > >>>> Hit send on that too soon...
> >> > >>>>
> >> > >>>> This is an important discussion topic, but it has no effect on the
> >> > >>>> current RCs. I'd recommend focusing on the current releases and
> >> > >>>> coming back to this after getting the releases out.
> >> > >>>>
> >> > >>>> -Taylor
> >> > >>>>
> >> > >>>>> On Jan 30, 2018, at 8:51 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
> >> > >>>>>
> >> > >>>>> Also, in the interest of getting releases out, we have 3 open RC
> >> > >>>>> cycles in flight.
> >> > >>>>>
> >> > >>>>> Discussion energy might be better focused on that.
> >> > >>>>>
> >> > >>>>> -Taylor
> >> > >>>>>
> >> > >>>>>> On Jan 30, 2018, at 7:52 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>> On Jan 30, 2018, at 7:31 PM, Harsha <st...@harsha.io> wrote:
> >> > >>>>>>>
> >> > >>>>>>> Hi,
> >> > >>>>>>>        In general, connectors are mostly independent of the
> >> > >>>>>>> Storm runtime, i.e. as long as the APIs don't change (storm-core
> >> > >>>>>>> and Trident haven't changed in years except for the package
> >> > >>>>>>> rename), you can take the latest connector and run it on Storm
> >> > >>>>>>> 1.0 or higher. So users don't need to upgrade their Storm cluster
> >> > >>>>>>> just to get the latest connector. They might already be doing
> >> > >>>>>>> that, but releasing connectors separately and stating the minimum
> >> > >>>>>>> supported Storm version for each connector will help users.
> >> > >>>>>>> This makes it easier for the connectors to be released
> >> > >>>>>>> independently of the core/runtime, and makes it easy for them to
> >> > >>>>>>> be fixed and released more often. But moving them to Bahir or
> >> > >>>>>>> another external project would detach them from Storm itself so
> >> > >>>>>>> much that they might not see any coordination, since Storm
> >> > >>>>>>> reviewers would need to keep track of an external project.
> >> > >>>>>>> My proposal would be:
> >> > >>>>>>> 1. Can we create a sub-project in git under Storm, so we can
> >> > >>>>>>> move the connectors there and have everything else related to
> >> > >>>>>>> Storm apply there too?
> >> > >>>>>>> 2. Can we keep maintaining the Storm connectors within the same
> >> > >>>>>>> repo, but with a separate release module for them?
> >> > >>>>>>
> >> > >>>>>> +1 That’s exactly my point. Just jettisoning connectors to Bahir
> >> > >>>>>> without commitments from the Storm community would be a mistake.
> >> > >>>>>>
> >> > >>>>>> Releasing connectors independently can be handled easily at the
> >> > >>>>>> Maven level. No need for a separate repo initially.
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>>
> >> > >>>>>>> This is a separate topic, but we could improve release
> >> > >>>>>>> timelines if we had multiple release managers handling the
> >> > >>>>>>> maintenance releases as well as the main release versions. It's
> >> > >>>>>>> good to rotate release managers within the PMC so that everyone
> >> > >>>>>>> understands the process and the responsibilities are spread out.
> >> > >>>>>>> Threads on this have been started before, but I don't think they
> >> > >>>>>>> were addressed or that any action items were taken. We should
> >> > >>>>>>> start another thread to discuss this process as well.
> >> > >>>>>>
> >> > >>>>>> Breaking up external modules into separately released versions
> >> > >>>>>> would be a great way to indoctrinate those new to the license
> >> > >>>>>> grooming and release process. Everyone could participate.
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>>
> >> > >>>>>>> Thanks,
> >> > >>>>>>> Harsha
> >> > >>>>>>
> >> > >>>>>> -Taylor
> >> > >>>>>>
> >> > >>>>>>>
> >> > >>>>>>>> On Tue, Jan 30, 2018, at 9:49 AM, Hugo Da Cruz Louro wrote:
> >> > >>>>>>>> I think the Bahir approach makes sense for connectors that
> >> > >>>>>>>> don’t fall into the "first class support" category. I am in
> >> > >>>>>>>> favor of moving such lower-adoption connectors out and having
> >> > >>>>>>>> the interested communities support them with whatever release
> >> > >>>>>>>> cycle suits them best. For connectors that are idle, such as
> >> > >>>>>>>> some of the examples Jungtaek gave, we should consider removing
> >> > >>>>>>>> them altogether, especially if they are so outdated that they
> >> > >>>>>>>> may not even work.
> >> > >>>>>>>>
> >> > >>>>>>>> Mainstream connectors such as storm-kafka-client should be kept
> >> > >>>>>>>> in the Storm repo. For example, Flink keeps
> >> > >>>>>>>> flink-connector-kafka-0.x in the Flink repo.
> >> > >>>>>>>>
> >> > >>>>>>>> I am in agreement with Jungtaek when he says: "fixing critical
> >> > >>>>>>>> bugs in storm-kafka-client should trigger release, instead of
> >> > >>>>>>>> waiting for Storm core to have some fixes to be worth to
> >> > >>>>>>>> release". Storm’s release cadence is currently not very high,
> >> > >>>>>>>> and one can argue that Storm as a whole could benefit from more
> >> > >>>>>>>> frequent releases. If it is storm-kafka-client triggering those
> >> > >>>>>>>> releases, so be it. Moving forward, I do not expect the
> >> > >>>>>>>> storm-kafka-client connector to be subject to so many changes
> >> > >>>>>>>> that it would warrant its own release cycle.
> >> > >>>>>>>>
> >> > >>>>>>>> I would also like to highlight that although storm-kafka-client
> >> > >>>>>>>> has been the center of this discussion, as was mentioned in
> >> > >>>>>>>> this thread <https://goo.gl/VY7QTG>, storm-kafka-client has had
> >> > >>>>>>>> a much less rocky road to stability than, for example,
> >> > >>>>>>>> storm-kafka. Therefore it’s worth evaluating whether the
> >> > >>>>>>>> challenges we have faced with storm-kafka-client have been out
> >> > >>>>>>>> of the norm for such an important and complex feature, and
> >> > >>>>>>>> whether they warrant significant changes in how we do things.
> >> > >>>>>>>>
> >> > >>>>>>>> Thanks,
> >> > >>>>>>>> Hugo
> >> > >>>>>>>>
> >> > >>>>>>>> On Jan 29, 2018, at 9:18 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>> Let me add evidence for my opinion: a major patch to
> >> > >>>>>>>> storm-eventhubs hasn't received even a single comment in over
> >> > >>>>>>>> 4 months.
> >> > >>>>>>>> https://github.com/apache/storm/pull/2322
> >> > >>>>>>>>
> >> > >>>>>>>> I'd rather discuss discontinuing official support for a
> >> > >>>>>>>> connector if we are no longer interested in it, don't have the
> >> > >>>>>>>> resources to support it, or have any other valid reason. If we
> >> > >>>>>>>> agree to discontinue official support, we can move it out to
> >> > >>>>>>>> another repo and let it be self-maintained. It may attract
> >> > >>>>>>>> attention and enough contributors that we'd feel comfortable
> >> > >>>>>>>> bringing it back into the Storm core repository, or it may be
> >> > >>>>>>>> silently forgotten. In either case, it shouldn't affect the
> >> > >>>>>>>> Storm core repository.
> >> > >>>>>>>>
> >> > >>>>>>>> On Tue, Jan 30, 2018 at 2:03 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>> If we're worried about breaking things for our
> >> > >>>>>>>> users/consumers/distributors, picking one of the less
> >> > >>>>>>>> used/updated connectors for the experiment makes more sense to
> >> > >>>>>>>> me. It's also OK if we intentionally pick one of the most active
> >> > >>>>>>>> and widely used connectors to accelerate the experiment.
> >> > >>>>>>>>
> >> > >>>>>>>> Decoupling connectors and moving them to another repo like
> >> > >>>>>>>> Bahir will make it clear who is interested in which connectors.
> >> > >>>>>>>> Take storm-eventhubs, for example: the major code contributions
> >> > >>>>>>>> were made by MS developers. Now they are gone, and I don't even
> >> > >>>>>>>> know whether storm-eventhubs is compatible with recent Azure
> >> > >>>>>>>> Event Hubs. That's just one of them. I've seen many connectors
> >> > >>>>>>>> in the same, a similar, or a potentially similar (say, truck
> >> > >>>>>>>> factor of 1) situation.
> >> > >>>>>>>>
> >> > >>>>>>>> -Jungtaek Lim (HeartSaVioR)
> >> > >>>>>>>>
> >> > >>>>>>>> On Tue, Jan 30, 2018 at 1:30 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> On Jan 29, 2018, at 8:03 PM, Jungtaek Lim <kabhwan@gmail.com> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> - Do we ensure they're all maintained?
> >> > >>>>>>>> -- Have we excluded inactive committers/PMC members from each
> >> > >>>>>>>> connector's committer sponsors, and does each connector still
> >> > >>>>>>>> have enough committer sponsors after that?
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> Good point. We’ve had some sponsors go silent recently. Maybe
> >> ping
> >> > >>>>>>>> sponsors and ask if they wish to maintain sponsorship?
> >> > >>>>>>>>
> >> > >>>>>>>> As a sponsor for a number of connectors, I’ll check on the ones
> >> > >>>>>>>> I’ve sponsored.
> >> > >>>>>>>>
> >> > >>>>>>>> - Are they all worth continuing to maintain in the Storm main
> >> > >>>>>>>> repository?
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> Again, that’s a question of whether there is user/dev interest.
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> -- Should we trigger a release if we find and resolve a
> >> > >>>>>>>> critical/blocker issue in one of them? If not, why do we allow
> >> > >>>>>>>> something in the main repository to be left in an inconsistent
> >> > >>>>>>>> state?
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> Some are tied to fairly well established protocols, some target
> >> > >>>>>>>> really volatile APIs. Bug reports and mailing list activity may
> >> > >>>>>>>> not be a good status indicator.
> >> > >>>>>>>>
> >> > >>>>>>>> Storm’s Kafka integration was the initial model for the
> >> > >>>>>>>> “batteries included” impetus behind `external`. If we want to
> >> > >>>>>>>> evolve how that works, why not start there, see what
> >> > >>>>>>>> works/doesn’t work, and adapt?
> >> > >>>>>>>>
> >> > >>>>>>>> I don’t want to shock our users/consumers/distributors.
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>> -Taylor
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>
> >> > >>>
> >> > >>
> >> > >>
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Xin
> >>
>