Posted to dev@spark.apache.org by Sean Owen <so...@cloudera.com> on 2017/10/01 08:27:59 UTC

Re: Should Flume integration be behind a profile?

I tried and failed to do this in
https://issues.apache.org/jira/browse/SPARK-22142 because it became clear
that the Flume examples would have to be removed to make this work, too.
(Well, you can imagine other solutions with extra source dirs or modules
for flume examples enabled by a profile, but that doesn't help the docs and
is nontrivial complexity for little gain.)

It kind of suggests Flume support should be deprecated if it's put behind a
profile. Like with Kafka 0.8. (This is why I'm raising it again to the
whole list.)

Any preferences among:
1. Put Flume behind a profile, remove examples, deprecate
2. Put Flume behind a profile, remove examples, but don't deprecate
3. Punt until Spark 3.0, when this integration would probably be removed
entirely (?)

On Tue, Sep 26, 2017 at 10:36 AM Sean Owen <so...@cloudera.com> wrote:

> Not a big deal, but I'm wondering whether Flume integration should at
> least be opt-in and behind a profile? it still sees some use (at least on
> our end) but not applicable to the majority of users. Most other
> third-party framework integrations are behind a profile, like YARN, Mesos,
> Kinesis, Kafka 0.8, Docker. Just soliciting comments, not arguing for it.
>
> (Well, actually it annoys me that the Flume integration always fails to
> compile in IntelliJ unless you generate the sources manually)
>

Re: Should Flume integration be behind a profile?

Posted by Sean Owen <so...@cloudera.com>.

CCing user@
Yeah, good point about perhaps moving the examples into the module itself.
Actually removing it would be a long way off, no matter what.

On Mon, Oct 2, 2017 at 8:35 AM Nick Pentreath <ni...@gmail.com>
wrote:

> I'd agree with #1 or #2. Deprecation now seems fine.
>
> Perhaps this should be raised on the user list also?
>
> And perhaps it makes sense to look at moving the Flume support into Apache
> Bahir if there is interest (I've cc'ed Bahir dev list here)? That way the
> current state of the connector could keep going for those users who may
> need it.
>
> As for examples, for the Kinesis connector the examples now live in the
> subproject (see e.g. KinesisWordCountASL under external/kinesis-asl). So we
> don't have to completely remove the examples, just move them (this may not
> solve the doc issue but at least the examples are still there for anyone
> who needs them).
>
> On Mon, 2 Oct 2017 at 06:36 Mridul Muralidharan <mr...@gmail.com> wrote:
>
>> I agree, proposal 1 sounds better among the options.
>>
>> Regards,
>> Mridul
>>
>>
>> On Sun, Oct 1, 2017 at 3:50 PM, Reynold Xin <rx...@databricks.com> wrote:
>> > Probably should do 1, and then it is an easier transition in 3.0.
>> >
>> > On Sun, Oct 1, 2017 at 1:28 AM Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> I tried and failed to do this in
>> >> https://issues.apache.org/jira/browse/SPARK-22142 because it became clear
>> >> that the Flume examples would have to be removed to make this work, too.
>> >> (Well, you can imagine other solutions with extra source dirs or modules for
>> >> flume examples enabled by a profile, but that doesn't help the docs and is
>> >> nontrivial complexity for little gain.)
>> >>
>> >> It kind of suggests Flume support should be deprecated if it's put behind
>> >> a profile. Like with Kafka 0.8. (This is why I'm raising it again to the
>> >> whole list.)
>> >>
>> >> Any preferences among:
>> >> 1. Put Flume behind a profile, remove examples, deprecate
>> >> 2. Put Flume behind a profile, remove examples, but don't deprecate
>> >> 3. Punt until Spark 3.0, when this integration would probably be removed
>> >> entirely (?)
>> >>
>> >> On Tue, Sep 26, 2017 at 10:36 AM Sean Owen <so...@cloudera.com> wrote:
>> >>>
>> >>> Not a big deal, but I'm wondering whether Flume integration should at
>> >>> least be opt-in and behind a profile? it still sees some use (at least on
>> >>> our end) but not applicable to the majority of users. Most other third-party
>> >>> framework integrations are behind a profile, like YARN, Mesos, Kinesis,
>> >>> Kafka 0.8, Docker. Just soliciting comments, not arguing for it.
>> >>>
>> >>> (Well, actually it annoys me that the Flume integration always fails to
>> >>> compile in IntelliJ unless you generate the sources manually)
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

Re: Should Flume integration be behind a profile?

Posted by Luciano Resende <lu...@gmail.com>.

On Mon, Oct 2, 2017 at 12:34 AM, Nick Pentreath <ni...@gmail.com>
wrote:

> I'd agree with #1 or #2. Deprecation now seems fine.
>
> Perhaps this should be raised on the user list also?
>
> And perhaps it makes sense to look at moving the Flume support into Apache
> Bahir if there is interest (I've cc'ed Bahir dev list here)? That way the
> current state of the connector could keep going for those users who may
> need it.
>
>
+1

Apache Bahir's main goal is to provide extensions to multiple distributed
analytic platforms, extending their reach with a diversity of streaming
connectors and SQL data sources. Apache Bahir would welcome proposals to
move extensions from Apache Spark to itself. This would give more
flexibility to the Spark dev community, as they could focus on core
functionality without losing the ability to enhance these extensions, since
most Spark committers have write access to the Bahir repositories. Also,
users should not see much difference, as Bahir has been creating releases
for every Spark release.

If the Spark dev community decides to go this route, please create a
JIRA on the Bahir project, and we could use this thread or a new specific
one to discuss any details.

Thanks

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Should Flume integration be behind a profile?

Posted by Nick Pentreath <ni...@gmail.com>.

I'd agree with #1 or #2. Deprecation now seems fine.

Perhaps this should be raised on the user list also?

And perhaps it makes sense to look at moving the Flume support into Apache
Bahir if there is interest (I've cc'ed Bahir dev list here)? That way the
current state of the connector could keep going for those users who may
need it.

As for examples, for the Kinesis connector the examples now live in the
subproject (see e.g. KinesisWordCountASL under external/kinesis-asl). So we
don't have to completely remove the examples, just move them (this may not
solve the doc issue but at least the examples are still there for anyone
who needs them).

On Mon, 2 Oct 2017 at 06:36 Mridul Muralidharan <mr...@gmail.com> wrote:

> I agree, proposal 1 sounds better among the options.
>
> Regards,
> Mridul
>
>
> On Sun, Oct 1, 2017 at 3:50 PM, Reynold Xin <rx...@databricks.com> wrote:
> > Probably should do 1, and then it is an easier transition in 3.0.
> >
> > On Sun, Oct 1, 2017 at 1:28 AM Sean Owen <so...@cloudera.com> wrote:
> >>
> >> I tried and failed to do this in
> >> https://issues.apache.org/jira/browse/SPARK-22142 because it became clear
> >> that the Flume examples would have to be removed to make this work, too.
> >> (Well, you can imagine other solutions with extra source dirs or modules for
> >> flume examples enabled by a profile, but that doesn't help the docs and is
> >> nontrivial complexity for little gain.)
> >>
> >> It kind of suggests Flume support should be deprecated if it's put behind
> >> a profile. Like with Kafka 0.8. (This is why I'm raising it again to the
> >> whole list.)
> >>
> >> Any preferences among:
> >> 1. Put Flume behind a profile, remove examples, deprecate
> >> 2. Put Flume behind a profile, remove examples, but don't deprecate
> >> 3. Punt until Spark 3.0, when this integration would probably be removed
> >> entirely (?)
> >>
> >> On Tue, Sep 26, 2017 at 10:36 AM Sean Owen <so...@cloudera.com> wrote:
> >>>
> >>> Not a big deal, but I'm wondering whether Flume integration should at
> >>> least be opt-in and behind a profile? it still sees some use (at least on
> >>> our end) but not applicable to the majority of users. Most other third-party
> >>> framework integrations are behind a profile, like YARN, Mesos, Kinesis,
> >>> Kafka 0.8, Docker. Just soliciting comments, not arguing for it.
> >>>
> >>> (Well, actually it annoys me that the Flume integration always fails to
> >>> compile in IntelliJ unless you generate the sources manually)

Re: Should Flume integration be behind a profile?

Posted by Mridul Muralidharan <mr...@gmail.com>.

I agree, proposal 1 sounds better among the options.

Regards,
Mridul


On Sun, Oct 1, 2017 at 3:50 PM, Reynold Xin <rx...@databricks.com> wrote:
> Probably should do 1, and then it is an easier transition in 3.0.
>
> On Sun, Oct 1, 2017 at 1:28 AM Sean Owen <so...@cloudera.com> wrote:
>>
>> I tried and failed to do this in
>> https://issues.apache.org/jira/browse/SPARK-22142 because it became clear
>> that the Flume examples would have to be removed to make this work, too.
>> (Well, you can imagine other solutions with extra source dirs or modules for
>> flume examples enabled by a profile, but that doesn't help the docs and is
>> nontrivial complexity for little gain.)
>>
>> It kind of suggests Flume support should be deprecated if it's put behind
>> a profile. Like with Kafka 0.8. (This is why I'm raising it again to the
>> whole list.)
>>
>> Any preferences among:
>> 1. Put Flume behind a profile, remove examples, deprecate
>> 2. Put Flume behind a profile, remove examples, but don't deprecate
>> 3. Punt until Spark 3.0, when this integration would probably be removed
>> entirely (?)
>>
>> On Tue, Sep 26, 2017 at 10:36 AM Sean Owen <so...@cloudera.com> wrote:
>>>
>>> Not a big deal, but I'm wondering whether Flume integration should at
>>> least be opt-in and behind a profile? it still sees some use (at least on
>>> our end) but not applicable to the majority of users. Most other third-party
>>> framework integrations are behind a profile, like YARN, Mesos, Kinesis,
>>> Kafka 0.8, Docker. Just soliciting comments, not arguing for it.
>>>
>>> (Well, actually it annoys me that the Flume integration always fails to
>>> compile in IntelliJ unless you generate the sources manually)


Re: Should Flume integration be behind a profile?

Posted by Reynold Xin <rx...@databricks.com>.

Probably should do 1, and then it is an easier transition in 3.0.

On Sun, Oct 1, 2017 at 1:28 AM Sean Owen <so...@cloudera.com> wrote:

> I tried and failed to do this in
> https://issues.apache.org/jira/browse/SPARK-22142 because it became clear
> that the Flume examples would have to be removed to make this work, too.
> (Well, you can imagine other solutions with extra source dirs or modules
> for flume examples enabled by a profile, but that doesn't help the docs and
> is nontrivial complexity for little gain.)
>
> It kind of suggests Flume support should be deprecated if it's put behind
> a profile. Like with Kafka 0.8. (This is why I'm raising it again to the
> whole list.)
>
> Any preferences among:
> 1. Put Flume behind a profile, remove examples, deprecate
> 2. Put Flume behind a profile, remove examples, but don't deprecate
> 3. Punt until Spark 3.0, when this integration would probably be removed
> entirely (?)
>
> On Tue, Sep 26, 2017 at 10:36 AM Sean Owen <so...@cloudera.com> wrote:
>
>> Not a big deal, but I'm wondering whether Flume integration should at
>> least be opt-in and behind a profile? it still sees some use (at least on
>> our end) but not applicable to the majority of users. Most other
>> third-party framework integrations are behind a profile, like YARN, Mesos,
>> Kinesis, Kafka 0.8, Docker. Just soliciting comments, not arguing for it.
>>
>> (Well, actually it annoys me that the Flume integration always fails to
>> compile in IntelliJ unless you generate the sources manually)
>>
>