Posted to dev@spark.apache.org by Kostas Sakellis <ko...@cloudera.com> on 2016/03/22 08:27:19 UTC

SPARK-13843 Next steps

Hello all,

I'd like to close out the discussion on SPARK-13843 by polling the
community on which components we should seriously consider adding back to
Apache Spark. For reference, here are the modules that were removed as part
of SPARK-13843 and pushed to https://github.com/spark-packages:

   - streaming-flume
   - streaming-akka
   - streaming-mqtt
   - streaming-zeromq
   - streaming-twitter

For us, we'd like to see streaming-flume added back to Apache Spark.

Thanks,
Kostas

Re: SPARK-13843 Next steps

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
OK, so Kafka, Kinesis and Flume will stay in Spark.

Thanks,
Regards
JB

On 03/22/2016 08:30 AM, Reynold Xin wrote:
> Kinesis is still in it. I think it's OK to add Flume back.
>
> On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
> wrote:
>
>     Thanks for the update Kostas,
>
>     for now, kafka stays in Spark and Kinesis will be removed, right ?
>
>     Regards
>     JB
>
>     On 03/22/2016 08:27 AM, Kostas Sakellis wrote:
>
>         Hello all,
>
>         I'd like to close out the discussion on SPARK-13843 by getting a
>         poll
>         from the community on which components we should seriously
>         reconsider
>         re-adding back to Apache Spark. For reference, here are the
>         modules that
>         were removed as part of SPARK-13843 and pushed to:
>         https://github.com/spark-packages
>
>            * streaming-flume
>            * streaming-akka
>            * streaming-mqtt
>            * streaming-zeromq
>            * streaming-twitter
>
>         For us, we'd like to see the streaming-flume added back to
>         Apache Spark.
>
>         Thanks,
>         Kostas
>
>
>     --
>     Jean-Baptiste Onofré
>     jbonofre@apache.org
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>     <ma...@spark.apache.org>
>     For additional commands, e-mail: dev-help@spark.apache.org
>     <ma...@spark.apache.org>
>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: SPARK-13843 Next steps

Posted by Reynold Xin <rx...@databricks.com>.
Kinesis is still in it. I think it's OK to add Flume back.

On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Thanks for the update Kostas,
>
> for now, kafka stays in Spark and Kinesis will be removed, right ?
>
> Regards
> JB
>
> On 03/22/2016 08:27 AM, Kostas Sakellis wrote:
>
>> Hello all,
>>
>> I'd like to close out the discussion on SPARK-13843 by getting a poll
>> from the community on which components we should seriously reconsider
>> re-adding back to Apache Spark. For reference, here are the modules that
>> were removed as part of SPARK-13843 and pushed to:
>> https://github.com/spark-packages
>>
>>   * streaming-flume
>>   * streaming-akka
>>   * streaming-mqtt
>>   * streaming-zeromq
>>   * streaming-twitter
>>
>> For us, we'd like to see the streaming-flume added back to Apache Spark.
>>
>> Thanks,
>> Kostas
>>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: SPARK-13843 Next steps

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Thanks for the update, Kostas.

For now, Kafka stays in Spark and Kinesis will be removed, right?

Regards
JB

On 03/22/2016 08:27 AM, Kostas Sakellis wrote:
> Hello all,
>
> I'd like to close out the discussion on SPARK-13843 by getting a poll
> from the community on which components we should seriously reconsider
> re-adding back to Apache Spark. For reference, here are the modules that
> were removed as part of SPARK-13843 and pushed to:
> https://github.com/spark-packages
>
>   * streaming-flume
>   * streaming-akka
>   * streaming-mqtt
>   * streaming-zeromq
>   * streaming-twitter
>
> For us, we'd like to see the streaming-flume added back to Apache Spark.
>
> Thanks,
> Kostas

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: SPARK-13843 Next steps

Posted by Cody Koeninger <co...@koeninger.org>.
I'm in favor of everything in /extras and /external being removed, but
I'm more in favor of making a decision and moving on.

On Tue, Mar 22, 2016 at 12:20 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> +1 for getting flume back.
>
> On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis <ko...@cloudera.com> wrote:
>> Hello all,
>>
>> I'd like to close out the discussion on SPARK-13843 by getting a poll from
>> the community on which components we should seriously reconsider re-adding
>> back to Apache Spark. For reference, here are the modules that were removed
>> as part of SPARK-13843 and pushed to: https://github.com/spark-packages
>>
>> streaming-flume
>> streaming-akka
>> streaming-mqtt
>> streaming-zeromq
>> streaming-twitter
>>
>> For us, we'd like to see the streaming-flume added back to Apache Spark.
>>
>> Thanks,
>> Kostas
>
>
>
> --
> Marcelo

Re: SPARK-13843 Next steps

Posted by Marcelo Vanzin <va...@cloudera.com>.
+1 for getting flume back.

On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis <ko...@cloudera.com> wrote:
> Hello all,
>
> I'd like to close out the discussion on SPARK-13843 by getting a poll from
> the community on which components we should seriously reconsider re-adding
> back to Apache Spark. For reference, here are the modules that were removed
> as part of SPARK-13843 and pushed to: https://github.com/spark-packages
>
> streaming-flume
> streaming-akka
> streaming-mqtt
> streaming-zeromq
> streaming-twitter
>
> For us, we'd like to see the streaming-flume added back to Apache Spark.
>
> Thanks,
> Kostas



-- 
Marcelo


Re: SPARK-13843 Next steps

Posted by Steve Loughran <st...@hortonworks.com>.
While Sonatype is utterly strict about the org.apache namespace (it guarantees that all such artifacts have come through the ASF release process, ideally including code-signing), nobody checks the org.apache internals, or worries too much about them. Note that Spark itself has some bits of code in org.apache.hive so as to subclass the thriftserver.

What are the costs of having a project's package used externally?

1. interesting debugging sessions if JARs with conflicting classes are loaded.
2. you can't sign the JARs in the metadata. Nobody does that with the maven artifacts anyway.
3. whoever's package name it is often gets to see the stack traces in bug reports filed against them.



On 29 Mar 2016, at 01:47, Sean Owen <so...@cloudera.com> wrote:

> I tend to agree. If it's going to present a significant technical hurdle and the software is clearly non ASF like via a different artifact, there's a decent argument the namespace should stay. The artifact has to change though and that is what David was referring to in his other message.
>
> On Mon, Mar 28, 2016, 08:33 Cody Koeninger <co...@koeninger.org> wrote:
>> I really think the only thing that should have to change is the maven
>> group and identifier, not the java namespace.
>>
>> There are compatibility problems with the java namespace changing
>> (e.g. access to private[spark]), and I don't think that someone who
>> takes the time to change their build file to download a maven artifact
>> without "apache" in the identifier is at significant risk of consumer
>> confusion.
>>
>> I've tried to get a straight answer from ASF trademarks on this point,
>> but the answers I've been getting are mixed, and personally disturbing
>> to me in terms of over-reaching.
>>
>> On Sat, Mar 26, 2016 at 9:03 AM, Sean Owen <so...@cloudera.com> wrote:
>>> Looks like this is done; docs have been moved, flume is back in, etc.
>>>
>>> For the moment Kafka streaming is still in the project and I know
>>> there's still discussion about how to manage multiple versions within
>>> the project.
>>>
>>> One other thing we need to finish up is stuff like the namespace of
>>> the code that was moved out. I believe it'll have to move out of the
>>> org.apache namespace as well as change its artifact group. At least,
>>> David indicated Sonatype wouldn't let someone non-ASF push an artifact
>>> from that group anyway.
>>>
>>> Also might be worth adding a description at
>>> https://github.com/spark-packages explaining that these are just some
>>> unofficial Spark-related packages.
>>>
>>> On Tue, Mar 22, 2016 at 7:27 AM, Kostas Sakellis <ko...@cloudera.com> wrote:
>>>> Hello all,
>>>>
>>>> I'd like to close out the discussion on SPARK-13843 by getting a poll from
>>>> the community on which components we should seriously reconsider re-adding
>>>> back to Apache Spark. For reference, here are the modules that were removed
>>>> as part of SPARK-13843 and pushed to: https://github.com/spark-packages
>>>>
>>>> streaming-flume
>>>> streaming-akka
>>>> streaming-mqtt
>>>> streaming-zeromq
>>>> streaming-twitter
>>>>
>>>> For us, we'd like to see the streaming-flume added back to Apache Spark.
>>>>
>>>> Thanks,
>>>> Kostas


Re: SPARK-13843 Next steps

Posted by Sean Owen <so...@cloudera.com>.
I tend to agree. If it's going to present a significant technical hurdle
and the software is clearly non-ASF, for example because it ships under a
different artifact, there's a decent argument the namespace should stay.
The artifact has to change, though, and that is what David was referring to
in his other message.

On Mon, Mar 28, 2016, 08:33 Cody Koeninger <co...@koeninger.org> wrote:

> I really think the only thing that should have to change is the maven
> group and identifier, not the java namespace.
>
> There are compatibility problems with the java namespace changing
> (e.g. access to private[spark]), and I don't think that someone who
> takes the time to change their build file to download a maven artifact
> without "apache" in the identifier is at significant risk of consumer
> confusion.
>
> I've tried to get a straight answer from ASF trademarks on this point,
> but the answers I've been getting are mixed, and personally disturbing
> to me in terms of over-reaching.
>
> On Sat, Mar 26, 2016 at 9:03 AM, Sean Owen <so...@cloudera.com> wrote:
> > Looks like this is done; docs have been moved, flume is back in, etc.
> >
> > For the moment Kafka streaming is still in the project and I know
> > there's still discussion about how to manage multiple versions within
> > the project.
> >
> > One other thing we need to finish up is stuff like the namespace of
> > the code that was moved out. I believe it'll have to move out of the
> > org.apache namespace as well as change its artifact group. At least,
> > David indicated Sonatype wouldn't let someone non-ASF push an artifact
> > from that group anyway.
> >
> > Also might be worth adding a description at
> > https://github.com/spark-packages explaining that these are just some
> > unofficial Spark-related packages.
> >
> > On Tue, Mar 22, 2016 at 7:27 AM, Kostas Sakellis <ko...@cloudera.com>
> wrote:
> >> Hello all,
> >>
> >> I'd like to close out the discussion on SPARK-13843 by getting a poll
> from
> >> the community on which components we should seriously reconsider
> re-adding
> >> back to Apache Spark. For reference, here are the modules that were
> removed
> >> as part of SPARK-13843 and pushed to: https://github.com/spark-packages
> >>
> >> streaming-flume
> >> streaming-akka
> >> streaming-mqtt
> >> streaming-zeromq
> >> streaming-twitter
> >>
> >> For us, we'd like to see the streaming-flume added back to Apache Spark.
> >>
> >> Thanks,
> >> Kostas

Re: SPARK-13843 Next steps

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Mon, Mar 28, 2016 at 8:33 AM, Cody Koeninger <co...@koeninger.org> wrote:
> There are compatibility problems with the java namespace changing
> (e.g. access to private[spark])

I think it would be fine to keep the package names for backwards
compatibility, but I think if these external projects want to keep a
separate release cycle from Spark, they should refrain from using
"private[spark]" APIs, which I guess is an argument for changing the
package names at some point.
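
To make the private[spark] point concrete, here is a minimal Scala sketch
of how package-qualified access behaves. Every name in it (InternalUtil,
ConnectorInsideNamespace, com.example.connector, and so on) is hypothetical
and chosen only for illustration; none of it is Spark source code.

```scala
// Sketch only: all object and package names below are hypothetical.
package org.apache.spark {

  // private[spark] restricts access to code inside the org.apache.spark
  // package and its sub-packages.
  private[spark] object InternalUtil {
    def describe(): String = "internal helper"
  }

  package streaming.example {
    // Compiles because this object lives under org.apache.spark.
    object ConnectorInsideNamespace {
      def run(): String = InternalUtil.describe()
    }
  }
}

package com.example.connector {

  object ConnectorOutsideNamespace {
    // Uncommenting the next line is a compile error, because InternalUtil
    // is not accessible outside org.apache.spark:
    // def broken(): String = org.apache.spark.InternalUtil.describe()
    def run(): String = "public APIs only"
  }

  object Demo {
    def main(args: Array[String]): Unit = {
      println(org.apache.spark.streaming.example.ConnectorInsideNamespace.run())
      println(ConnectorOutsideNamespace.run())
    }
  }
}
```

A connector that keeps its code under org.apache.spark can keep calling such
helpers, but it then depends on internals that may change between Spark
releases, which is the coupling a separate release cycle has to avoid.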

-- 
Marcelo


Re: SPARK-13843 Next steps

Posted by Cody Koeninger <co...@koeninger.org>.
I really think the only thing that should have to change is the maven
group and identifier, not the java namespace.

There are compatibility problems with the java namespace changing
(e.g. access to private[spark]), and I don't think that someone who
takes the time to change their build file to download a maven artifact
without "apache" in the identifier is at significant risk of consumer
confusion.
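
The difference between the Maven coordinates and the Java/Scala package can
be sketched as an sbt build fragment. The replacement group ID
"org.example.sparkpackages" and the "<version>" placeholders are assumptions
made purely for illustration; nothing in this thread says which coordinates
the moved modules would actually use.

```scala
// build.sbt fragment (sketch). "<version>" is a placeholder, not a real release.

// Before the move: the connector is published under the ASF group ID.
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "<version>"

// After the move: only the group ID would change (hypothetical value shown).
// libraryDependencies += "org.example.sparkpackages" %% "spark-streaming-twitter" % "<version>"
```

Under this proposal, application code would keep importing from the
org.apache.spark.streaming.twitter package in either case; only the build
file changes.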

I've tried to get a straight answer from ASF trademarks on this point,
but the answers I've been getting are mixed, and personally disturbing
to me in terms of over-reaching.

On Sat, Mar 26, 2016 at 9:03 AM, Sean Owen <so...@cloudera.com> wrote:
> Looks like this is done; docs have been moved, flume is back in, etc.
>
> For the moment Kafka streaming is still in the project and I know
> there's still discussion about how to manage multiple versions within
> the project.
>
> One other thing we need to finish up is stuff like the namespace of
> the code that was moved out. I believe it'll have to move out of the
> org.apache namespace as well as change its artifact group. At least,
> David indicated Sonatype wouldn't let someone non-ASF push an artifact
> from that group anyway.
>
> Also might be worth adding a description at
> https://github.com/spark-packages explaining that these are just some
> unofficial Spark-related packages.
>
> On Tue, Mar 22, 2016 at 7:27 AM, Kostas Sakellis <ko...@cloudera.com> wrote:
>> Hello all,
>>
>> I'd like to close out the discussion on SPARK-13843 by getting a poll from
>> the community on which components we should seriously reconsider re-adding
>> back to Apache Spark. For reference, here are the modules that were removed
>> as part of SPARK-13843 and pushed to: https://github.com/spark-packages
>>
>> streaming-flume
>> streaming-akka
>> streaming-mqtt
>> streaming-zeromq
>> streaming-twitter
>>
>> For us, we'd like to see the streaming-flume added back to Apache Spark.
>>
>> Thanks,
>> Kostas

Re: SPARK-13843 Next steps

Posted by Sean Owen <so...@cloudera.com>.
Looks like this is done; docs have been moved, flume is back in, etc.

For the moment Kafka streaming is still in the project and I know
there's still discussion about how to manage multiple versions within
the project.

One other thing we need to finish up is stuff like the namespace of
the code that was moved out. I believe it'll have to move out of the
org.apache namespace as well as change its artifact group. At least,
David indicated Sonatype wouldn't let someone non-ASF push an artifact
from that group anyway.

Also might be worth adding a description at
https://github.com/spark-packages explaining that these are just some
unofficial Spark-related packages.

On Tue, Mar 22, 2016 at 7:27 AM, Kostas Sakellis <ko...@cloudera.com> wrote:
> Hello all,
>
> I'd like to close out the discussion on SPARK-13843 by getting a poll from
> the community on which components we should seriously reconsider re-adding
> back to Apache Spark. For reference, here are the modules that were removed
> as part of SPARK-13843 and pushed to: https://github.com/spark-packages
>
> streaming-flume
> streaming-akka
> streaming-mqtt
> streaming-zeromq
> streaming-twitter
>
> For us, we'd like to see the streaming-flume added back to Apache Spark.
>
> Thanks,
> Kostas
