Posted to dev@spark.apache.org by Usman Ehtesham <ue...@gmail.com> on 2015/06/12 06:06:39 UTC

Contributing to pyspark

Hello,

I am currently taking a course on Apache Spark via edX (
https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x)
and at the same time I am trying to read the PySpark code. I wanted to
ask: if I would like to contribute to PySpark specifically, how can I do
that? I do not intend to contribute to core Apache Spark any time soon
(mainly because I do not know Scala), but I am very comfortable in Python.

Any tips on how to contribute specifically to PySpark without being
affected by other parts of Spark would be greatly appreciated.

P.S.: I ask because there is a small change/improvement I would like
to propose. Also, since I just started learning Spark, I would like to
read and understand the PySpark code as I learn about Spark. :)

Hope to hear from you soon.

Usman Ehtesham Gul
https://github.com/ueg1990

Re: Contributing to pyspark

Posted by Manoj Kumar <ma...@gmail.com>.
1. Yes, because the issues are tracked in JIRA.
2. Not entirely (at least as far as MLlib is concerned), because most of
it consists of thin wrappers around the underlying Scala functions and
methods rather than pure-Python implementations.
3. I'm not sure about this. It seems to work fine for me!

HTH
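
[Editor's note: the "thin wrapper" point above can be illustrated with a
schematic. This is NOT actual PySpark code; every name below is invented
for illustration. In real PySpark, the forwarding step goes through Py4J
into the JVM.]

```python
# Schematic of the wrapper pattern: the Python class holds no algorithm
# logic; it only marshals arguments to a backend function and unpacks the
# result. In PySpark, that backend call crosses into Scala via Py4J.

def _call_java_backend(name, *args):
    # Stand-in for the Py4J bridge. Our toy "JVM" just returns the first
    # k data points as the "cluster centers".
    backend = {"trainKMeans": lambda data, k: {"centers": data[:k]}}
    return backend[name](*args)

class KMeansWrapper:
    """Python-side facade; the real work happens in the backend."""

    def __init__(self, k):
        self.k = k

    def fit(self, data):
        # No math here: the wrapper only forwards to the backend.
        return _call_java_backend("trainKMeans", data, self.k)

model = KMeansWrapper(k=2).fit([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0]])
print(model["centers"])  # → [[0.0, 0.0], [1.0, 1.0]]
```

This is why a pure-Python change to MLlib wrappers often still requires
understanding (and sometimes touching) the Scala side they delegate to.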

On Fri, Jun 12, 2015 at 10:41 AM, Usman Ehtesham Gul <ue...@gmail.com>
wrote:

> Hello Manoj,
>
> First of all, thank you for the quick reply. Just a couple more things:
> I have started reading the link you provided, and I will definitely filter
> JIRA by the PySpark label.
>
> Can you verify:
>
> 1) We fork from GitHub, right? I ask because on GitHub I see the repo is a
> mirror and there is no issues section; I am assuming that is because issues
> are tracked in JIRA.
> 2) To contribute to PySpark, we have to clone the whole project. But if our
> changes/contributions are specific to PySpark, we can make those without
> relying on core Spark and the other client libraries, right?
> 3) I think the address user@spark.apache.org is broken. I am getting email
> from MAILER-DAEMON@apache.org saying that my message could not be delivered
> to this address. Can you check this?
>
> Thank you again. Hope to hear from you soon.
>
> Usman
>
> On Jun 12, 2015, at 12:57 AM, Manoj Kumar <ma...@gmail.com>
> wrote:
>
> Hi,
>
> Thanks for your interest in PySpark.
>
> The first thing is to have a look at the "how to contribute" guide
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> and filter the JIRAs using the label PySpark.
>
> If you have your own improvement in mind, you can file a JIRA, discuss
> it, and then send a pull request.
>
> HTH
>
> Regards.
>
> On Fri, Jun 12, 2015 at 9:36 AM, Usman Ehtesham <ue...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I am currently taking a course on Apache Spark via edX (
>> https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x)
>> and at the same time I am trying to read the PySpark code. I wanted to
>> ask: if I would like to contribute to PySpark specifically, how can I do
>> that? I do not intend to contribute to core Apache Spark any time soon
>> (mainly because I do not know Scala), but I am very comfortable in Python.
>>
>> Any tips on how to contribute specifically to PySpark without being
>> affected by other parts of Spark would be greatly appreciated.
>>
>> P.S.: I ask because there is a small change/improvement I would like
>> to propose. Also, since I just started learning Spark, I would like to
>> read and understand the PySpark code as I learn about Spark. :)
>>
>> Hope to hear from you soon.
>>
>> Usman Ehtesham Gul
>> https://github.com/ueg1990
>>
>
>
>
> --
> Godspeed,
> Manoj Kumar,
> http://manojbits.wordpress.com
> http://github.com/MechCoder
>
>
>


-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://github.com/MechCoder

Re: Contributing to pyspark

Posted by Manoj Kumar <ma...@gmail.com>.
Hi,

Thanks for your interest in PySpark.

The first thing is to have a look at the "how to contribute" guide
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and
filter the JIRAs using the label PySpark.

If you have your own improvement in mind, you can file a JIRA, discuss
it, and then send a pull request.

HTH

Regards.
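
[Editor's note: for readers landing on this thread later, the workflow
outlined above might look roughly like the sketch below. The username,
JIRA number, and branch name are placeholders, and the test-runner flags
have changed across Spark versions, so check `python/run-tests --help`
in your own checkout.]

```shell
# Sketch of a PySpark-focused contribution workflow; <your-username> and
# SPARK-XXXX are placeholders to fill in yourself.
git clone https://github.com/<your-username>/spark.git   # your fork of apache/spark
cd spark
git checkout -b SPARK-XXXX-short-description             # branch named after the JIRA

# PySpark sources live under python/pyspark/; edit there, then run only
# the Python test suite rather than the full Scala build's tests:
./python/run-tests

# Push and open a pull request against apache/spark, referencing the JIRA.
git push origin SPARK-XXXX-short-description
```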

On Fri, Jun 12, 2015 at 9:36 AM, Usman Ehtesham <ue...@gmail.com>
wrote:

> Hello,
>
> I am currently taking a course on Apache Spark via edX (
> https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x)
> and at the same time I am trying to read the PySpark code. I wanted to
> ask: if I would like to contribute to PySpark specifically, how can I do
> that? I do not intend to contribute to core Apache Spark any time soon
> (mainly because I do not know Scala), but I am very comfortable in Python.
>
> Any tips on how to contribute specifically to PySpark without being
> affected by other parts of Spark would be greatly appreciated.
>
> P.S.: I ask because there is a small change/improvement I would like
> to propose. Also, since I just started learning Spark, I would like to
> read and understand the PySpark code as I learn about Spark. :)
>
> Hope to hear from you soon.
>
> Usman Ehtesham Gul
> https://github.com/ueg1990
>



-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://github.com/MechCoder