Posted to dev@spark.apache.org by Hyukjin Kwon <gu...@gmail.com> on 2022/08/08 12:43:40 UTC

Contributions and help needed in SPARK-40005

Hi all,

I am trying to improve the PySpark documentation, especially to:

   - Make the examples self-contained, e.g.,
   https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
   - Document parameters, e.g.,
   https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot.
   Many PySpark APIs are missing parameter documentation, e.g., DataFrame.union
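
As a rough illustration of the two goals above, a docstring in this style might look like the sketch below. It is not taken from the PySpark code base: union_rows and run_docstring_examples are hypothetical names used only for illustration, and plain Python lists stand in for DataFrames so the snippet runs without a Spark session. The section layout assumes the numpydoc convention that the linked pandas page follows.

```python
import doctest


def union_rows(left, right):
    """Return the rows of ``left`` followed by the rows of ``right``.

    Hypothetical stand-in for an API such as DataFrame.union; not PySpark code.

    Parameters
    ----------
    left : list of tuple
        Rows of the first dataset.
    right : list of tuple
        Rows to append. Duplicate rows are kept.

    Returns
    -------
    list of tuple
        All rows of ``left`` followed by all rows of ``right``.

    Examples
    --------
    The example builds its own input, so it does not rely on any
    variable defined elsewhere.

    >>> left = [(1, "a"), (2, "b")]
    >>> right = [(2, "b"), (3, "c")]
    >>> union_rows(left, right)
    [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
    """
    return list(left) + list(right)


def run_docstring_examples(func):
    """Run the doctest examples in ``func``'s docstring.

    Returns a ``(failed, attempted)`` tuple.
    """
    parser = doctest.DocTestParser()
    # The only name the examples need is the function itself.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

Because the example creates its own data, it can be copied into a fresh interpreter and checked automatically; an example that depends on a pre-existing variable cannot.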

Here is one example PR I am working on:
https://github.com/apache/spark/pull/37437
I can't do it all by myself. Any help, review, and contributions would be
welcome and appreciated.

Thank you all in advance.

Re: Contributions and help needed in SPARK-40005

Posted by Khalid Mammadov <kh...@gmail.com>.
Will do, thanks!

On Wed, 31 Aug 2022, 01:14 Hyukjin Kwon, <gu...@gmail.com> wrote:

> Oh, that's a mistake. Please just go ahead and reuse that JIRA :-).
> You can just create a PR reusing the same JIRA ID for functions.py.
>
> On Wed, 31 Aug 2022 at 01:18, Khalid Mammadov <kh...@gmail.com>
> wrote:
>
>> Hi @Hyukjin Kwon <gu...@gmail.com>
>>
>> I see you have resolved the JIRA, and I still have more to do in
>> functions.py (only 50% done). Shall I create a new JIRA for each new PR,
>> or is it OK to reuse this one?
>>
>> On Fri, 19 Aug 2022, 09:29 Khalid Mammadov, <kh...@gmail.com>
>> wrote:
>>
>>> Will do, thanks!
>>>
>>> On Fri, 19 Aug 2022, 09:11 Hyukjin Kwon, <gu...@gmail.com> wrote:
>>>
>>>> Sure, that would be great.
>>>>
>>>> I did the first 25 functions in functions.py. Please go ahead with the
>>>> rest of them.
>>>> You can create a PR with a title such
>>>> as [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions
>>>> examples self-contained (part 2, 25 functions)
>>>>
>>>> Thanks!
>>>>
>>>> On Fri, 19 Aug 2022 at 16:50, Khalid Mammadov <
>>>> khalidmammadov9@gmail.com> wrote:
>>>>
>>>>> I am picking up "functions.py" if no one has already.
>>>>>
>>>>> On Fri, 19 Aug 2022, 07:56 Khalid Mammadov, <kh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I thought it was all finished (I checked a few). Do you have a list of
>>>>>> the remaining 50%?
>>>>>> Happy to contribute 😊
>>>>>>
>>>>>> On Fri, 19 Aug 2022, 05:54 Hyukjin Kwon, <gu...@gmail.com> wrote:
>>>>>>
>>>>>>> We're halfway, roughly 50%. More contributions would be very
>>>>>>> helpful.
>>>>>>> If the file is too large, feel free to split it into
>>>>>>> multiple parts (e.g., https://github.com/apache/spark/pull/37575)
>>>>>>>
>>>>>>> On Tue, 9 Aug 2022 at 12:26, Qian SUN <qi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sure, I will do it. SPARK-40010
>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-40010> was created to
>>>>>>>> track progress.
>>>>>>>>
>>>>>>>> Hyukjin Kwon <gurwls223@gmail.com> wrote on Tue, 9 Aug 2022 at 10:58:
>>>>>>>>
>>>>>>>>> Please go ahead. It would be very much appreciated.
>>>>>>>>>
>>>>>>>>> On Tue, 9 Aug 2022 at 11:58, Qian SUN <qi...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Hyukjin
>>>>>>>>>>
>>>>>>>>>> I would like to do some work and pick up *Window.py* if possible.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Qian
>>>>>>>>>>
>>>>>>>>>> Hyukjin Kwon <gu...@gmail.com> wrote on Tue, 9 Aug 2022 at 10:41:
>>>>>>>>>>
>>>>>>>>>>> Thanks Khalid for taking a look.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 9 Aug 2022 at 00:37, Khalid Mammadov <
>>>>>>>>>>> khalidmammadov9@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Hyukjin
>>>>>>>>>>>> That's a great initiative. Here is a PR that addresses one of those
>>>>>>>>>>>> issues and is waiting for review:
>>>>>>>>>>>> https://github.com/apache/spark/pull/37408
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps it would also be good to track these pending issues
>>>>>>>>>>>> somewhere to avoid duplicated effort.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, I would like to pick up *union* and *union all*
>>>>>>>>>>>> if no one has already.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Khalid
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 8, 2022 at 1:44 PM Hyukjin Kwon <
>>>>>>>>>>>> gurwls223@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to improve the PySpark documentation, especially to:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Make the examples self-contained, e.g.,
>>>>>>>>>>>>>    https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
>>>>>>>>>>>>>    - Document parameters, e.g.,
>>>>>>>>>>>>>    https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot.
>>>>>>>>>>>>>    Many PySpark APIs are missing parameter documentation, e.g., DataFrame.union
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is one example PR I am working on:
>>>>>>>>>>>>> https://github.com/apache/spark/pull/37437
>>>>>>>>>>>>> I can't do it all by myself. Any help, review, and
>>>>>>>>>>>>> contributions would be welcome and appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you all in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best!
>>>>>>>>>> Qian SUN
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> Best!
>>>>>>>> Qian SUN
>>>>>>>>
>>>>>>>

Re: Contributions and help needed in SPARK-40005

Posted by Hyukjin Kwon <gu...@gmail.com>.
Oh, that's a mistake. Please just go ahead and reuse that JIRA :-).
You can just create a PR reusing the same JIRA ID for functions.py.

Re: Contributions and help needed in SPARK-40005

Posted by Khalid Mammadov <kh...@gmail.com>.
Hi @Hyukjin Kwon <gu...@gmail.com>

I see you have resolved the JIRA, and I still have more to do in
functions.py (only 50% done). Shall I create a new JIRA for each new PR,
or is it OK to reuse this one?

Re: Contributions and help needed in SPARK-40005

Posted by Khalid Mammadov <kh...@gmail.com>.
Will do, thanks!

Re: Contributions and help needed in SPARK-40005

Posted by Hyukjin Kwon <gu...@gmail.com>.
Sure, that would be great.

I did the first 25 functions in functions.py. Please go ahead with the rest
of them.
You can create a PR with a title such
as [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions
examples self-contained (part 2, 25 functions)

Thanks!

Re: Contributions and help needed in SPARK-40005

Posted by Khalid Mammadov <kh...@gmail.com>.
I am picking up "functions.py" if no one has already.

Re: Contributions and help needed in SPARK-40005

Posted by Khalid Mammadov <kh...@gmail.com>.
I thought it was all finished (I checked a few). Do you have a list of the remaining 50%?
Happy to contribute 😊

Re: Contributions and help needed in SPARK-40005

Posted by Hyukjin Kwon <gu...@gmail.com>.
We're halfway, roughly 50%. More contributions would be very helpful.
If the file is too large, feel free to split it into multiple
parts (e.g., https://github.com/apache/spark/pull/37575)

Re: Contributions and help needed in SPARK-40005

Posted by Qian SUN <qi...@gmail.com>.
Sure, I will do it. SPARK-40010
<https://issues.apache.org/jira/browse/SPARK-40010> was created to track
progress.

-- 
Best!
Qian SUN

Re: Contributions and help needed in SPARK-40005

Posted by Hyukjin Kwon <gu...@gmail.com>.
Please go ahead. It would be very much appreciated.

Re: Contributions and help needed in SPARK-40005

Posted by Qian SUN <qi...@gmail.com>.
Hi Hyukjin

I would like to do some work and pick up *Window.py* if possible.

Thanks,
Qian

-- 
Best!
Qian SUN

Re: Contributions and help needed in SPARK-40005

Posted by Hyukjin Kwon <gu...@gmail.com>.
Thanks Khalid for taking a look.

Re: Contributions and help needed in SPARK-40005

Posted by Khalid Mammadov <kh...@gmail.com>.
Hi Hyukjin
That's a great initiative. Here is a PR that addresses one of those issues
and is waiting for review: https://github.com/apache/spark/pull/37408

Perhaps it would also be good to track these pending issues somewhere to
avoid duplicated effort.

For example, I would like to pick up *union* and *union all* if no one has
already.

Thanks,
Khalid

