Posted to user@spark.apache.org by ofer <of...@gmail.com> on 2016/11/24 09:39:51 UTC

PySpark TaskContext

Hi,
Is there a way in PySpark to get something like TaskContext from code
running on an executor, as there is in Scala Spark?

If not, how can I find out my task ID from inside the executors?

Thanks!
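As the replies below spell out, TaskContext was not yet exposed to Python at
this point; in Scala the equivalent is org.apache.spark.TaskContext.get()
inside a task. From Python, the closest available handle was the partition
index from RDD.mapPartitionsWithIndex - not the TID, but enough to tell the
tasks of a stage apart. A minimal sketch (the names are illustrative):

from pyspark import SparkContext

sc = SparkContext(appName="partition-index-demo")

def tag_with_partition(index, iterator):
    # 'index' is this task's partition index within the stage -
    # not the Spark TID, but unique per task in the stage.
    for record in iterator:
        yield (index, record)

print(sc.parallelize(range(10), 4)
        .mapPartitionsWithIndex(tag_with_partition)
        .collect())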



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-TaskContext-tp28125.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: PySpark TaskContext

Posted by Holden Karau <ho...@pigscanfly.ca>.
I love working with the Python community & I've heard similar requests in
the past few months, so it's good to have a solid reason to try and add this
functionality :)

Just to be clear though, I'm not a Spark committer, so whether the stuff I
work on gets in is very much dependent on me finding a committer who shares
my view - but I have over a hundred commits, so it happens more often than
not :)




-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: PySpark TaskContext

Posted by Ofer Eliassaf <of...@gmail.com>.
Thank you so much for this! Great to see that you listen to the community.




-- 
Regards,
Ofer Eliassaf

Re: PySpark TaskContext

Posted by Holden Karau <ho...@pigscanfly.ca>.
https://issues.apache.org/jira/browse/SPARK-18576
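
For anyone following the link later: SPARK-18576 was resolved for Spark 2.2,
which added a TaskContext to PySpark. A rough sketch of the executor-side
usage, assuming Spark 2.2+ (the function name is illustrative):

from pyspark import SparkContext, TaskContext

sc = SparkContext(appName="taskcontext-demo")

def describe_task(iterator):
    tc = TaskContext.get()  # None on the driver; set inside a running task
    header = (tc.stageId(), tc.partitionId(), tc.taskAttemptId())
    for record in iterator:
        yield (header, record)

print(sc.parallelize(range(4), 2).mapPartitions(describe_task).collect())

taskAttemptId() corresponds to the TID asked about at the top of the thread.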




-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: PySpark TaskContext

Posted by Holden Karau <ho...@pigscanfly.ca>.
Cool - thanks. I'll circle back with the JIRA number once I've got it
created - it will probably take a while before it lands in a Spark release
(since 2.1 has already branched), but better debugging information for
Python users is certainly important/useful.




-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: PySpark TaskContext

Posted by Ofer Eliassaf <of...@gmail.com>.
Since we can't work with log4j in PySpark executors, we built our own
logging infrastructure (based on Logstash/Elasticsearch/Kibana).
It would help to have the TID in the logs, so we can drill down accordingly.
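
A sketch of one interim approach: thread the partition index through
mapPartitionsWithIndex and stamp it onto every log line, so the downstream
pipeline can drill into a single task's output. The stderr handler below is
just a stand-in for a real Logstash shipper, and all names are illustrative:

import logging

from pyspark import SparkContext

def process_partition(index, iterator):
    # This runs in the executor's Python worker, so configure logging
    # here; basicConfig is a no-op if a handler is already installed.
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("app.executor")
    for record in iterator:
        log.info("partition=%d record=%r", index, record)
        yield record

sc = SparkContext(appName="executor-logging-demo")
sc.parallelize(range(8), 4).mapPartitionsWithIndex(process_partition).count()

Once TaskContext is exposed, the partition index can be swapped for
TaskContext.get().taskAttemptId() to log the real TID.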





-- 
Regards,
Ofer Eliassaf

Re: PySpark TaskContext

Posted by Holden Karau <ho...@pigscanfly.ca>.
Hi,

The TaskContext isn't currently exposed in PySpark, but I've been meaning
to look at exposing at least some of TaskContext for parity in PySpark. Is
there a particular use case you want this for? That would help with
crafting the JIRA :)

Cheers,

Holden :)



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau