Posted to issues@spark.apache.org by "Thomas Graves (Jira)" <ji...@apache.org> on 2020/09/14 15:34:00 UTC

[jira] [Commented] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation

    [ https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195531#comment-17195531 ] 

Thomas Graves commented on SPARK-32037:
---------------------------------------

It is a good point that "blocklist" could be mistyped (though I would hope that would be caught in reviews), but if you are looking at the amount of change, it is only 1 character. Also, I don't really see how BlocklistTracker sounds any worse than BlacklistTracker. Both might be a bit weird. HealthTracker might be better there, although it would be better still if we could give context as to whose health is meant; in this case it's either the node or the executor, and it is hard to come up with a name that includes both. Like you pointed out, you then have TaskSetHealthTracker, which isn't really right because it's tracking the health of the node/executor for that taskset, not the taskset itself.

If you look at the description of the config, "denied" seems a bit weird to me:

_If set to "true", prevent Spark from scheduling tasks on executors that have been blacklisted due to too many task failures. The blacklisting algorithm can be further controlled by the other "spark.blacklist" configuration options._

If we look at the options in the context of this sentence...:

executors that have been denied due to too many task failures

executors that have been blocked due to too many task failures

executors that have been excluded due to too many task failures

The last 2 definitely make more sense in that context. You could certainly rewrite the sentence for "denied", but the other thing is that executors can be removed from the list, so denied/allowed or "removed from denied" doesn't make as much sense to me here. Block or exclude make more sense to me if executors can go active again (blocked/unblocked or excluded/included).

Naming things is always a pain. Based on all the feedback, if no one has strong objections I will go with "blocklist". I'll start making the changes, and in that context we should start to see whether it doesn't make sense. Perhaps we can do a mix of things, where BlacklistTracker would be renamed HealthTracker but other things internally are referred to as blocklist or blocked.

> Rename blacklisting feature to avoid language with racist connotation
> ---------------------------------------------------------------------
>
>                 Key: SPARK-32037
>                 URL: https://issues.apache.org/jira/browse/SPARK-32037
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Erik Krogen
>            Priority: Minor
>
> As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist". While it seems to me that there is some valid debate as to whether this term has racist origins, the cultural connotations are inescapable in today's world.
> I've created a separate task, SPARK-32036, to remove references outside of this feature. Given the large surface area of this feature and the public-facing UI / configs / etc., more care will need to be taken here.
> I'd like to start by opening up debate on what the best replacement name would be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", but I'm not sure that any of them work well for this situation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
