Posted to issues@spark.apache.org by "Patrick Wendell (JIRA)" <ji...@apache.org> on 2014/10/16 05:55:33 UTC

[jira] [Created] (SPARK-3963) Support getting task-scoped properties from TaskContext

Patrick Wendell created SPARK-3963:
--------------------------------------

             Summary: Support getting task-scoped properties from TaskContext
                 Key: SPARK-3963
                 URL: https://issues.apache.org/jira/browse/SPARK-3963
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Patrick Wendell


This is a proposal for a minor feature. Now that the TaskContext API has stabilized, it would be nice to have a mechanism for Spark jobs to access properties that Spark RDDs define with task-level scope. I'd like to propose adding a simple properties hash map with some standard Spark properties that users can access. Later it would be nice to support users setting these properties, but for now, to keep things simple in 1.2, I'd prefer that users not be able to set them.

The main use case is providing the file name from Hadoop RDDs, a very common request. But I'd imagine us using this for other things later on. We could also use this to expose some of the taskMetrics, such as the input bytes.

{code}
val data = sc.textFile("s3n://.../2014/*/*/*.json")
data.mapPartitions { iter =>
  // Proposed API: look up the task-scoped input file name from the TaskContext
  val fileName = TaskContext.get.getProperty(TaskContext.HADOOP_FILE_NAME)
  val parts = fileName.split("/")
  val (year, month, day) = (parts(3), parts(4), parts(5))
  iter.map(record => (year, month, day, record))
}
{code}

Internally we'd have a method called setProperty, but this wouldn't be exposed initially.
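To make the internal side concrete, here is a rough sketch of what the additions to TaskContext could look like. This is only an illustration of the proposal, not a settled design: the backing map, the method signatures, and the exact property key string are placeholders.

{code}
// Rough sketch of the proposed additions -- names and signatures are placeholders.
package org.apache.spark

import scala.collection.mutable

class TaskContext {
  // Task-scoped properties, populated by Spark internals (e.g. HadoopRDD
  // recording the input file name for the split it is reading).
  private val properties = mutable.Map[String, String]()

  // Public in 1.2: user code can only read properties.
  def getProperty(name: String): String = properties.getOrElse(name, null)

  // Internal only for now: RDDs set properties when a task starts.
  private[spark] def setProperty(name: String, value: String): Unit = {
    properties(name) = value
  }
}

object TaskContext {
  // One of the proposed standard keys (the exact name is a placeholder).
  val HADOOP_FILE_NAME = "spark.hadoopInputFile"
}
{code}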


