Posted to issues@spark.apache.org by "Charles Reiss (JIRA)" <ji...@apache.org> on 2014/11/12 01:16:33 UTC

[jira] [Resolved] (SPARK-4157) Task input statistics incomplete when a task reads from multiple locations

     [ https://issues.apache.org/jira/browse/SPARK-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Reiss resolved SPARK-4157.
----------------------------------
    Resolution: Duplicate

> Task input statistics incomplete when a task reads from multiple locations
> --------------------------------------------------------------------------
>
>                 Key: SPARK-4157
>                 URL: https://issues.apache.org/jira/browse/SPARK-4157
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Charles Reiss
>            Priority: Minor
>
> SPARK-1683 introduced tracking of filesystem reads for tasks, but the tracking code assumes that each task reads from exactly one file or cache block, and it replaces the task's prior InputMetrics object after each read.
> However, a task computing a shuffle-less join (where the input RDDs are pre-partitioned by key) may read two or more cached dependency RDD blocks. In that case, the displayed input size reflects only whichever dependency was requested last.
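
To make the failure mode concrete, below is a minimal, self-contained Scala sketch. It is not Spark's actual TaskMetrics/InputMetrics code; the class names and methods are simplified stand-ins used only to contrast the replace-on-each-read pattern described above with an accumulating alternative.

    // Hypothetical, simplified stand-ins for Spark's metrics classes.
    case class InputMetrics(var bytesRead: Long)

    class TaskMetrics {
      var inputMetrics: Option[InputMetrics] = None

      // Pattern described in the issue: every read installs a fresh
      // InputMetrics object, discarding whatever was recorded before.
      def recordReadReplacing(bytes: Long): Unit = {
        inputMetrics = Some(InputMetrics(bytes))
      }

      // Accumulating alternative: bytes from multiple blocks/files are summed.
      def recordReadAccumulating(bytes: Long): Unit = {
        val m = inputMetrics.getOrElse(InputMetrics(0L))
        m.bytesRead += bytes
        inputMetrics = Some(m)
      }
    }

    object InputMetricsDemo extends App {
      // A task that reads two cached dependency blocks of 100 and 250 bytes.
      val buggy = new TaskMetrics
      Seq(100L, 250L).foreach(b => buggy.recordReadReplacing(b))
      println(buggy.inputMetrics.map(_.bytesRead))   // Some(250): first read lost

      val fixed = new TaskMetrics
      Seq(100L, 250L).foreach(b => fixed.recordReadAccumulating(b))
      println(fixed.inputMetrics.map(_.bytesRead))   // Some(350): total reported
    }

With the replacing pattern the task reports only the 250 bytes of the last block read, which is the undercounting this issue describes; accumulating across reads would report the full 350 bytes.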



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org