Posted to issues@spark.apache.org by "Hong Shen (JIRA)" <ji...@apache.org> on 2015/01/21 09:47:34 UTC

[jira] [Comment Edited] (SPARK-5347) InputMetrics bug when inputSplit is not instanceOf FileSplit

    [ https://issues.apache.org/jira/browse/SPARK-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285366#comment-14285366 ] 

Hong Shen edited comment on SPARK-5347 at 1/21/15 8:46 AM:
-----------------------------------------------------------

  This is because in HadoopRDD, inputMetrics is only set when the split is an instance of FileSplit, but CombineFileInputFormat produces splits (CombineFileSplit) that are not FileSplits. The check doesn't need to require a FileSplit; any InputSplit is enough, since getLength() is declared on the InputSplit interface itself.
{code}
      override def close() {
        try {
          reader.close()
          if (bytesReadCallback.isDefined) {
            val bytesReadFn = bytesReadCallback.get
            inputMetrics.bytesRead = bytesReadFn()
          } else if (split.inputSplit.value.isInstanceOf[FileSplit]) {
            // If we can't get the bytes read from the FS stats, fall back to the split size,
            // which may be inaccurate.
            try {
              inputMetrics.bytesRead = split.inputSplit.value.getLength
              context.taskMetrics.inputMetrics = Some(inputMetrics)
            } catch {
              case e: java.io.IOException =>
                logWarning("Unable to get input size to set InputMetrics for task", e)
            }
          }
        } catch {
          case e: Exception => {
            if (!Utils.inShutdown()) {
              logWarning("Exception in RecordReader.close()", e)
            }
          }
        }
      }
{code}
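
To make the type mismatch concrete, here is a small standalone sketch (not part of the original report; the object name and path are made up for illustration, and it only needs the Hadoop client jars): a CombineFileSplit produced by CombineFileInputFormat is not a FileSplit, so the fallback branch above never runs, even though getLength() is available on every InputSplit.
{code}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.{FileSplit, InputSplit, JobConf}
import org.apache.hadoop.mapred.lib.CombineFileSplit

object SplitTypeDemo {
  def main(args: Array[String]): Unit = {
    val path = new Path("/tmp/part-00000") // placeholder path, never actually opened

    val plainSplit: InputSplit =
      new FileSplit(path, 0L, 128L, Array.empty[String])
    val combinedSplit: InputSplit =
      new CombineFileSplit(new JobConf(), Array(path), Array(0L), Array(128L), Array.empty[String])

    for (split <- Seq(plainSplit, combinedSplit)) {
      // The fallback in HadoopRDD's close() only fires when isInstanceOf[FileSplit]
      // is true, so the combined split's size is never recorded even though
      // getLength() works for both split types.
      println(s"${split.getClass.getSimpleName}: " +
        s"isFileSplit=${split.isInstanceOf[FileSplit]}, length=${split.getLength}")
    }
  }
}
{code}
This should print isFileSplit=false for the CombineFileSplit while still reporting length=128, which is why falling back to getLength() on any InputSplit would keep the metrics.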


was (Author: shenhong):
  This is because in HadoopRDD, inputMetrics is only set when the split is an instance of FileSplit, but CombineFileInputFormat produces splits (CombineFileSplit) that are not FileSplits. The check doesn't need to require a FileSplit; any InputSplit is enough, since getLength() is declared on the InputSplit interface itself.
{code}
if (bytesReadCallback.isDefined) {
  val bytesReadFn = bytesReadCallback.get
  inputMetrics.bytesRead = bytesReadFn()
} else if (split.inputSplit.value.isInstanceOf[FileSplit]) {
  // If we can't get the bytes read from the FS stats, fall back to the split size,
  // which may be inaccurate.
  try {
    inputMetrics.bytesRead = split.inputSplit.value.getLength
    context.taskMetrics.inputMetrics = Some(inputMetrics)
  } catch {
    case e: java.io.IOException =>
      logWarning("Unable to get input size to set InputMetrics for task", e)
  }
}
{code}

> InputMetrics bug when inputSplit is not instanceOf FileSplit
> ------------------------------------------------------------
>
>                 Key: SPARK-5347
>                 URL: https://issues.apache.org/jira/browse/SPARK-5347
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Hong Shen
>
> When inputFormatClass is set to CombineFileInputFormat, the input metrics show that the input is empty. This problem does not appear in spark-1.1.0.
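
A hedged reproduction sketch (not from the original report; the input path and app name are placeholders, and it assumes org.apache.hadoop.mapred.lib.CombineTextInputFormat from the Hadoop 2.x client is on the classpath): reading the same directory first through a plain FileSplit-based format and then through a CombineFileInputFormat subclass should show real input bytes for the first job and an empty input size for the second in the Spark 1.2.0 UI.
{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.hadoop.mapred.lib.CombineTextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object Spark5347Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-5347-repro"))
    val path = "/tmp/spark-5347-input" // placeholder input directory

    // Baseline: TextInputFormat produces FileSplits, so input metrics are reported.
    sc.hadoopFile(path, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text]).count()

    // CombineTextInputFormat produces CombineFileSplits, which are not FileSplits,
    // so the fallback in HadoopRDD never records the split size and the stage
    // shows an empty input size.
    sc.hadoopFile(path, classOf[CombineTextInputFormat],
      classOf[LongWritable], classOf[Text]).count()

    sc.stop()
  }
}
{code}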



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org