You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2015/04/24 08:31:04 UTC

[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/5680

    [SPARK-7112][Streaming] Add a DirectStreamTracker to track the direct streams

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-7111

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5680
    
----
commit 28d668faf51495e779aa1f874ceb03a64bccf410
Author: jerryshao <sa...@intel.com>
Date:   2015-04-24T06:07:54Z

    Add DirectStreamTracker to track the direct streams

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97695062
  
    Yes, I will do this, please take take a look at the whole design, thanks a lot :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29453720
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -70,9 +70,8 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
         runningBatchInfos(batchStarted.batchInfo.batchTime) = batchStarted.batchInfo
         waitingBatchInfos.remove(batchStarted.batchInfo.batchTime)
     
    -    batchStarted.batchInfo.receivedBlockInfo.foreach { case (_, infos) =>
    -      totalReceivedRecords += infos.map(_.numRecords).sum
    -    }
    +    // TODO. this should be fixed when input stream is not receiver based stream.
    +    totalReceivedRecords += batchStarted.batchInfo.streamIdToNumRecords.values.sum
    --- End diff --
    
    Yes we need to. Receiver or not, all the data has been received by the
    system.
    On Apr 30, 2015 1:10 AM, "Saisai Shao" <no...@github.com> wrote:
    
    > In
    > streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala
    > <https://github.com/apache/spark/pull/5680#discussion_r29410293>:
    >
    > > @@ -70,9 +70,8 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
    > >      runningBatchInfos(batchStarted.batchInfo.batchTime) = batchStarted.batchInfo
    > >      waitingBatchInfos.remove(batchStarted.batchInfo.batchTime)
    > >
    > > -    batchStarted.batchInfo.receivedBlockInfo.foreach { case (_, infos) =>
    > > -      totalReceivedRecords += infos.map(_.numRecords).sum
    > > -    }
    > > +    // TODO. this should be fixed when input stream is not receiver based stream.
    > > +    totalReceivedRecords += batchStarted.batchInfo.streamIdToNumRecords.values.sum
    >
    > Yes, will do. Also have one concern, if the batchStarted is not a
    > receiver-based batchInfo, so do we need to count this records into
    > totalReceivedRecords when batch is just started.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/5680/files#r29410293>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-96769258
  
      [Test build #30979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30979/consoleFull) for   PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97709908
  
      [Test build #31407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31407/consoleFull) for   PR 5680 at commit [`17fa251`](https://github.com/apache/spark/commit/17fa2513f2c4f9f1847f5493989d8b3c78278eb8).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)`
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97692345
  
      [Test build #31397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31397/consoleFull) for   PR 5680 at commit [`3ad00d9`](https://github.com/apache/spark/commit/3ad00d9d1171cdf0563167a0e368482fb798043b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29408816
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/InputInfoTracker.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.streaming.{Time, StreamingContext}
    +
    +/** To track the information of input stream at specified batch time. */
    +case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)
    +
    +/**
    + * This class manages all the input streams as well as their input data statistics. The information
    + * will output to StreamingListener to better monitoring.
    + */
    +private[streaming] class InputInfoTracker(ssc: StreamingContext) extends Logging {
    +
    +  /** Track all the input streams registered in DStreamGraph */
    +  val inputStreams = ssc.graph.getInputStreams()
    +  /** Track all the id of input streams registered in DStreamGraph */
    +  val inputStreamIds = inputStreams.map(_.id)
    +
    +  // Map to track all the InputInfo related to specific batch time and input stream.
    +  private val batchTimeToInputInfos = new mutable.HashMap[Time, mutable.HashMap[Int, InputInfo]]
    +
    +  /** Report the input information with batch time to the tracker */
    +  def reportInfo(batchTime: Time, inputInfo: InputInfo): Unit = synchronized {
    +    val inputInfos = batchTimeToInputInfos.getOrElseUpdate(batchTime,
    +      new mutable.HashMap[Int, InputInfo]())
    +
    +    if (inputInfos.contains(inputInfo.inputStreamId)) {
    +      throw new IllegalStateException(s"Input stream ${inputInfo.inputStreamId}} for batch" +
    +        s"$batchTime is already added into InputInfoTracker, this is a illegal state")
    +    }
    +    inputInfos += ((inputInfo.inputStreamId, inputInfo))
    +  }
    +
    +  /** Get the all the input stream's information of specified batch time */
    +  def getInfo(batchTime: Time): Map[Int, InputInfo] = synchronized {
    +    val inputInfos = batchTimeToInputInfos.get(batchTime)
    +    // Convert mutable HashMap to immutable Map for the caller
    +    inputInfos.map(_.toMap).getOrElse(Map[Int, InputInfo]())
    +  }
    +
    +  /** Get the input information of specified batch time and input stream id */
    +  def getInfoOfBatchAndStream(batchTime: Time, inputStreamId: Int
    --- End diff --
    
    This is not used anywhere other than tests, is this necessary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97712628
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97712927
  
      [Test build #31411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31411/consoleFull) for   PR 5680 at commit [`8325787`](https://github.com/apache/spark/commit/8325787bf13bcca16a405561413f1d81b3229941).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97697502
  
      [Test build #31397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31397/consoleFull) for   PR 5680 at commit [`3ad00d9`](https://github.com/apache/spark/commit/3ad00d9d1171cdf0563167a0e368482fb798043b).
     * This patch **fails MiMa tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)`
    
     * This patch **adds the following new dependencies:**
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
    
     * This patch **removes the following dependencies:**
       * `spark-unsafe_2.10-1.4.0-SNAPSHOT.jar`
       * `tachyon-0.6.4.jar`
       * `tachyon-client-0.6.4.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-95845825
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30920/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97692199
  
    Build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97697526
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97725321
  
      [Test build #31408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31408/consoleFull) for   PR 5680 at commit [`8325787`](https://github.com/apache/spark/commit/8325787bf13bcca16a405561413f1d81b3229941).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)`
      * `class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)`
      * `class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)`
      * `class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)`
      * `class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)`
      * `class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid`
      * `case class ParamPair[T](param: Param[T], value: T) `
      * `class KMeansModel (`
      * `trait PMMLExportable `
      * `case class Sample(`
      * `case class Sample(`
      * `case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)`
    
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.4.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.7.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-unsafe_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.6.4.jar`
       * `tachyon-client-0.6.4.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97763562
  
      [Test build #31417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31417/consoleFull) for   PR 5680 at commit [`6682bef`](https://github.com/apache/spark/commit/6682bef8de8104243ba5f3d2f4305ba0b0bda3f8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29411380
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/InputInfoTracker.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.streaming.{Time, StreamingContext}
    +
    +/** To track the information of input stream at specified batch time. */
    +case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)
    +
    +/**
    + * This class manages all the input streams as well as their input data statistics. The information
    + * will output to StreamingListener to better monitoring.
    + */
    +private[streaming] class InputInfoTracker(ssc: StreamingContext) extends Logging {
    +
    +  /** Track all the input streams registered in DStreamGraph */
    +  val inputStreams = ssc.graph.getInputStreams()
    +  /** Track all the id of input streams registered in DStreamGraph */
    +  val inputStreamIds = inputStreams.map(_.id)
    --- End diff --
    
    Originally I'd like to expose these, but these can also be gotten from DStreamGraph, so I will remove these.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97613397
  
    I have posted a different design on the JIRA. Please take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29408882
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala ---
    @@ -50,6 +50,8 @@ class JobScheduler(val ssc: StreamingContext) extends Logging {
       // These two are created only when scheduler starts.
       // eventLoop not being null means the scheduler has been started and not stopped
       var receiverTracker: ReceiverTracker = null
    +  // A tracker to track all the input stream information as well as processed record number
    +  var inputInfoTracker: InputInfoTracker = null
    --- End diff --
    
    an empty line after this would be good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97762965
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29427075
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -101,6 +101,7 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
         runningBatchUIData(batchStarted.batchInfo.batchTime) = BatchUIData(batchStarted.batchInfo)
         waitingBatchUIData.remove(batchStarted.batchInfo.batchTime)
     
    +    // TODO. this should be fixed when input stream is not receiver based stream.
    --- End diff --
    
    All these TODOs means that: now `BatchInfo#streamIdtoNumRecords` will track both the input stream's received data (for receiver-based stream) and to-be-processed data number (for receiver-less stream). So the semantics in `StreamingJobProgressListener` is slightly different.
    
    Take `totalReceivedRecords` as an example: should we add the receiver-less input stream's number of to-be-processed record into here, since it is not actually received yet. So we have to figure out this semantics difference.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29409167
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/InputInfoTracker.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.streaming.{Time, StreamingContext}
    +
    +/** To track the information of input stream at specified batch time. */
    +case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)
    +
    +/**
    + * This class manages all the input streams as well as their input data statistics. The information
    + * will output to StreamingListener to better monitoring.
    + */
    +private[streaming] class InputInfoTracker(ssc: StreamingContext) extends Logging {
    +
    +  /** Track all the input streams registered in DStreamGraph */
    +  val inputStreams = ssc.graph.getInputStreams()
    +  /** Track all the id of input streams registered in DStreamGraph */
    +  val inputStreamIds = inputStreams.map(_.id)
    +
    +  // Map to track all the InputInfo related to specific batch time and input stream.
    +  private val batchTimeToInputInfos = new mutable.HashMap[Time, mutable.HashMap[Int, InputInfo]]
    +
    +  /** Report the input information with batch time to the tracker */
    +  def reportInfo(batchTime: Time, inputInfo: InputInfo): Unit = synchronized {
    +    val inputInfos = batchTimeToInputInfos.getOrElseUpdate(batchTime,
    +      new mutable.HashMap[Int, InputInfo]())
    +
    +    if (inputInfos.contains(inputInfo.inputStreamId)) {
    +      throw new IllegalStateException(s"Input stream ${inputInfo.inputStreamId}} for batch" +
    +        s"$batchTime is already added into InputInfoTracker, this is a illegal state")
    +    }
    +    inputInfos += ((inputInfo.inputStreamId, inputInfo))
    +  }
    +
    +  /** Get the all the input stream's information of specified batch time */
    +  def getInfo(batchTime: Time): Map[Int, InputInfo] = synchronized {
    +    val inputInfos = batchTimeToInputInfos.get(batchTime)
    +    // Convert mutable HashMap to immutable Map for the caller
    +    inputInfos.map(_.toMap).getOrElse(Map[Int, InputInfo]())
    +  }
    +
    +  /** Get the input information of specified batch time and input stream id */
    +  def getInfoOfBatchAndStream(batchTime: Time, inputStreamId: Int
    --- End diff --
    
    yes, only used for test, I can remove it if necessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-95819308
  
      [Test build #30920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for   PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97725326
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97694754
  
    There are merge conflicts! Please merge master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97762940
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97711812
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29410424
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala ---
    @@ -95,7 +95,7 @@ private[ui] class StreamingPage(parent: StreamingTab)
             "Maximum rate\n[events/sec]",
             "Last Error"
           )
    -      val dataRows = (0 until listener.numReceivers).map { receiverId =>
    --- End diff --
    
    Now all the input streams will have a unique id (not only receiver based input streams), so assuming  this will get error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97822054
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97709002
  
      [Test build #31408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31408/consoleFull) for   PR 5680 at commit [`8325787`](https://github.com/apache/spark/commit/8325787bf13bcca16a405561413f1d81b3229941).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29409159
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -135,28 +132,25 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
     
       def receivedRecordsDistributions: Map[Int, Option[Distribution]] = synchronized {
         val latestBatchInfos = retainedBatches.reverse.take(batchInfoLimit)
    -    val latestBlockInfos = latestBatchInfos.map(_.receivedBlockInfo)
    -    (0 until numReceivers).map { receiverId =>
    -      val blockInfoOfParticularReceiver = latestBlockInfos.map { batchInfo =>
    -        batchInfo.get(receiverId).getOrElse(Array.empty)
    -      }
    -      val recordsOfParticularReceiver = blockInfoOfParticularReceiver.map { blockInfo =>
    -      // calculate records per second for each batch
    -        blockInfo.map(_.numRecords).sum.toDouble * 1000 / batchDuration
    -      }
    -      val distributionOption = Distribution(recordsOfParticularReceiver)
    -      (receiverId, distributionOption)
    +
    +    // TODO. this should be fixed when receiver-less input stream is mixed into BatchInfo
    --- End diff --
    
    What does this to do mean?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29410045
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -135,28 +132,25 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
     
       def receivedRecordsDistributions: Map[Int, Option[Distribution]] = synchronized {
         val latestBatchInfos = retainedBatches.reverse.take(batchInfoLimit)
    -    val latestBlockInfos = latestBatchInfos.map(_.receivedBlockInfo)
    -    (0 until numReceivers).map { receiverId =>
    -      val blockInfoOfParticularReceiver = latestBlockInfos.map { batchInfo =>
    -        batchInfo.get(receiverId).getOrElse(Array.empty)
    -      }
    -      val recordsOfParticularReceiver = blockInfoOfParticularReceiver.map { blockInfo =>
    -      // calculate records per second for each batch
    -        blockInfo.map(_.numRecords).sum.toDouble * 1000 / batchDuration
    -      }
    -      val distributionOption = Distribution(recordsOfParticularReceiver)
    -      (receiverId, distributionOption)
    +
    +    // TODO. this should be fixed when receiver-less input stream is mixed into BatchInfo
    --- End diff --
    
    This is what makes me concern a lot. Now for the `BatchInfo's streamIdToNumRecords`, all the input stream's statistic data will be in it, not receiver-based input stream, so do we need to differentiate the statistics?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97725327
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31408/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29427122
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -188,25 +189,26 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
       }
     
       def receivedRecordsDistributions: Map[Int, Option[Distribution]] = synchronized {
    -    val latestBatches = retainedBatches.reverse.take(batchUIDataLimit)
    -    (0 until numReceivers).map { receiverId =>
    -      val recordsOfParticularReceiver = latestBatches.map { batch =>
    -        // calculate records per second for each batch
    -        batch.receiverNumRecords.get(receiverId).sum.toDouble * 1000 / batchDuration
    -      }
    -      val distributionOption = Distribution(recordsOfParticularReceiver)
    -      (receiverId, distributionOption)
    +    val latestBatchInfos = retainedBatches.reverse.take(batchUIDataLimit)
    +
    +    // TODO. this should be fixed when receiver-less input stream is mixed into BatchInfo
    --- End diff --
    
    Maybe this TODO is not necessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97709914
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31407/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97708865
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29410293
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -70,9 +70,8 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
         runningBatchInfos(batchStarted.batchInfo.batchTime) = batchStarted.batchInfo
         waitingBatchInfos.remove(batchStarted.batchInfo.batchTime)
     
    -    batchStarted.batchInfo.receivedBlockInfo.foreach { case (_, infos) =>
    -      totalReceivedRecords += infos.map(_.numRecords).sum
    -    }
    +    // TODO. this should be fixed when input stream is not receiver based stream.
    +    totalReceivedRecords += batchStarted.batchInfo.streamIdToNumRecords.values.sum
    --- End diff --
    
    Yes, will do. Also have one concern, if the `batchStarted` is not a receiver-based batchInfo, so do we need to count this records into `totalReceivedRecords` when batch is just started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97712611
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97692178
  
     Build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97708542
  
    Hi @zsxwing, you also changed a lot on this part, would you please take a look at this, thanks a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97708818
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29408785
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/InputInfoTracker.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.streaming.{Time, StreamingContext}
    +
    +/** To track the information of input stream at specified batch time. */
    +case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)
    +
    +/**
    + * This class manages all the input streams as well as their input data statistics. The information
    + * will output to StreamingListener to better monitoring.
    + */
    +private[streaming] class InputInfoTracker(ssc: StreamingContext) extends Logging {
    +
    +  /** Track all the input streams registered in DStreamGraph */
    +  val inputStreams = ssc.graph.getInputStreams()
    --- End diff --
    
    Can this be private?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97725344
  
      [Test build #31411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31411/consoleFull) for   PR 5680 at commit [`8325787`](https://github.com/apache/spark/commit/8325787bf13bcca16a405561413f1d81b3229941).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)`
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97725353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31411/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97707535
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29408973
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/BatchInfo.scala ---
    @@ -32,7 +33,7 @@ import org.apache.spark.streaming.Time
     @DeveloperApi
     case class BatchInfo(
         batchTime: Time,
    -    receivedBlockInfo: Map[Int, Array[ReceivedBlockInfo]],
    +    streamIdToNumRecords: Map[Int, Long],
         submissionTime: Long,
         processingStartTime: Option[Long],
         processingEndTime: Option[Long]
    --- End diff --
    
    Can you make a method called `numRecords` which returns the sum? This is the same approach taken by @zsxwing in  #5533, so will be easier to merge conflicts later. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97697528
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31397/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97709911
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97822058
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31417/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97725352
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29409065
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala ---
    @@ -70,9 +70,8 @@ private[streaming] class StreamingJobProgressListener(ssc: StreamingContext)
         runningBatchInfos(batchStarted.batchInfo.batchTime) = batchStarted.batchInfo
         waitingBatchInfos.remove(batchStarted.batchInfo.batchTime)
     
    -    batchStarted.batchInfo.receivedBlockInfo.foreach { case (_, infos) =>
    -      totalReceivedRecords += infos.map(_.numRecords).sum
    -    }
    +    // TODO. this should be fixed when input stream is not receiver based stream.
    +    totalReceivedRecords += batchStarted.batchInfo.streamIdToNumRecords.values.sum
    --- End diff --
    
    This can be replaced by `batchStarted.batchInfo.numRecords` if you implement `numRecords` as I said above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-95845816
  
      [Test build #30920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for   PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97707496
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97822008
  
      [Test build #31417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31417/consoleFull) for   PR 5680 at commit [`6682bef`](https://github.com/apache/spark/commit/6682bef8de8104243ba5f3d2f4305ba0b0bda3f8).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)`
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97587604
  
    @jerryshao  This is a decent patch for the assumed design, but I mentioned in the parent JIRA https://issues.apache.org/jira/browse/SPARK-7111 that the design is not great. The "direct API" is a name we came up to differentiate new Kafka API from the old one receiver-based one, and logically every subclass of `InputDStream` that is not a `ReceiverInputDStream` is  a direct stream. So further separating out InputDStreams as direct stream and non-direct stream (beyond the receiver stream) is a bad idea. In the end, the goal is to enable all the streams, irrespective of its type, to report information about its input to the infra. And there should be a single common way / code path for doing it for all input streams, with some customization / override for receiver input streams (as unlike non-receiver-streams, all receiver streams has a common way of reporting block info).
    
    I have been thinking about this since last night and I will post a very rough design doc on the JIRA shortly. Please follow up on the JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29408797
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/InputInfoTracker.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.streaming.{Time, StreamingContext}
    +
    +/** To track the information of input stream at specified batch time. */
    +case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)
    +
    +/**
    + * This class manages all the input streams as well as their input data statistics. The information
    + * will output to StreamingListener to better monitoring.
    + */
    +private[streaming] class InputInfoTracker(ssc: StreamingContext) extends Logging {
    +
    +  /** Track all the input streams registered in DStreamGraph */
    +  val inputStreams = ssc.graph.getInputStreams()
    +  /** Track all the id of input streams registered in DStreamGraph */
    +  val inputStreamIds = inputStreams.map(_.id)
    --- End diff --
    
    Can this be private?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/5680#issuecomment-97622726
  
    Yes, I will take a look at the design doc, thanks a lot for your comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7112][Streaming][WIP] Add a InputInfoTr...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5680#discussion_r29448422
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/InputInfoTracker.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.Logging
    +import org.apache.spark.streaming.{Time, StreamingContext}
    +
    +/** To track the information of input stream at specified batch time. */
    +case class InputInfo(batchTime: Time, inputStreamId: Int, numRecords: Long)
    +
    +/**
    + * This class manages all the input streams as well as their input data statistics. The information
    + * will output to StreamingListener to better monitoring.
    --- End diff --
    
    "to better monitoring" -> "for better monitoring"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org