You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2015/04/23 05:10:15 UTC

[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/5645

    [SPARK-7056] Make the Write Ahead Log pluggable

    Users may want the WAL data to be written to non-HDFS data storage systems. To allow that, we have to make the WAL pluggable. The following design doc outlines the plan. 
    
    https://docs.google.com/a/databricks.com/document/d/1A2XaOLRFzvIZSi18i_luNw5Rmm9j2j4AigktXxIYxmY/edit?usp=sharing
    
    Things to add. 
    * Unit tests for WriteAheadLogUtils


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark wal-pluggable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5645.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5645
    
----
commit 7dd2d4b9f6dcc993d0474bc2b65daa358a14115e
Author: Tathagata Das <ta...@gmail.com>
Date:   2015-04-23T03:06:19Z

    Added pluggable WriteAheadLog interface, and refactored all code along with it

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-96500303
  
    I am taking a look at this. So far this looks good, I will comments, if any, tomorrow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97374354
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29309292
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogUtils.scala ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.hadoop.conf.Configuration
    +
    +import org.apache.spark.util.Utils
    +import org.apache.spark.{Logging, SparkConf, SparkException}
    +
    +/** A helper class with utility functions related to the WriteAheadLog interface */
    +private[streaming] object WriteAheadLogUtils extends Logging {
    +  val RECEIVER_WAL_ENABLE_CONF_KEY = "spark.streaming.receiver.writeAheadLog.enable"
    +  val RECEIVER_WAL_CLASS_CONF_KEY = "spark.streaming.receiver.writeAheadLog.class"
    +  val RECEIVER_WAL_ROLLING_INTERVAL_CONF_KEY =
    +    "spark.streaming.receiver.writeAheadLog.rollingIntervalSecs"
    +  val RECEIVER_WAL_MAX_FAILURES_CONF_KEY = "spark.streaming.receiver.writeAheadLog.maxFailures"
    +
    +  val DRIVER_WAL_CLASS_CONF_KEY = "spark.streaming.driver.writeAheadLog.class"
    +  val DRIVER_WAL_ROLLING_INTERVAL_CONF_KEY =
    +    "spark.streaming.driver.writeAheadLog.rollingIntervalSecs"
    +  val DRIVER_WAL_MAX_FAILURES_CONF_KEY = "spark.streaming.driver.writeAheadLog.maxFailures"
    +
    +  val DEFAULT_ROLLING_INTERVAL_SECS = 60
    +  val DEFAULT_MAX_FAILURES = 3
    +
    +  def enableReceiverLog(conf: SparkConf): Boolean = {
    +    conf.getBoolean(RECEIVER_WAL_ENABLE_CONF_KEY, false)
    +  }
    +
    +  def getRollingIntervalSecs(conf: SparkConf, isDriver: Boolean): Int = {
    +    if (isDriver) {
    +      conf.getInt(DRIVER_WAL_ROLLING_INTERVAL_CONF_KEY, DEFAULT_ROLLING_INTERVAL_SECS)
    +    } else {
    +      conf.getInt(RECEIVER_WAL_ROLLING_INTERVAL_CONF_KEY, DEFAULT_ROLLING_INTERVAL_SECS)
    +    }
    +  }
    +
    +  def getMaxFailures(conf: SparkConf, isDriver: Boolean): Int = {
    +    if (isDriver) {
    +      conf.getInt(DRIVER_WAL_MAX_FAILURES_CONF_KEY, DEFAULT_MAX_FAILURES)
    +    } else {
    +      conf.getInt(RECEIVER_WAL_MAX_FAILURES_CONF_KEY, DEFAULT_MAX_FAILURES)
    +    }
    +  }
    +
    +  /**
    +   * Create a WriteAheadLog for the driver. If configured with custom WAL class, it will try
    +   * to create instance of that class, otherwise it will create the default FileBasedWriteAheadLog.
    +   */
    +  def createLogForDriver(
    +      sparkConf: SparkConf,
    +      fileWalLogDirectory: String,
    +      fileWalHadoopConf: Configuration
    +    ): WriteAheadLog = {
    +    createLog(true, sparkConf, fileWalLogDirectory, fileWalHadoopConf)
    +  }
    +
    +  /**
    +   * Create a WriteAheadLog for the receiver. If configured with custom WAL class, it will try
    +   * to create instance of that class, otherwise it will create the default FileBasedWriteAheadLog.
    +   */
    +  def createLogForReceiver(
    +      sparkConf: SparkConf,
    +      fileWalLogDirectory: String,
    +      fileWalHadoopConf: Configuration
    +    ): WriteAheadLog = {
    +    createLog(false, sparkConf, fileWalLogDirectory, fileWalHadoopConf)
    +  }
    +
    +  /**
    +   * Create a WriteAheadLog based on the value of the given config key. The config key is used
    +   * to get the class name from the SparkConf. If the class is configured, it will try to
    +   * create instance of that class by first trying `new CustomWAL(sparkConf, logDir)` then trying
    +   * `new CustomWAL(sparkConf)`. If either fails, it will fail. If no class is configured, then
    +   * it will create the default FileBasedWriteAheadLog.
    +   */
    +  private def createLog(
    +      isDriver: Boolean,
    +      sparkConf: SparkConf,
    +      fileWalLogDirectory: String,
    +      fileWalHadoopConf: Configuration
    +    ): WriteAheadLog = {
    +
    +    val classNameOption = if (isDriver) {
    +      sparkConf.getOption(DRIVER_WAL_CLASS_CONF_KEY)
    +    } else {
    +      sparkConf.getOption(RECEIVER_WAL_CLASS_CONF_KEY)
    +    }
    +    classNameOption.map { className =>
    +      try {
    +        instantiateClass(
    +          Utils.classForName(className).asInstanceOf[Class[WriteAheadLog]], sparkConf)
    +      } catch {
    +        case NonFatal(e) =>
    +          throw new SparkException(s"Could not create a write ahead log of class $className", e)
    +      }
    +    }.getOrElse {
    +      new FileBasedWriteAheadLog(sparkConf, fileWalLogDirectory, fileWalHadoopConf,
    +        getRollingIntervalSecs(sparkConf, isDriver), getMaxFailures(sparkConf, isDriver))
    +    }
    +  }
    +
    +  /** Instantiate the class, either using single arg constructor or zero arg constructor */
    +  private def instantiateClass(cls: Class[WriteAheadLog], conf: SparkConf): WriteAheadLog = {
    --- End diff --
    
    I think `Class[WriteAheadLog]` should be changed to `Class[_ <: WriteAheadLog]`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95806388
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30913/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29307153
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,10 +100,29 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    -        logInfo(s"Read partition data of $this from write ahead log, segment ${partition.segment}")
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    +          dataRead = writeAheadLog.read(partition.walRecordHandle)
    +        } catch {
    +          case NonFatal(e) =>
    +            throw new SparkException(
    +              s"Could not read data from write ahead log record ${partition.walRecordHandle}", e)
    +        } finally {
    +          if (writeAheadLog != null) {
    +            writeAheadLog.close()
    --- End diff --
    
    May be reset writeAheadLog to null after close to avoid unexpected behavior :).
    
    `writeAheadLog = null`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95678755
  
    @jerryshao @hshreedharan Can you please take a look. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29029244
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    --- End diff --
    
    hadoopConf is always available through the SparkContext. Irrespective of whether Hadoop file system is used, a Hadoop conf is created by the SparkContext which is passed on to this location. If the WAL is not the default FileBasedWAL, then this parameter is just ignored (see the method `WriteAheadLogUtils.createLogForReceiver`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95694153
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30865/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29209219
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +import java.nio.ByteBuffer;
    +import java.util.Iterator;
    +
    +/**
    + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    + * they can be recovered after driver failures. See the Spark docs for more information on how
    --- End diff --
    
    It might be good to give a high level description like "A log is any storage service capable of persisting data binary data associated with a particular time and removing all data older than a certain time." 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97285567
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97350415
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-96884076
  
    I added some comments on the public interface. The main one is about whether we use opaque buffers to make the serialization of the segment identifier more explicit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97272246
  
      [Test build #31187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31187/consoleFull) for   PR 5645 at commit [`b65e155`](https://github.com/apache/spark/commit/b65e155c06b08fbdd13bd6c8505ca7428a6f5580).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29210080
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLogSegment extends java.io.Serializable {
    --- End diff --
    
    Also could we call this a `WALSegmentHandle` or something? This isn't the segment itself it's just an identifier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-96895715
  
    I think I dont have a strong opinion either ways, I just dont see too much benefit with bytebuffer. Rather it make its slightly hard for naive users to implement their own WAL, while making it no more or less difficult for advanced users who care about backward compatibility.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by helena <gi...@git.apache.org>.
Github user helena commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29194115
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    --- End diff --
    
    I feel dirty seeing nulls in scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97272253
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31187/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29308674
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala ---
    @@ -108,12 +117,14 @@ private[streaming] class WriteAheadLogManager(
        * the latest the records. This does not deal with currently active log files, and
        * hence the implementation is kept simple.
        */
    -  def readFromLog(): Iterator[ByteBuffer] = synchronized {
    +  def readAll(): JIterator[ByteBuffer] = synchronized {
    +    import scala.collection.JavaConversions._
         val logFilesToRead = pastLogs.map{ _.path} ++ currentLogPath
         logInfo("Reading from the logs: " + logFilesToRead.mkString("\n"))
    +
    --- End diff --
    
    Same number of characters either way :) That's a nit, will do if I have to change something else. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95788717
  
      [Test build #30909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30909/consoleFull) for   PR 5645 at commit [`e0d19fb`](https://github.com/apache/spark/commit/e0d19fb1f0e6472d3e0ca55223c36ed506f32709).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97284175
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95761613
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30885/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95798479
  
      [Test build #30909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30909/consoleFull) for   PR 5645 at commit [`e0d19fb`](https://github.com/apache/spark/commit/e0d19fb1f0e6472d3e0ca55223c36ed506f32709).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97373328
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95413168
  
      [Test build #30803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30803/consoleFull) for   PR 5645 at commit [`09bc6fe`](https://github.com/apache/spark/commit/09bc6fe6d792a6007a1e88fa90d956dbd4d547fb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29118877
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    --- End diff --
    
    The log directory needs to be passed through the `WriteAheadLogUtils.createLogForXXX()`. If you want to hide it from this method, and pass it through the SparkConf, then every place where `WriteAheadLogUtils.createLogForXXX()` needs to be called, we need to add the following. 
    ```
    val walConf = SparkEnv.get.conf.clone()
    walConf.set("logdir", ....)
    ```
    
    IMO that duplicates code everywhere and uglier that this dummy dir approach.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29316443
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogUtils.scala ---
    @@ -0,0 +1,129 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util
    +
    +import scala.util.control.NonFatal
    +
    +import org.apache.hadoop.conf.Configuration
    +
    +import org.apache.spark.util.Utils
    +import org.apache.spark.{Logging, SparkConf, SparkException}
    +
    +/** A helper class with utility functions related to the WriteAheadLog interface */
    +private[streaming] object WriteAheadLogUtils extends Logging {
    +  val RECEIVER_WAL_ENABLE_CONF_KEY = "spark.streaming.receiver.writeAheadLog.enable"
    +  val RECEIVER_WAL_CLASS_CONF_KEY = "spark.streaming.receiver.writeAheadLog.class"
    +  val RECEIVER_WAL_ROLLING_INTERVAL_CONF_KEY =
    +    "spark.streaming.receiver.writeAheadLog.rollingIntervalSecs"
    +  val RECEIVER_WAL_MAX_FAILURES_CONF_KEY = "spark.streaming.receiver.writeAheadLog.maxFailures"
    +
    +  val DRIVER_WAL_CLASS_CONF_KEY = "spark.streaming.driver.writeAheadLog.class"
    +  val DRIVER_WAL_ROLLING_INTERVAL_CONF_KEY =
    +    "spark.streaming.driver.writeAheadLog.rollingIntervalSecs"
    +  val DRIVER_WAL_MAX_FAILURES_CONF_KEY = "spark.streaming.driver.writeAheadLog.maxFailures"
    +
    +  val DEFAULT_ROLLING_INTERVAL_SECS = 60
    +  val DEFAULT_MAX_FAILURES = 3
    +
    +  def enableReceiverLog(conf: SparkConf): Boolean = {
    +    conf.getBoolean(RECEIVER_WAL_ENABLE_CONF_KEY, false)
    +  }
    +
    +  def getRollingIntervalSecs(conf: SparkConf, isDriver: Boolean): Int = {
    +    if (isDriver) {
    +      conf.getInt(DRIVER_WAL_ROLLING_INTERVAL_CONF_KEY, DEFAULT_ROLLING_INTERVAL_SECS)
    +    } else {
    +      conf.getInt(RECEIVER_WAL_ROLLING_INTERVAL_CONF_KEY, DEFAULT_ROLLING_INTERVAL_SECS)
    +    }
    +  }
    +
    +  def getMaxFailures(conf: SparkConf, isDriver: Boolean): Int = {
    +    if (isDriver) {
    +      conf.getInt(DRIVER_WAL_MAX_FAILURES_CONF_KEY, DEFAULT_MAX_FAILURES)
    +    } else {
    +      conf.getInt(RECEIVER_WAL_MAX_FAILURES_CONF_KEY, DEFAULT_MAX_FAILURES)
    +    }
    +  }
    +
    +  /**
    +   * Create a WriteAheadLog for the driver. If configured with custom WAL class, it will try
    +   * to create instance of that class, otherwise it will create the default FileBasedWriteAheadLog.
    +   */
    +  def createLogForDriver(
    +      sparkConf: SparkConf,
    +      fileWalLogDirectory: String,
    +      fileWalHadoopConf: Configuration
    +    ): WriteAheadLog = {
    +    createLog(true, sparkConf, fileWalLogDirectory, fileWalHadoopConf)
    +  }
    +
    +  /**
    +   * Create a WriteAheadLog for the receiver. If configured with custom WAL class, it will try
    +   * to create instance of that class, otherwise it will create the default FileBasedWriteAheadLog.
    +   */
    +  def createLogForReceiver(
    +      sparkConf: SparkConf,
    +      fileWalLogDirectory: String,
    +      fileWalHadoopConf: Configuration
    +    ): WriteAheadLog = {
    +    createLog(false, sparkConf, fileWalLogDirectory, fileWalHadoopConf)
    +  }
    +
    +  /**
    +   * Create a WriteAheadLog based on the value of the given config key. The config key is used
    +   * to get the class name from the SparkConf. If the class is configured, it will try to
    +   * create instance of that class by first trying `new CustomWAL(sparkConf, logDir)` then trying
    +   * `new CustomWAL(sparkConf)`. If either fails, it will fail. If no class is configured, then
    +   * it will create the default FileBasedWriteAheadLog.
    +   */
    +  private def createLog(
    +      isDriver: Boolean,
    +      sparkConf: SparkConf,
    +      fileWalLogDirectory: String,
    +      fileWalHadoopConf: Configuration
    +    ): WriteAheadLog = {
    +
    +    val classNameOption = if (isDriver) {
    +      sparkConf.getOption(DRIVER_WAL_CLASS_CONF_KEY)
    +    } else {
    +      sparkConf.getOption(RECEIVER_WAL_CLASS_CONF_KEY)
    +    }
    +    classNameOption.map { className =>
    +      try {
    +        instantiateClass(
    +          Utils.classForName(className).asInstanceOf[Class[WriteAheadLog]], sparkConf)
    +      } catch {
    +        case NonFatal(e) =>
    +          throw new SparkException(s"Could not create a write ahead log of class $className", e)
    +      }
    +    }.getOrElse {
    +      new FileBasedWriteAheadLog(sparkConf, fileWalLogDirectory, fileWalHadoopConf,
    +        getRollingIntervalSecs(sparkConf, isDriver), getMaxFailures(sparkConf, isDriver))
    +    }
    +  }
    +
    +  /** Instantiate the class, either using single arg constructor or zero arg constructor */
    +  private def instantiateClass(cls: Class[WriteAheadLog], conf: SparkConf): WriteAheadLog = {
    --- End diff --
    
    Does not help much as it just makes the thing look more complicated. Changed nonetheless for correctness.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97298863
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97566507
  
    @harishreedharan @jerryshao @pwendell Thanks for the feedback y'all. Merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97520613
  
    @pwendell I am going to merge this if you have no other comments. It is blocking my other PR #5732 :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95441936
  
      [Test build #30814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30814/consoleFull) for   PR 5645 at commit [`837c4f5`](https://github.com/apache/spark/commit/837c4f5cb90abee46feb15bfd1c103cddc58af32).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95761603
  
      [Test build #30885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30885/consoleFull) for   PR 5645 at commit [`86abcb1`](https://github.com/apache/spark/commit/86abcb1578ddeecfa11fc4fe116d894ccd838e44).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29209101
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    --- End diff --
    
    It might be good to say something like 'Represents an opaque identifier that references metadata required to read a specific piece of data".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29222456
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLogSegment extends java.io.Serializable {
    --- End diff --
    
    Actually, this is an interface, so I am not sure we can create an alternate
    method without breaking binary compatibility.
    Well, they could leave the serialization of MyWALSegment to Java, which is
    just a wrapper for a ByteBuffer/byte array which contains all the real
    data. If that sounds too complicated, then may be we should do bytes. And
    probably we should simply use byte array instead of ByteBuffer, as we
    probably dont need to deal with direct byte buffers here.
    
    
    On Mon, Apr 27, 2015 at 9:31 PM, Patrick Wendell <no...@github.com>
    wrote:
    
    > In
    > streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java
    > <https://github.com/apache/spark/pull/5645#discussion_r29213656>:
    >
    > > + *
    > > + * Unless required by applicable law or agreed to in writing, software
    > > + * distributed under the License is distributed on an "AS IS" BASIS,
    > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    > > + * See the License for the specific language governing permissions and
    > > + * limitations under the License.
    > > + */
    > > +
    > > +package org.apache.spark.streaming.util;
    > > +
    > > +/**
    > > + * This is an interface that represent the information required by any implementation of
    > > + * a WriteAheadLog to read a written record.
    > > + */
    > > +@org.apache.spark.annotation.DeveloperApi
    > > +public interface WriteAheadLogSegment extends java.io.Serializable {
    >
    > But in the current approach, they can't for instance use kryo or protobuf
    > to serialize, unless they do something really crazy like use an
    > externalizable hook to then call into Kryo. I guess I'm just thinking ahead
    > to how this will evolve. However, if we want to have this in the future we
    > can always create an alternative version that is additive, so I don't feel
    > strongly at all
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/5645/files#r29213656>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97374345
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95798234
  
      [Test build #30913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30913/consoleFull) for   PR 5645 at commit [`1a32a4b`](https://github.com/apache/spark/commit/1a32a4b5ce740721343915452974c9fb3f9a3910).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29307615
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,10 +100,29 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    -        logInfo(s"Read partition data of $this from write ahead log, segment ${partition.segment}")
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    +          dataRead = writeAheadLog.read(partition.walRecordHandle)
    +        } catch {
    +          case NonFatal(e) =>
    +            throw new SparkException(
    +              s"Could not read data from write ahead log record ${partition.walRecordHandle}", e)
    +        } finally {
    +          if (writeAheadLog != null) {
    +            writeAheadLog.close()
    +          }
    +        }
    +        if (dataRead == null) {
    +          throw new SparkException(
    +            s"Could not read data from write ahead log record ${partition.walRecordHandle}, " +
    +              s"read returned null")
    +        }
    +        logInfo(s"Read partition data of $this from write ahead log, record handle " +
    +          partition.walRecordHandle)
    --- End diff --
    
    Since it has to be on the next line, using s"$..." is strictly more number of chars :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97341727
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97298865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31230/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29120523
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    --- End diff --
    
    represents


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95442062
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30814/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97252755
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97350416
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31264/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95770620
  
      [Test build #30886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30886/consoleFull) for   PR 5645 at commit [`9310cbf`](https://github.com/apache/spark/commit/9310cbf23640bee22ddd349e84ca5a254d440b81).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29029162
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    --- End diff --
    
    So the default WAL is file based so a log directory is needed for it to work. However, the log directory is really not needed reading a particular record. But to read a single record you have to create a FileBasedWriteAheadLog object, which needs a log directory. Hence I am providing a dummy directory for this. 
    
    I know that this is a little awkward. This is the cost of defining a single interface for both writing and reading single records. Earlier there were two independent classes (WALWriter and WALRandomReader) that was used for these two purposes, which has different requirements. But since I am trying make single interface that can be used for all reading and writing, the log directory must be provided in the constructor of the default file-based WAL. This results in the awkwardness. 
    
    I dont quite like it myself, but it may practically be okay as long as we ensure that the FileBasedWAL does not create unnecessary directories/files when only reading a single record. I can add a test to ensure that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95798520
  
      [Test build #30914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30914/consoleFull) for   PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29026826
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    --- End diff --
    
    Why here need to use `dummyDirectory`? Assuming WAL may not be file-based, so I'm not sure what's the meaning we need to have this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-96484330
  
    @pwendell Please take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97283992
  
      [Test build #31210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31210/consoleFull) for   PR 5645 at commit [`bde26b1`](https://github.com/apache/spark/commit/bde26b15f312a03dc2f3deb93c43e8551240cd97).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97253241
  
      [Test build #31187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31187/consoleFull) for   PR 5645 at commit [`b65e155`](https://github.com/apache/spark/commit/b65e155c06b08fbdd13bd6c8505ca7428a6f5580).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97285427
  
      [Test build #31212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31212/consoleFull) for   PR 5645 at commit [`569a416`](https://github.com/apache/spark/commit/569a416c647fc2eebc76e00b3bd7d18242ad617f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97298862
  
      [Test build #31230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31230/consoleFull) for   PR 5645 at commit [`c2bc738`](https://github.com/apache/spark/commit/c2bc7384e2551bb5e333d0c2b0b61fe02ce0d0b5).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29211121
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    --- End diff --
    
    Why allocate two (at least) objects when it is completely obvious not going to be used. The null does not get exposed to anything outside the function, and hence is okay to have. 
    
    If you look at rest of the Spark source code, we dont strictly adhere to Scala-way of doing things, rather balance code understandability (limit the levels of functional nesting) and efficiency (while loops instead of for when perf matters) with Scala styles. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29302646
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala ---
    @@ -96,7 +96,7 @@ private[streaming] class BlockManagerBasedBlockHandler(
      */
     private[streaming] case class WriteAheadLogBasedStoreResult(
         blockId: StreamBlockId,
    -    segment: WriteAheadLogFileSegment
    +    segment: WriteAheadLogRecordHandle
    --- End diff --
    
    This is a leftover from the name change, segment --> handle. If there is no other change needed, I can take care of it in the next PR #5732 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97284177
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31210/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95412474
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30802/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97272252
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95476595
  
      [Test build #30820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30820/consoleFull) for   PR 5645 at commit [`bce5e75`](https://github.com/apache/spark/commit/bce5e7597d4f59952b6b9de7bf194392b5e5790f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95811180
  
      [Test build #30914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30914/consoleFull) for   PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97296639
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29120504
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +import java.nio.ByteBuffer;
    +import java.util.Iterator;
    +
    +/**
    + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    + * they can be recovered after driver failures. See the Spark docs for more information on how
    + * to plug in your own custom implementation of a write ahead log.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLog {
    +  /**
    +   * Write the record to the log and return the segment information that is necessary to read
    +   * back the written record. The time is used to the index the record, such that it can be
    +   * cleaned later. Note that the written data must be durable and readable (using the
    +   * segment info) by the time this function returns.
    +   */
    +  WriteAheadLogSegment write(ByteBuffer record, long time);
    +
    +  /**
    +   * Read a written record based on the given segment information.
    +   */
    +  ByteBuffer read(WriteAheadLogSegment segment);
    +
    +  /**
    +   * Read and return an iterator of all the records that have written and not yet cleanup.
    --- End diff --
    
    not yet cleaned up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95694135
  
      [Test build #30865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30865/consoleFull) for   PR 5645 at commit [`84ce469`](https://github.com/apache/spark/commit/84ce4693ed7f1a161c52f75d33a4699428363470).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-96773906
  
      [Test build #30987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30987/consoleFull) for   PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.4.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.7.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.6.4.jar`
       * `tachyon-client-0.6.4.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95770651
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30886/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29209693
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLogSegment extends java.io.Serializable {
    --- End diff --
    
    I wonder if we should more explicitly build the serialization of these segment identifiers into this interface. One extreme option is to have the segment identifiers actually be byte buffers and ask the user to deal on their own with serializing them.
    
    The main concerns I have are the following:
    1. Individual implementations of this must be java Serializable, but it's not possible to reflect that in the implementation.
    2. If those implementations want to evolve over different versions for instance they add a new field to the segment identifier, it will be tricky for them to do in a way that's backwards compatible (they'll have to write a custom externalization logic, which isn't really used for backwards compatibility).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95798495
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30909/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95811201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30914/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29211200
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +import java.nio.ByteBuffer;
    +import java.util.Iterator;
    +
    +/**
    + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    + * they can be recovered after driver failures. See the Spark docs for more information on how
    --- End diff --
    
    Good idea. Will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by helena <gi...@git.apache.org>.
Github user helena commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29194436
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    --- End diff --
    
    ByteBuffer.wrap(new byte[0])


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97558045
  
    Generally looks good. Posted one comment, which is not a blocker, but something we can consider adding later too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29211158
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    --- End diff --
    
    Saying "opaque" is kind of confusing. Its opaque to Spark, but not to whoever is implementing the WAL.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29372590
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -111,14 +140,20 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
     
       /**
        * Get the preferred location of the partition. This returns the locations of the block
    -   * if it is present in the block manager, else it returns the location of the
    -   * corresponding segment in HDFS.
    +   * if it is present in the block manager, else if FileBasedWriteAheadLogSegment is used,
    +   * it returns the location of the corresponding file segment in HDFS .
        */
       override def getPreferredLocations(split: Partition): Seq[String] = {
         val partition = split.asInstanceOf[WriteAheadLogBackedBlockRDDPartition]
         val blockLocations = getBlockIdLocations().get(partition.blockId)
    -    blockLocations.getOrElse(
    -      HdfsUtils.getFileSegmentLocations(
    -        partition.segment.path, partition.segment.offset, partition.segment.length, hadoopConfig))
    +    blockLocations.getOrElse {
    --- End diff --
    
    It might make sense to add location info to the WALRecordHandle interface itself. This way, systems that are not HDFS, but still benefit from preferred locations can use it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29120494
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +import java.nio.ByteBuffer;
    +import java.util.Iterator;
    +
    +/**
    + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    + * they can be recovered after driver failures. See the Spark docs for more information on how
    + * to plug in your own custom implementation of a write ahead log.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLog {
    --- End diff --
    
    Is the idea that this would be useful for Java implementations to keep this a Java interface?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29028153
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    --- End diff --
    
    Also IIUC here if the journal system if not hadoop based, hadoopConf may not be available.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97283953
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97284167
  
      [Test build #31210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31210/consoleFull) for   PR 5645 at commit [`bde26b1`](https://github.com/apache/spark/commit/bde26b15f312a03dc2f3deb93c43e8551240cd97).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95762547
  
      [Test build #30886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30886/consoleFull) for   PR 5645 at commit [`9310cbf`](https://github.com/apache/spark/commit/9310cbf23640bee22ddd349e84ca5a254d440b81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29213656
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLogSegment extends java.io.Serializable {
    --- End diff --
    
    But in the current approach, they can't for instance use kryo or protobuf to serialize, unless they do something really crazy like use an externalizable hook to then call into Kryo. I guess I'm just thinking ahead to how this will evolve. However, if we want to have this in the future we can always create an alternative version that is additive, so I don't feel strongly at all


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97395084
  
      [Test build #31272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31272/consoleFull) for   PR 5645 at commit [`2c431fd`](https://github.com/apache/spark/commit/2c431fd559faa5c1934db176b05c152ee8236248).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97283964
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97341677
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97285565
  
      [Test build #31212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31212/consoleFull) for   PR 5645 at commit [`569a416`](https://github.com/apache/spark/commit/569a416c647fc2eebc76e00b3bd7d18242ad617f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29209731
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +import java.nio.ByteBuffer;
    +import java.util.Iterator;
    +
    +/**
    + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    + * they can be recovered after driver failures. See the Spark docs for more information on how
    + * to plug in your own custom implementation of a write ahead log.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLog {
    +  /**
    +   * Write the record to the log and return the segment information that is necessary to read
    +   * back the written record. The time is used to the index the record, such that it can be
    +   * cleaned later. Note that the written data must be durable and readable (using the
    +   * segment info) by the time this function returns.
    +   */
    +  WriteAheadLogSegment write(ByteBuffer record, long time);
    +
    +  /**
    +   * Read a written record based on the given segment information.
    +   */
    +  ByteBuffer read(WriteAheadLogSegment segment);
    +
    +  /**
    +   * Read and return an iterator of all the records that have written and not yet cleanup.
    +   */
    +  Iterator<ByteBuffer> readAll();
    +
    +  /**
    +   * Cleanup all the records that are older than the given threshold time. It can wait for
    +   * the completion of the deletion.
    +   */
    +  void cleanup(long threshTime, boolean waitForCompletion);
    --- End diff --
    
    For some reason I feel like it should just be "clean" instead of "cleanup" (as in "log cleaning"). This is totally subjective though so I think the current one is OK to.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95413177
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30803/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95761083
  
      [Test build #30885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30885/consoleFull) for   PR 5645 at commit [`86abcb1`](https://github.com/apache/spark/commit/86abcb1578ddeecfa11fc4fe116d894ccd838e44).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95412466
  
      [Test build #30802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30802/consoleFull) for   PR 5645 at commit [`7dd2d4b`](https://github.com/apache/spark/commit/7dd2d4b9f6dcc993d0474bc2b65daa358a14115e).
     * This patch **fails RAT tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29308676
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,10 +100,29 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    -        logInfo(s"Read partition data of $this from write ahead log, segment ${partition.segment}")
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    +          dataRead = writeAheadLog.read(partition.walRecordHandle)
    +        } catch {
    +          case NonFatal(e) =>
    +            throw new SparkException(
    +              s"Could not read data from write ahead log record ${partition.walRecordHandle}", e)
    +        } finally {
    +          if (writeAheadLog != null) {
    +            writeAheadLog.close()
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29374751
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -111,14 +140,20 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
     
       /**
        * Get the preferred location of the partition. This returns the locations of the block
    -   * if it is present in the block manager, else it returns the location of the
    -   * corresponding segment in HDFS.
    +   * if it is present in the block manager, else if FileBasedWriteAheadLogSegment is used,
    +   * it returns the location of the corresponding file segment in HDFS .
        */
       override def getPreferredLocations(split: Partition): Seq[String] = {
         val partition = split.asInstanceOf[WriteAheadLogBackedBlockRDDPartition]
         val blockLocations = getBlockIdLocations().get(partition.blockId)
    -    blockLocations.getOrElse(
    -      HdfsUtils.getFileSegmentLocations(
    -        partition.segment.path, partition.segment.offset, partition.segment.length, hadoopConfig))
    +    blockLocations.getOrElse {
    --- End diff --
    
    That's a good point. I wasnt super sure of whether it is a good idea to have it in the interface in this version. We can add it later and maintain binary compatibility as the RecordHandle is an abstract class. Also It is still a developer API s. For now, I am going to merge this in to unblock #5732 . 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95442059
  
      [Test build #30814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30814/consoleFull) for   PR 5645 at commit [`837c4f5`](https://github.com/apache/spark/commit/837c4f5cb90abee46feb15bfd1c103cddc58af32).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29307750
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala ---
    @@ -108,12 +117,14 @@ private[streaming] class WriteAheadLogManager(
        * the latest the records. This does not deal with currently active log files, and
        * hence the implementation is kept simple.
        */
    -  def readFromLog(): Iterator[ByteBuffer] = synchronized {
    +  def readAll(): JIterator[ByteBuffer] = synchronized {
    +    import scala.collection.JavaConversions._
         val logFilesToRead = pastLogs.map{ _.path} ++ currentLogPath
         logInfo("Reading from the logs: " + logFilesToRead.mkString("\n"))
    +
    --- End diff --
    
    Also would be better to change to string interpolation for the above line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95806380
  
      [Test build #30913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30913/consoleFull) for   PR 5645 at commit [`1a32a4b`](https://github.com/apache/spark/commit/1a32a4b5ce740721343915452974c9fb3f9a3910).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97252669
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95411727
  
      [Test build #30802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30802/consoleFull) for   PR 5645 at commit [`7dd2d4b`](https://github.com/apache/spark/commit/7dd2d4b9f6dcc993d0474bc2b65daa358a14115e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95478390
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30820/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29213422
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +import java.nio.ByteBuffer;
    +import java.util.Iterator;
    +
    +/**
    + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    + * they can be recovered after driver failures. See the Spark docs for more information on how
    + * to plug in your own custom implementation of a write ahead log.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLog {
    --- End diff --
    
    Yes. Its meant for users to create arbitrary implementations and we want to
    stay backward compatible (scala traits  have pretty nasty corner cases).
    On Apr 26, 2015 9:48 PM, "Hari Shreedharan" <no...@github.com>
    wrote:
    
    > In
    > streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java
    > <https://github.com/apache/spark/pull/5645#discussion_r29120494>:
    >
    > > + * limitations under the License.
    > > + */
    > > +
    > > +package org.apache.spark.streaming.util;
    > > +
    > > +import java.nio.ByteBuffer;
    > > +import java.util.Iterator;
    > > +
    > > +/**
    > > + * Interface representing a write ahead log (aka journal) that is used by Spark Streaming to
    > > + * save the received data (by receivers) and associated metadata to a reliable storage, so that
    > > + * they can be recovered after driver failures. See the Spark docs for more information on how
    > > + * to plug in your own custom implementation of a write ahead log.
    > > + */
    > > +@org.apache.spark.annotation.DeveloperApi
    > > +public interface WriteAheadLog {
    >
    > Is the idea that this would be useful for Java implementations to keep
    > this a Java interface?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/5645/files#r29120494>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5645


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97285569
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31212/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97395144
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31272/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97285142
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97296663
  
      [Test build #31230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31230/consoleFull) for   PR 5645 at commit [`c2bc738`](https://github.com/apache/spark/commit/c2bc7384e2551bb5e333d0c2b0b61fe02ce0d0b5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29306105
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala ---
    @@ -96,7 +96,7 @@ private[streaming] class BlockManagerBasedBlockHandler(
      */
     private[streaming] case class WriteAheadLogBasedStoreResult(
         blockId: StreamBlockId,
    -    segment: WriteAheadLogFileSegment
    +    segment: WriteAheadLogRecordHandle
    --- End diff --
    
    On second thought, I am fixing this. This is a problem in too many places.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29308665
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLogSegment.scala ---
    @@ -17,4 +17,5 @@
     package org.apache.spark.streaming.util
     
     /** Class for representing a segment of data in a write ahead log file */
    -private[streaming] case class WriteAheadLogFileSegment (path: String, offset: Long, length: Int)
    +private[streaming] case class FileBasedWriteAheadLogSegment(path: String, offset: Long, length: Int)
    --- End diff --
    
    I have been thinking the same. The consistent way would be `FileBasedWriteAheadLogRecordHandle`. That is super super long. :( So I kept it as segment. I think its better of the two options.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95413174
  
      [Test build #30803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30803/consoleFull) for   PR 5645 at commit [`09bc6fe`](https://github.com/apache/spark/commit/09bc6fe6d792a6007a1e88fa90d956dbd4d547fb).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97285135
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97296652
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97374436
  
      [Test build #31272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31272/consoleFull) for   PR 5645 at commit [`2c431fd`](https://github.com/apache/spark/commit/2c431fd559faa5c1934db176b05c152ee8236248).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97395140
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29029517
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    --- End diff --
    
    What I'm thinking is that do we need to have this parameter for the interface, can we hide this into file-based WAL implementations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29211734
  
    --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java ---
    @@ -0,0 +1,26 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.util;
    +
    +/**
    + * This is an interface that represent the information required by any implementation of
    + * a WriteAheadLog to read a written record.
    + */
    +@org.apache.spark.annotation.DeveloperApi
    +public interface WriteAheadLogSegment extends java.io.Serializable {
    --- End diff --
    
    1. Well, for advanced users who want to implement their own WAL implementation will have to ensure that the segment info is serializable, no matter whether we expose an interface or a bytebuffer. In fact, exposing an interface avoids them from writing the code to serialize and return a bytebuffer in a usual case, which is easier to user. Also this interface is expected to be called not faster than 100s of time per second. So does not require super high serialization efficiency. Even if they want, they can always make the implementation extend Externalizable.
    
    2. That is a good point. There are easy workaround even if we dont make this a ByteBuffer. They can put a bytebuffer within their implementation `class MyWALSegment(byteBuffer: ByteBuffer) extends WALSegment`. Now for people who dont care about backward compatibility, making it a bytebuffer make it harder for them to implement. For others who do care about backward compatibility, they will have to write custom externalization logic either way, while returning bytebuffer or returning MyWALSegment.
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-97202120
  
    Offline conversation with @pwendell - he thinks its fine either ways, as long as there is a way to add more methods in the future. To enable that I am making the interfaces into abstract classes, so that we can add more methods with minimal affect to binary compatibility.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95677883
  
      [Test build #30865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30865/consoleFull) for   PR 5645 at commit [`84ce469`](https://github.com/apache/spark/commit/84ce4693ed7f1a161c52f75d33a4699428363470).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056] Make the Write Ahead Log pluggabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5645#issuecomment-95478385
  
      [Test build #30820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30820/consoleFull) for   PR 5645 at commit [`bce5e75`](https://github.com/apache/spark/commit/bce5e7597d4f59952b6b9de7bf194392b5e5790f).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29307843
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLogSegment.scala ---
    @@ -17,4 +17,5 @@
     package org.apache.spark.streaming.util
     
     /** Class for representing a segment of data in a write ahead log file */
    -private[streaming] case class WriteAheadLogFileSegment (path: String, offset: Long, length: Int)
    +private[streaming] case class FileBasedWriteAheadLogSegment(path: String, offset: Long, length: Int)
    --- End diff --
    
    Is it better to rename to `FileBasedWriteAheadLogRecordHandle`, here is `Segment`, but the base class is `RecordHandle`, is is better to keep consistent?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29307541
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,10 +100,29 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    -        logInfo(s"Read partition data of $this from write ahead log, segment ${partition.segment}")
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    +          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
    +            SparkEnv.get.conf, dummyDirectory, hadoopConf)
    +          dataRead = writeAheadLog.read(partition.walRecordHandle)
    +        } catch {
    +          case NonFatal(e) =>
    +            throw new SparkException(
    +              s"Could not read data from write ahead log record ${partition.walRecordHandle}", e)
    +        } finally {
    +          if (writeAheadLog != null) {
    +            writeAheadLog.close()
    +          }
    +        }
    +        if (dataRead == null) {
    +          throw new SparkException(
    +            s"Could not read data from write ahead log record ${partition.walRecordHandle}, " +
    +              s"read returned null")
    +        }
    +        logInfo(s"Read partition data of $this from write ahead log, record handle " +
    +          partition.walRecordHandle)
    --- End diff --
    
    Would be better to use string interpolation here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org