You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by squito <gi...@git.apache.org> on 2018/09/12 18:57:27 UTC

[GitHub] spark pull request #22404: DO NOT MERGE

GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/22404

    DO NOT MERGE

    just for testing

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark assorted_2.3_fixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22404
    
----
commit f1c86c92bb7c14174027e40a890fd8e22c707887
Author: Imran Rashid <ir...@...>
Date:   2018-08-14T02:35:34Z

    [PYSPARK] Updates to pyspark broadcast

commit 2be676d948f88719326aaeb000ebfde1d2049e0d
Author: Imran Rashid <ir...@...>
Date:   2018-09-06T17:11:47Z

    [PYSPARK][SQL] Updates to RowQueue
    
    Tested with updates to RowQueueSuite

commit 1b4e8ebf147e75c7a07e7a547b671f44a9cc7042
Author: Imran Rashid <ir...@...>
Date:   2018-08-22T21:38:28Z

    [CORE] Updates to remote cache reads
    
    Covered by tests in DistributedSuite

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96041/testReport)** for PR 22404 at commit [`38dd9d8`](https://github.com/apache/spark/commit/38dd9d89d32a55f8ad77ed0acca1f45cb0671aea).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by squito <gi...@git.apache.org>.
Github user squito closed the pull request at:

    https://github.com/apache/spark/pull/22404


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96001/testReport)** for PR 22404 at commit [`2b0d10d`](https://github.com/apache/spark/commit/2b0d10d27d86eea81f9cc15eb61564533e6651fc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96001/testReport)** for PR 22404 at commit [`2b0d10d`](https://github.com/apache/spark/commit/2b0d10d27d86eea81f9cc15eb61564533e6651fc).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96000/testReport)** for PR 22404 at commit [`1b4e8eb`](https://github.com/apache/spark/commit/1b4e8ebf147e75c7a07e7a547b671f44a9cc7042).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96041/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96002/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96002/testReport)** for PR 22404 at commit [`4f3e29e`](https://github.com/apache/spark/commit/4f3e29eea891999d76aac12f49faed6330c592e8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217155746
  
    --- Diff: python/pyspark/context.py ---
    @@ -499,19 +506,32 @@ def f(split, iterator):
     
         def _serialize_to_jvm(self, data, parallelism, serializer):
             """
    -        Calling the Java parallelize() method with an ArrayList is too slow,
    -        because it sends O(n) Py4J commands.  As an alternative, serialized
    -        objects are written to a file and loaded through textFile().
    -        """
    -        tempFile = NamedTemporaryFile(delete=False, dir=self._temp_dir)
    -        try:
    -            serializer.dump_stream(data, tempFile)
    -            tempFile.close()
    -            readRDDFromFile = self._jvm.PythonRDD.readRDDFromFile
    -            return readRDDFromFile(self._jsc, tempFile.name, parallelism)
    -        finally:
    -            # readRDDFromFile eagerily reads the file so we can delete right after.
    -            os.unlink(tempFile.name)
    +        Using py4j to send a large dataset to the jvm is really slow, so we use either a file
    +        or a socket if we have encryption enabled.
    +        """
    +        if self._encryption_enabled:
    +            # with encryption, we open a server in java and send the data directly
    +            server = self._jvm.PythonParallelizeServer(self._jsc.sc(), parallelism)
    +            (sock_file, _) = local_connect_and_auth(server.port(), server.secret())
    +            chunked_out = ChunkedStream(sock_file, 8192)
    +            serializer.dump_stream(data, chunked_out)
    +            chunked_out.close()
    +            # this call will block until the server has read all the data and processed it (or
    +            # throws an exception)
    +            r = server.getResult()
    +            return r
    --- End diff --
    
    Nit, do you want to just `return server.getResult()` in cases like this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217153478
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/SimpleDownloadFile.java ---
    @@ -0,0 +1,89 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.network.shuffle;
    +
    +import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
    +import org.apache.spark.network.buffer.ManagedBuffer;
    +import org.apache.spark.network.util.TransportConf;
    +
    +import java.io.*;
    +import java.nio.ByteBuffer;
    +import java.nio.channels.Channels;
    +import java.nio.channels.WritableByteChannel;
    +
    +/**
    + * A DownloadFile that does not take any encryption settings into account for reading and
    + * writing data.
    + *
    + * This does *not* mean the data in the file is un-encrypted -- it could be that the data is
    + * already encrypted when its written, and subsequent layer is responsible for decrypting.
    + */
    +public class SimpleDownloadFile implements DownloadFile {
    +  private final File file;
    +  private final TransportConf transportConf;
    +
    +  public SimpleDownloadFile(File file, TransportConf transportConf) {
    +    this.file = file;
    +    this.transportConf = transportConf;
    +  }
    +
    +  @Override
    +  public boolean delete() {
    +    return file.delete();
    +  }
    +
    +  @Override
    +  public DownloadFileWritableChannel openForWriting() {
    +    try {
    +      return new SimpleDownloadWritableChannel();
    +    } catch (FileNotFoundException e) {
    +      throw new RuntimeException(e);
    +    }
    +  }
    +
    +  @Override
    +  public String path() {
    +    return file.getAbsolutePath();
    +  }
    +
    +  private class SimpleDownloadWritableChannel implements DownloadFileWritableChannel {
    +    private final WritableByteChannel channel;
    --- End diff --
    
    More nits, would put blank lines around this


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3055/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    thanks Sean, all good points, just updated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96000/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217153096
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/DownloadFile.java ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.network.shuffle;
    +
    +/**
    + * A handle on the file used when fetching remote data to disk.  Used to ensure the lifecycle of
    + * writing the data, reading it back, and then cleaning it up is followed.  Specific implementations
    + * may also handle encryption.  The data can be read only via DownloadFileWritableChannel,
    + * which ensures data is not read until after the writer is closed.
    + */
    +public interface DownloadFile {
    +  /**
    +   * Delete the file.
    +   *
    +   * @return  <code>true</code> if and only if the file or directory is
    +   *          successfully deleted; <code>false</code> otherwise
    +   */
    +  public boolean delete();
    +
    +  /**
    +   * A channel for writing data to the file.  This special channel allows access to the data for
    +   * reading, after the channel is closed, via {@link DownloadFileWritableChannel#closeAndRead()}.
    +   */
    +  public DownloadFileWritableChannel openForWriting();
    +
    +  /**
    +   * The path of the file, intended only for debug purposes.
    +   * @return
    --- End diff --
    
    These are just nits, but you can put the text on the line above in the tag here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96041/testReport)** for PR 22404 at commit [`38dd9d8`](https://github.com/apache/spark/commit/38dd9d89d32a55f8ad77ed0acca1f45cb0671aea).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3089/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96001/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3056/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217154410
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -692,5 +681,238 @@ private[spark] class PythonBroadcast(@transient var path: String) extends Serial
         }
         super.finalize()
       }
    +
    +  def setupEncryptionServer(): Array[Any] = {
    +    encryptionServer = new PythonServer[Unit]("broadcast-encrypt-server") {
    +      override def handleConnection(sock: Socket): Unit = {
    +        val env = SparkEnv.get
    +        val in = sock.getInputStream()
    +        val dir = new File(Utils.getLocalDir(env.conf))
    +        val file = File.createTempFile("broadcast", "", dir)
    +        path = file.getAbsolutePath
    +        val out = env.serializerManager.wrapForEncryption(new FileOutputStream(path))
    +        DechunkedInputStream.dechunkAndCopyToOutput(in, out)
    +      }
    +    }
    +    Array(encryptionServer.port, encryptionServer.secret)
    +  }
    +
    +  def waitTillDataReceived(): Unit = encryptionServer.getResult()
     }
     // scalastyle:on no.finalize
    +
    +/**
    + * The inverse of pyspark's ChunkedStream for sending broadcast data.
    + * Tested from python tests.
    + */
    +private[spark] class DechunkedInputStream(wrapped: InputStream) extends InputStream with Logging {
    +  private val din = new DataInputStream(wrapped)
    +  private var remainingInChunk = din.readInt()
    +
    +  override def read(): Int = {
    +    val into = new Array[Byte](1)
    +    val n = read(into, 0, 1)
    +    if (n == -1) {
    +      -1
    +    } else {
    +      // if you just cast a byte to an int, then anything > 127 is negative, which is interpreted
    +      // as an EOF
    +      val b = into(0)
    +      if (b < 0) {
    --- End diff --
    
    Pardon is this just trying to treat it as an unsigned byte? then just `b & 0xFF`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #4338 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4338/testReport)** for PR 22404 at commit [`2b0d10d`](https://github.com/apache/spark/commit/2b0d10d27d86eea81f9cc15eb61564533e6651fc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217154988
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
    @@ -193,19 +193,51 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
             val newBids = broadcastVars.map(_.id).toSet
             // number of different broadcasts
             val toRemove = oldBids.diff(newBids)
    -        val cnt = toRemove.size + newBids.diff(oldBids).size
    +        val addedBids = newBids.diff(oldBids)
    +        val cnt = toRemove.size + addedBids.size
    +        val needsDecryptionServer = env.serializerManager.encryptionEnabled && addedBids.nonEmpty
    +        dataOut.writeBoolean(needsDecryptionServer)
             dataOut.writeInt(cnt)
    -        for (bid <- toRemove) {
    -          // remove the broadcast from worker
    -          dataOut.writeLong(- bid - 1)  // bid >= 0
    -          oldBids.remove(bid)
    +        def sendBidsToRemove(): Unit = {
    +          for (bid <- toRemove) {
    +            // remove the broadcast from worker
    +            dataOut.writeLong(-bid - 1) // bid >= 0
    +            oldBids.remove(bid)
    +          }
             }
    -        for (broadcast <- broadcastVars) {
    -          if (!oldBids.contains(broadcast.id)) {
    +        if (needsDecryptionServer) {
    +          // if there is encryption, we setup a server which reads the encrypted files, and sends
    +          // the decrypted data to python
    +          val idsAndFiles = broadcastVars.flatMap { broadcast =>
    +            if (!oldBids.contains(broadcast.id)) {
    --- End diff --
    
    Nit: flip the if condition for clarity?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217153676
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/SimpleDownloadFile.java ---
    @@ -0,0 +1,89 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.network.shuffle;
    +
    +import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
    +import org.apache.spark.network.buffer.ManagedBuffer;
    +import org.apache.spark.network.util.TransportConf;
    +
    +import java.io.*;
    +import java.nio.ByteBuffer;
    +import java.nio.channels.Channels;
    +import java.nio.channels.WritableByteChannel;
    +
    +/**
    + * A DownloadFile that does not take any encryption settings into account for reading and
    + * writing data.
    + *
    + * This does *not* mean the data in the file is un-encrypted -- it could be that the data is
    + * already encrypted when its written, and subsequent layer is responsible for decrypting.
    + */
    +public class SimpleDownloadFile implements DownloadFile {
    +  private final File file;
    +  private final TransportConf transportConf;
    +
    +  public SimpleDownloadFile(File file, TransportConf transportConf) {
    +    this.file = file;
    +    this.transportConf = transportConf;
    +  }
    +
    +  @Override
    +  public boolean delete() {
    +    return file.delete();
    +  }
    +
    +  @Override
    +  public DownloadFileWritableChannel openForWriting() {
    +    try {
    +      return new SimpleDownloadWritableChannel();
    +    } catch (FileNotFoundException e) {
    +      throw new RuntimeException(e);
    --- End diff --
    
    Is it bad to just let the method declare that it throws IOException for FileNotFoundException? I know they're checked, but it's more precise


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96000/testReport)** for PR 22404 at commit [`1b4e8eb`](https://github.com/apache/spark/commit/1b4e8ebf147e75c7a07e7a547b671f44a9cc7042).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #96002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96002/testReport)** for PR 22404 at commit [`4f3e29e`](https://github.com/apache/spark/commit/4f3e29eea891999d76aac12f49faed6330c592e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    **[Test build #4338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4338/testReport)** for PR 22404 at commit [`2b0d10d`](https://github.com/apache/spark/commit/2b0d10d27d86eea81f9cc15eb61564533e6651fc).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22404: DO NOT MERGE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22404
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217153554
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/SimpleDownloadFile.java ---
    @@ -0,0 +1,89 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.network.shuffle;
    +
    +import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
    +import org.apache.spark.network.buffer.ManagedBuffer;
    +import org.apache.spark.network.util.TransportConf;
    +
    +import java.io.*;
    --- End diff --
    
    Usually we unroll star imports, but not a big deal


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22404: DO NOT MERGE

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22404#discussion_r217153301
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/DownloadFile.java ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.network.shuffle;
    --- End diff --
    
    Super nit, I think this can be preceded by a blank line


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org