You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by ash211 <gi...@git.apache.org> on 2014/09/07 09:00:39 UTC

[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r17213316
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -1421,3 +1421,24 @@ private[spark] object Utils extends Logging {
       }
     
     }
    +
    +/**
    + * A utility class to redirect the child process's stdout or stderr.
    + */
    +private[spark] class RedirectThread(in: InputStream, out: OutputStream, name: String)
    +  extends Thread(name) {
    +
    +  setDaemon(true)
    +  override def run() {
    +    scala.util.control.Exception.ignoring(classOf[IOException]) {
    +      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
    --- End diff --
    
    I'm guessing the original author was reading Strings() before, which requires that you define how to interpret the bytes into strings (encodings like ASCII/UTF8, etc).  There were probably some byte sequences coming through that weren't characters in UTF8 so exceptions were being thrown.  Since this is just reading from an InputStream and writing to an OutputStream, copying bytes would be much more efficient than reading bytes, interpreting as characters, converting back to bytes, then sending those out the other side.
    
    Conveniently, Apache has an ```IOUtils.copy(inputStream, outputStream, bufferSize)``` [method](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copy(java.io.InputStream, java.io.OutputStream, int)) that would do exactly this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by Sean Owen <so...@cloudera.com>.

(Guava also has ByteStreams.copy for this and I know its a dependency
already - I forget if commons IO is.)
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r17213316

    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -1421,3 +1421,24 @@ private[spark] object Utils extends Logging {
       }

     }
    +
    +/**
    + * A utility class to redirect the child process's stdout or stderr.
    + */
    +private[spark] class RedirectThread(in: InputStream, out:
OutputStream, name: String)
    +  extends Thread(name) {
    +
    +  setDaemon(true)
    +  override def run() {
    +    scala.util.control.Exception.ignoring(classOf[IOException]) {
    +      // FIXME: We copy the stream on the level of bytes to avoid
encoding problems.
    --- End diff --

    I'm guessing the original author was reading Strings() before, which
requires that you define how to interpret the bytes into strings (encodings
like ASCII/UTF8, etc).  There were probably some byte sequences coming
through that weren't characters in UTF8 so exceptions were being thrown.
Since this is just reading from an InputStream and writing to an
OutputStream, copying bytes would be much more efficient than reading
bytes, interpreting as characters, converting back to bytes, then sending
those out the other side.

    Conveniently, Apache has an ```IOUtils.copy(inputStream, outputStream,
bufferSize)``` [method](
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copy(java.io.InputStream,
java.io.OutputStream, int)) that would do exactly this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org