Posted to dev@spark.apache.org by scwf <wa...@huawei.com> on 2014/11/16 13:24:57 UTC

send currentJars and currentFiles to executor with actor?

I notice that Spark serializes each task together with its dependencies (the
files and JARs added to the SparkContext):
  def serializeWithDependencies(
      task: Task[_],
      currentFiles: HashMap[String, Long],
      currentJars: HashMap[String, Long],
      serializer: SerializerInstance)
    : ByteBuffer = {

    val out = new ByteArrayOutputStream(4096)
    val dataOut = new DataOutputStream(out)

    // Write currentFiles
    dataOut.writeInt(currentFiles.size)
    for ((name, timestamp) <- currentFiles) {
      dataOut.writeUTF(name)
      dataOut.writeLong(timestamp)
    }

    // Write currentJars
    dataOut.writeInt(currentJars.size)
    for ((name, timestamp) <- currentJars) {
      dataOut.writeUTF(name)
      dataOut.writeLong(timestamp)
    }

    // Write the task itself and finish
    dataOut.flush()
    val taskBytes = serializer.serialize(task).array()
    out.write(taskBytes)
    ByteBuffer.wrap(out.toByteArray)
  }

Why not send currentJars and currentFiles to the executors using an actor
message instead? I don't think it is necessary to serialize them with every task.
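
For illustration, here is a rough sketch of what I have in mind; the message
and method names below are hypothetical and do not exist in Spark, and
executorActors is assumed to hold one ActorRef per registered executor. The
driver would push the maps once per addFile/addJar call, so the per-task
payload would carry only the serialized task itself:

  // Hypothetical control message; not an existing Spark class.
  case class UpdateDependencies(
      files: Map[String, Long],  // path -> timestamp for files added via sc.addFile
      jars: Map[String, Long])   // path -> timestamp for jars added via sc.addJar

  // Driver-side sketch: broadcast the maps once per change instead of once
  // per task.
  def dependenciesChanged(
      executorActors: Iterable[akka.actor.ActorRef],
      files: Map[String, Long],
      jars: Map[String, Long]): Unit = {
    executorActors.foreach(_ ! UpdateDependencies(files, jars))
  }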





Re: send currentJars and currentFiles to executor with actor?

Posted by Reynold Xin <rx...@databricks.com>.
The current design is not ideal, but the size of dependencies should be
fairly small since we only send the path and timestamp, not the jars
themselves.
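
As a back-of-the-envelope illustration (the numbers are made up): each entry
costs a 2-byte length prefix plus the UTF-8 bytes of the path (writeUTF) and
an 8-byte timestamp (writeLong), so even a few dozen entries add only on the
order of a kilobyte per task:

  // Rough per-map overhead implied by the writeInt/writeUTF/writeLong framing
  // in serializeWithDependencies above (illustrative estimate only).
  def dependencyOverheadBytes(paths: Seq[String]): Int =
    4 + paths.map(p => 2 + p.getBytes("UTF-8").length + 8).sum

  // e.g. 20 jar paths of ~60 ASCII characters each:
  //   4 + 20 * (2 + 60 + 8) = 1404 bytes per task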

Executors can come and go, so this is essentially a state replication problem
where you have to be very careful about consistency.
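
Concretely, carrying the maps with every task is what keeps that manageable:
the executor compares the timestamps against what it already holds and fetches
only what is new, so an executor that registers mid-job catches up from the
first task it runs. Roughly the logic of Executor.updateDependencies,
simplified into a sketch (fetching and class loader details omitted):

  import scala.collection.mutable.HashMap

  // Entries whose timestamp is not newer than what this executor already
  // holds are skipped, so re-sending the maps with every task is cheap and
  // idempotent.
  object DependencyTracker {
    private val currentFiles = new HashMap[String, Long]()

    def update(newFiles: HashMap[String, Long]): Unit = synchronized {
      for ((name, timestamp) <- newFiles
           if currentFiles.getOrElse(name, -1L) < timestamp) {
        // download the file from the driver here, then record its timestamp
        currentFiles(name) = timestamp
      }
    }
  }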

On Sun, Nov 16, 2014 at 4:24 AM, scwf <wa...@huawei.com> wrote:

> I notice that Spark serializes each task together with its dependencies
> (the files and JARs added to the SparkContext). [...] Why not send
> currentJars and currentFiles to the executors using an actor message
> instead? I don't think it is necessary to serialize them with every task.