Posted to user@spark.apache.org by wxhsdp <wx...@gmail.com> on 2014/04/26 11:26:26 UTC

how to get subArray without copy

Hi all,
  I want to do the following:
  (1) in each partition, run some operations on the partition's data held in an Array
  (2) split the array into subarrays and pair each subarray with an id
  (3) shuffle by id

  Here is the pseudocode:

  // T stands for the element type of RDD a
  case class MyObject(id: Int, arr: Array[T])

  val b = a.mapPartitions { itr =>
    val c = itr.toArray

    /* some operations on c */

    val d = new Array[MyObject](2)

    // c(i to j) is pseudocode for "the subarray of c from i to j"
    d(0) = MyObject(0, c(index0 to index1)) // line 10
    d(1) = MyObject(1, c(index2 to index3)) // line 11

    d.toIterator
  }

  b.groupBy(id...)

  My question is how to get the subarrays on lines 10 and 11 in a
  memory-efficient way: I don't want val d to occupy extra memory. Is there
  a way to do something like a pointer reference in C?

  Array.slice makes a copy, and the copy consumes 4x the memory of the
  original array; I don't know why. Is it related to Java autoboxing?
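
  One way to test the autoboxing guess is to look at the runtime element
  type of the arrays involved. The sketch below is only a diagnostic, and
  BoxingCheck is a hypothetical helper name, not part of Scala or Spark. An
  Array[Int] is a primitive int[] on the JVM, while an Array[Any] holds
  references to boxed objects, which typically take several times the
  memory per element. It does not by itself establish where the 4x comes
  from.

```scala
// Hypothetical diagnostic helper (not a Scala or Spark API).
object BoxingCheck {
  // Runtime element class of an array: a primitive class such as `int`,
  // or a reference class such as java.lang.Object.
  def elementClass(arr: Array[_]): Class[_] = arr.getClass.getComponentType

  // True when the array stores primitives directly rather than boxed objects.
  def isPrimitiveArray(arr: Array[_]): Boolean = elementClass(arr).isPrimitive
}
```

  Calling BoxingCheck.isPrimitiveArray on both c and the result of
  c.slice(...) would show whether boxing was introduced somewhere between
  the two.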

   Array.view returns an IndexedSeqView. If you convert it to an Array right
   on line 10,
   d(0) = MyObject(0, c.view(index0, index1).toArray) // line 10
   it's the same as Array.slice.

   If you instead convert it to an Array after b.groupBy(id...), an error
   occurs because the view is not serializable:

ERROR executor.Executor: Exception in task ID 1
java.io.NotSerializableException: scala.collection.mutable.IndexedSeqView$$anon$2
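
   A possible workaround for the serialization error is a small
   serializable wrapper that stores the backing array plus two offsets, in
   place of an IndexedSeqView. This is a minimal sketch under assumptions:
   ArraySlice and its members are hypothetical names, not part of Scala or
   Spark. Case classes are Serializable by default, so the wrapper can
   cross the shuffle as long as the element type is serializable too.

```scala
// Hypothetical helper (not a Scala or Spark API): a serializable window
// onto an existing array that reads through to it without copying.
final case class ArraySlice[T](backing: Array[T], from: Int, until: Int) {
  require(from >= 0 && from <= until && until <= backing.length,
    "invalid slice bounds")

  // Number of elements visible through the window.
  def length: Int = until - from

  // Element i of the window, read directly from the backing array.
  def apply(i: Int): T = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    backing(from + i)
  }

  // Iterates over the window without materializing a new array.
  def iterator: Iterator[T] = Iterator.range(from, until).map(i => backing(i))
}
```

   One caveat: default Java serialization writes the whole backing array
   for each wrapper, not just the window. So this avoids the extra copy
   inside mapPartitions, but the shuffle itself still serializes the data.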




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-get-subArray-without-copy-tp4873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: how to get subArray without copy

Posted by wxhsdp <wx...@gmail.com>.
The only way I have found is to use a 2-D array, if the split is regular.
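
One reading of the 2-D array idea, sketched under assumptions: every
subarray has the same known length, the partition size is known up front,
and the elements are Int. RegularSplit, fill2D and withIds are hypothetical
names. The data is written into rows as it is read from the iterator, so
each row is already a standalone array that can be paired with an id
without a later slice.

```scala
// Hypothetical helper illustrating the regular-split case.
object RegularSplit {
  // Read up to rows * cols elements from the iterator straight into a
  // rows x cols grid, so no flat array is built and sliced afterwards.
  def fill2D(itr: Iterator[Int], rows: Int, cols: Int): Array[Array[Int]] = {
    val grid = Array.ofDim[Int](rows, cols)
    var i = 0
    while (itr.hasNext && i < rows * cols) {
      grid(i / cols)(i % cols) = itr.next()
      i += 1
    }
    grid
  }

  // Pair each row with its index as the id. Rows are passed by reference,
  // not copied.
  def withIds(grid: Array[Array[Int]]): Iterator[(Int, Array[Int])] =
    grid.iterator.zipWithIndex.map { case (row, id) => (id, row) }
}
```

Inside mapPartitions this would replace itr.toArray, with the per-partition
operations applied to the grid rows instead of to one flat array.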


