Posted to user@spark.apache.org by Ayoub <be...@gmail.com> on 2015/07/26 22:21:51 UTC

RDD[Future[T]] => Future[RDD[T]]

Hello,

I am trying to convert the result I get back from some async IO:

val rdd: RDD[T] = // some rdd 

val result: RDD[Future[T]] = rdd.map(httpCall)

Is there a way to collect all the futures once they are completed, in a *non
blocking* (i.e. without scala.concurrent.Await) and lazy way?

If the RDD were a standard Scala collection, then calling
"scala.concurrent.Future.sequence" would have solved the issue, but RDD is
not a TraversableOnce (which that method requires).
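For comparison, this is what Future.sequence does on a plain Scala collection (a minimal sketch; httpCall here is a hypothetical stand-in that does no real IO):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequenceDemo extends App {
  // Hypothetical stand-in for the httpCall above (no real IO).
  def httpCall(s: String): Future[Int] = Future(s.length)

  val inputs: Seq[String] = Seq("a", "bb", "ccc")

  // Seq[Future[Int]] => Future[Seq[Int]], without blocking.
  val combined: Future[Seq[Int]] = Future.sequence(inputs.map(httpCall))

  // Blocking here only to inspect the result of the demo.
  println(Await.result(combined, 1.minute)) // List(1, 2, 3)
}
```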

Is there a way to do this kind of transformation with an RDD[Future[T]] ?

Thanks,
Ayoub.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Future-T-Future-RDD-T-tp24000.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: RDD[Future[T]] => Future[RDD[T]]

Posted by Ayoub Benali <be...@gmail.com>.
It doesn't work because mapPartitions expects a function f: (Iterator[T]) =>
Iterator[U], while Future.sequence wraps the iterator in a Future, giving
Future[Iterator[U]] instead.
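To make the mismatch concrete, here is a plain-Scala sketch without Spark (httpCall is again a hypothetical stand-in; inside Spark the function below would be passed to rdd.mapPartitions). It shows the common compromise of blocking once per partition, which is not the fully non-blocking solution asked for:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PartitionDemo extends App {
  // Hypothetical stand-in for the async IO call.
  def httpCall(s: String): Future[Int] = Future(s.length)

  // mapPartitions needs Iterator[T] => Iterator[U]; Future.sequence
  // yields a Future, so we must block to get back a plain Iterator.
  def perPartition(it: Iterator[String]): Iterator[Int] = {
    val futures: Vector[Future[Int]] = it.map(httpCall).toVector
    // Block once per partition -- a pragmatic compromise, not the
    // fully non-blocking, lazy behaviour asked for in the original post.
    Await.result(Future.sequence(futures), 10.minutes).iterator
  }

  println(perPartition(Iterator("a", "bb")).toList) // List(1, 2)
}
```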

2015-07-26 22:25 GMT+02:00 Ignacio Blasco <el...@gmail.com>:

> Maybe using mapPartitions and .sequence inside it?
> On 26/7/2015 10:22 p.m., "Ayoub" <be...@gmail.com> wrote:

Re: RDD[Future[T]] => Future[RDD[T]]

Posted by Ignacio Blasco <el...@gmail.com>.
Maybe using mapPartitions and .sequence inside it?
On 26/7/2015 10:22 p.m., "Ayoub" <be...@gmail.com> wrote:
