Posted to user@spark.apache.org by robertberta <ro...@atigeo.com> on 2014/09/09 14:39:13 UTC

spark functionality similar to hadoop's RecordWriter close method

I want to call a function for batches of elements from an RDD:

val javaClass: org.apache.spark.api.java.function.Function[Seq[String], Unit] =
  new JavaClass()
rdd.mapPartitions(_.grouped(5)).foreach(javaClass)

1. This worked fine in Spark 0.9.1. When we upgraded to Spark 1.0.2,
Function changed from a class to an interface and we now get:

type mismatch;
found   : org.apache.spark.api.java.function.Function[Seq[String],Unit]
required: Seq[String] => Unit

We are using Java 1.7. We use that class for a one-time initialization
call on each executor and for batch processing.

2. Previously, on Hadoop, RecordWriter.close() gave us a callback on
every executor that ran map/reduce operations. We would like the same
in Spark; is it possible?





Re: spark functionality similar to hadoop's RecordWriter close method

Posted by Sean Owen <so...@cloudera.com>.
You're mixing the Java and Scala APIs here. Your call to foreach() is
expecting a Scala function and you're giving it a Java Function.
Ideally you would just use the Scala API, of course. Before explaining
how to actually use a Java function here, could you clarify that you
really have to and can't use Scala for some reason? Your declaration of
the Java function doesn't look intentional either, since Unit and Seq
are not Java types.
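
For illustration, a pure-Scala version of your snippet might look like
the sketch below. ProcessBatch is a hypothetical stand-in for your
JavaClass, and the grouping into batches of 5 is kept from your post:

import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the poster's JavaClass: a plain Scala
// function class whose apply method handles one batch of records.
class ProcessBatch extends (Seq[String] => Unit) with Serializable {
  def apply(batch: Seq[String]): Unit = {
    // ... process up to 5 records at a time ...
    println(s"processing ${batch.size} records")
  }
}

def run(rdd: RDD[String]): Unit = {
  val processBatch = new ProcessBatch()
  // grouped(5) turns each partition's iterator into Seq[String] batches,
  // so foreach receives the Seq[String] => Unit it expects.
  rdd.mapPartitions(_.grouped(5)).foreach(processBatch)
}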

There is no callback, and no direct analog of RecordWriter.close(). If
you are writing a "foreach" function, then you want to use
"foreachPartition", and then after writing a partition's worth of
records, you could do something at the end of the function. This may
or may not suit your purpose as it is not necessarily called in the
same way that RecordWriter.close() was. But ideally you're not relying
on that kind of thing anyway.
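
A rough sketch of that pattern, using a hypothetical BatchWriter class
for whatever setup and cleanup each task needs:

import org.apache.spark.rdd.RDD

// Hypothetical writer with an explicit lifecycle, standing in for
// whatever resource needs per-task setup and teardown. It is created
// inside the partition closure, so it does not need to be serializable.
class BatchWriter {
  def open(): Unit = ()                     // one-time setup for this task
  def write(batch: Seq[String]): Unit = ()  // handle one batch of records
  def close(): Unit = ()                    // cleanup after the last batch
}

def writePartitions(rdd: RDD[String]): Unit = {
  rdd.foreachPartition { records =>
    val writer = new BatchWriter()
    writer.open()
    try {
      records.grouped(5).foreach(writer.write)
    } finally {
      // Runs once per partition (per task) after all of its records are
      // processed; this only approximates RecordWriter.close().
      writer.close()
    }
  }
}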


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org