Posted to user@spark.apache.org by Shay Seng <sh...@1618labs.com> on 2013/10/18 02:35:35 UTC
help on SparkContext.sequenceFile()
Hey gurus,
I'm having a little trouble deciphering the docs for
sequenceFile[K, V](path: String, minSplits: Int = defaultMinSplits)(implicit km: ClassManifest[K], vm: ClassManifest[V], kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)]

(defaultMinSplits: http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/SparkContext.html#defaultMinSplits:Int
RDD: http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/rdd/RDD.html)
Does anyone have a short example snippet?
tks
shay
Re: help on SparkContext.sequenceFile()
Posted by Matei Zaharia <ma...@gmail.com>.
You need to do import SparkContext._ at the top of your program.
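For example, here is a minimal sketch of the same method with that import added (names unchanged from your snippet, otherwise illustrative only; the import is what brings the WritableConverter implicits into scope):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // provides the implicit WritableConverters
import org.apache.hadoop.io.{BytesWritable, Text}

def readAsRdd[T: ClassManifest](sc: SparkContext, uri: String, clazz: Class[_]) = {
  // kcf/vcf are now resolved implicitly, so this compiles
  val rdd = sc.sequenceFile[Text, BytesWritable](uri)
  rdd.map(l => {
    // trim BytesWritable's padded buffer to the record's actual length
    val b = l._2.getBytes.slice(0, l._2.getLength)
    clazz.getMethod("parseFrom", Class.forName("[B")).invoke(null, b).asInstanceOf[T]
  })
}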
Matei
On Oct 18, 2013, at 5:56 PM, Shay Seng <sh...@1618labs.com> wrote:
> I seem to be having issues compiling it though...
> def readAsRdd[T: ClassManifest](sc: org.apache.spark.SparkContext, uri: String, clazz: java.lang.Class[_]) = {
>   val rdd = sc.sequenceFile[org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable](uri)
>   rdd.map(l => {
>     val sz = l._2.getLength
>     val b = l._2.getBytes.slice(0, sz)
>     val parseFrom = clazz.getMethod("parseFrom", Class.forName("[B"))
>     parseFrom.invoke(null, b).asInstanceOf[T]
>   })
> }
>
> [ERROR] /Users/shay/sb/experiment/sps-emr/ue/src/main/scala/ue/proto.scala:57: error: could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[org.apache.hadoop.io.Text]
> [ERROR] Error occurred in an application involving default arguments.
> [INFO] val rdd = sc.sequenceFile[org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable](uri)
>
>
>
> On Fri, Oct 18, 2013 at 9:37 AM, Matei Zaharia <ma...@gmail.com> wrote:
> Don't worry about the implicit params, those are filled in by the compiler. All you need to do is provide a key and value type, and a path. Look at how sequenceFile gets used in this test:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=core/src/test/scala/spark/FileSuite.scala;hb=af3c9d50
>
> In particular, the K and V in Spark can be any Writable class, *or* primitive types like Int, Double, etc, or String. For the latter ones, Spark automatically uses the correct Hadoop Writable (e.g. IntWritable, DoubleWritable, Text).
>
> Matei
>
>
>
> On Oct 17, 2013, at 5:35 PM, Shay Seng <sh...@1618labs.com> wrote:
>
>> Hey gurus,
>>
>> I'm having a little trouble deciphering the docs for
>>
>> sequenceFile[K, V](path: String, minSplits: Int = defaultMinSplits)(implicit km: ClassManifest[K], vm: ClassManifest[V], kcf: () ⇒ WritableConverter[K], vcf: () ⇒ WritableConverter[V]): RDD[(K, V)]
>>
>> Does anyone have a short example snippet?
>>
>> tks
>> shay
>>
>>
>
>
Re: help on SparkContext.sequenceFile()
Posted by Shay Seng <sh...@1618labs.com>.
I seem to be having issues compiling it though...
def readAsRdd[T: ClassManifest](sc: org.apache.spark.SparkContext, uri: String, clazz: java.lang.Class[_]) = {
  val rdd = sc.sequenceFile[org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable](uri)
  rdd.map(l => {
    // BytesWritable reuses its buffer, so keep only the record's real length
    val sz = l._2.getLength
    val b = l._2.getBytes.slice(0, sz)
    // reflectively call the protobuf-style static parseFrom(byte[])
    val parseFrom = clazz.getMethod("parseFrom", Class.forName("[B"))
    parseFrom.invoke(null, b).asInstanceOf[T]
  })
}
[ERROR] /Users/shay/sb/experiment/sps-emr/ue/src/main/scala/ue/proto.scala:57: error: could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[org.apache.hadoop.io.Text]
[ERROR] Error occurred in an application involving default arguments.
[INFO] val rdd = sc.sequenceFile[org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable](uri)
On Fri, Oct 18, 2013 at 9:37 AM, Matei Zaharia <ma...@gmail.com> wrote:
> Don't worry about the implicit params, those are filled in by the
> compiler. All you need to do is provide a key and value type, and a path.
> Look at how sequenceFile gets used in this test:
>
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=core/src/test/scala/spark/FileSuite.scala;hb=af3c9d50
>
> In particular, the K and V in Spark can be any Writable class, *or*
> primitive types like Int, Double, etc, or String. For the latter ones,
> Spark automatically uses the correct Hadoop Writable (e.g. IntWritable,
> DoubleWritable, Text).
>
> Matei
>
>
>
> On Oct 17, 2013, at 5:35 PM, Shay Seng <sh...@1618labs.com> wrote:
>
> Hey gurus,
>
> I'm having a little trouble deciphering the docs for
>
> sequenceFile[K, V](path: String, minSplits: Int = defaultMinSplits)(implicit km: ClassManifest[K], vm: ClassManifest[V], kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)]
>
> (defaultMinSplits: http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/SparkContext.html#defaultMinSplits:Int
> RDD: http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/rdd/RDD.html)
>
> Does anyone have a short example snippet?
>
> tks
> shay
>
>
>
>
Re: help on SparkContext.sequenceFile()
Posted by Matei Zaharia <ma...@gmail.com>.
Don't worry about the implicit params, those are filled in by the compiler. All you need to do is provide a key and value type, and a path. Look at how sequenceFile gets used in this test:
https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=core/src/test/scala/spark/FileSuite.scala;hb=af3c9d50
In particular, the K and V in Spark can be any Writable class, *or* primitive types like Int, Double, etc, or String. For the latter ones, Spark automatically uses the correct Hadoop Writable (e.g. IntWritable, DoubleWritable, Text).
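For instance, a minimal sketch in the spirit of that test (path and data here are purely illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicits for saveAsSequenceFile and sequenceFile

val sc = new SparkContext("local", "seqfile-example")
val nums = sc.makeRDD(1 to 3).map(x => (x, "a" * x))  // (1,"a"), (2,"aa"), (3,"aaa")
nums.saveAsSequenceFile("/tmp/seq-example")           // written as (IntWritable, Text)

// read back with plain Scala types; Spark converts from the Writables
val output = sc.sequenceFile[Int, String]("/tmp/seq-example")
assert(output.collect().toSet == Set((1, "a"), (2, "aa"), (3, "aaa")))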
Matei
On Oct 17, 2013, at 5:35 PM, Shay Seng <sh...@1618labs.com> wrote:
> Hey gurus,
>
> I'm having a little trouble deciphering the docs for
>
> sequenceFile[K, V](path: String, minSplits: Int = defaultMinSplits)(implicit km: ClassManifest[K], vm: ClassManifest[V], kcf: () ⇒ WritableConverter[K], vcf: () ⇒ WritableConverter[V]): RDD[(K, V)]
>
> Does anyone have a short example snippet?
>
> tks
> shay
>
>