Posted to user@spark.apache.org by buring <qy...@gmail.com> on 2014/11/10 10:07:33 UTC
Index file created by MapFile can't be read
Hi,
Recently I wanted to save a big RDD[(k,v)] as an index plus data, so I
decided to use a Hadoop MapFile. I tried an example like this one:
https://gist.github.com/airawat/6538748
The code ran fine and generated an index file and a data file. I can open
the data file with "hadoop fs -text /spark/out2/mapFile/data", but when I
run "hadoop fs -text /spark/out2/mapFile/index" I can't see the index
content. The console shows only:
14/11/10 16:11:04 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
and the command "hadoop fs -ls /spark/out2/mapFile/" shows:
-rw-r--r-- 3 spark hdfs 24002 2014-11-10 15:19
/spark/out2/mapFile/data
-rw-r--r-- 3 spark hdfs 136 2014-11-10 15:19
/spark/out2/mapFile/index
I don't think "INFO compress.CodecPool: Got brand-new decompressor
[.deflate]" should prevent the index content from being shown. It really
confuses me. My code is as follows:
import java.net.URI
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.io.{IOUtils, IntWritable, MapFile, Text}
import org.apache.spark.{SparkConf, SparkContext}

def try_Map_File(writePath: String) = {
  val uri = writePath + "/mapFile"
  val data = Array(
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen")
  val con = new SparkConf()
  con.set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
  val sc = new SparkContext(con)
  val conf = sc.hadoopConfiguration
  val fs = FileSystem.get(URI.create(uri), conf)
  val key = new IntWritable()
  val value = new Text()
  var writer: MapFile.Writer = null
  try {
    // Assign to the outer var. Declaring a new `val writer` here would shadow
    // it, so the finally block would close null instead of the real writer,
    // and the index entries (buffered until close) would never be flushed.
    writer = new MapFile.Writer(conf, fs, uri, key.getClass, value.getClass)
    writer.setIndexInterval(64)
    for (i <- 0 until 512) {
      key.set(i + 1)
      value.set(data(i % data.length))
      writer.append(key, value)
    }
  } finally {
    IOUtils.closeStream(writer)
  }
}
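For what it's worth, I had also planned to read the MapFile back with
MapFile.Reader instead of "hadoop fs -text". This is just a sketch I have
not verified against my output yet; it assumes the same fs, uri, and conf
values as in the function above:

```scala
// Sketch (untested): reading the MapFile back via MapFile.Reader.
val key = new IntWritable()
val value = new Text()
var reader: MapFile.Reader = null
try {
  reader = new MapFile.Reader(fs, uri, conf)

  // Random access by key: get() fills `value` and returns null if absent.
  key.set(10)
  if (reader.get(key, value) != null)
    println(s"key 10 -> $value")

  // Or scan all entries sequentially.
  reader.reset()
  while (reader.next(key, value))
    println(s"$key\t$value")
} finally {
  IOUtils.closeStream(reader)
}
```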
Can anyone give me some ideas, or suggest another method to use instead of
MapFile?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/index-File-create-by-mapFile-can-t-read-tp18471.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org