Posted to user@spark.apache.org by buring <qy...@gmail.com> on 2014/11/10 10:07:33 UTC
Index file created by MapFile can't be read
Hi,
Recently I wanted to save a big RDD[(k,v)] as an index plus data, so I
decided to use a Hadoop MapFile. I tried an example like this one:
https://gist.github.com/airawat/6538748
The code ran fine and generated an index file and a data file. I can open
the data file with "hadoop fs -text /spark/out2/mapFile/data", but when I
run "hadoop fs -text /spark/out2/mapFile/index" I can't see the index
content. The console shows only:
14/11/10 16:11:04 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
and the command "hadoop fs -ls /spark/out2/mapFile/" shows:
-rw-r--r-- 3 spark hdfs 24002 2014-11-10 15:19
/spark/out2/mapFile/data
-rw-r--r-- 3 spark hdfs 136 2014-11-10 15:19
/spark/out2/mapFile/index
I don't think "INFO compress.CodecPool: Got brand-new decompressor
[.deflate]" should prevent the index content from being shown. It really
confuses me. My code is as follows:
import java.net.URI
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.io.{IOUtils, IntWritable, MapFile, Text}
import org.apache.spark.{SparkConf, SparkContext}

def try_Map_File(writePath: String) = {
  val uri = writePath + "/mapFile"
  val data = Array(
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen")
  val con = new SparkConf()
  con.set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
  val sc = new SparkContext(con)
  val conf = sc.hadoopConfiguration
  val fs = FileSystem.get(URI.create(uri), conf)
  val key = new IntWritable()
  val value = new Text()
  var writer: MapFile.Writer = null
  try {
    // Assign to the outer var. Declaring a new `val writer` here would shadow
    // it, so the finally block would close null instead of the real writer,
    // and the index entries (buffered until close) would never be flushed.
    writer = new MapFile.Writer(conf, fs, uri, key.getClass, value.getClass)
    writer.setIndexInterval(64)
    for (i <- 0 until 512) {
      key.set(i + 1)
      value.set(data(i % data.length))
      writer.append(key, value)
    }
  } finally {
    IOUtils.closeStream(writer)
  }
}
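For what it's worth, I had also planned to read the MapFile back with
MapFile.Reader instead of "hadoop fs -text". This is just a sketch I have
not verified against my output yet; it assumes the same fs, uri, and conf
values as in the function above:

```scala
// Sketch (untested): reading the MapFile back via MapFile.Reader.
val key = new IntWritable()
val value = new Text()
var reader: MapFile.Reader = null
try {
  reader = new MapFile.Reader(fs, uri, conf)

  // Random access by key: get() fills `value` and returns null if absent.
  key.set(10)
  if (reader.get(key, value) != null)
    println(s"key 10 -> $value")

  // Or scan all entries sequentially.
  reader.reset()
  while (reader.next(key, value))
    println(s"$key\t$value")
} finally {
  IOUtils.closeStream(reader)
}
```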
Can anyone give me some ideas, or suggest another method to use instead of
MapFile?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/index-File-create-by-mapFile-can-t-read-tp18471.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org