Posted to user@spark.apache.org by Hyukjin Kwon <gu...@gmail.com> on 2016/02/01 02:11:41 UTC

Re: Reading lzo+index with spark-csv (Splittable reads)

Hm.. As I said here
https://github.com/databricks/spark-csv/issues/245#issuecomment-177682354,

it sounds reasonable in a way, but to me it seems to cover a fairly narrow
use case.

How about using csvRdd() instead?
https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvParser.scala#L143-L162

I think you can do it like below:

import com.databricks.spark.csv.CsvParser

// LzoTextInputFormat (from hadoop-lzo) honors the .index file, so the read
// is splittable. csvRdd() expects an RDD[String], so keep only the line
// contents (the Text values) from the (key, value) pairs.
val rdd = sc.newAPIHadoopFile("/file.csv.lzo",
                    classOf[com.hadoop.mapreduce.LzoTextInputFormat],
                    classOf[org.apache.hadoop.io.LongWritable],
                    classOf[org.apache.hadoop.io.Text])
  .map { case (_, line) => line.toString }
val df = new CsvParser()
      .csvRdd(sqlContext, rdd)
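
If your file has a header row or a non-default delimiter, the same builder
can be configured before calling csvRdd(). A minimal sketch, assuming the
withUseHeader and withDelimiter builder methods from the CsvParser source
linked above:

// Sketch: apply parser options, then parse the RDD[String] produced above.
val dfWithHeader = new CsvParser()
  .withUseHeader(true)   // treat the first line as column names
  .withDelimiter(',')    // field separator
  .csvRdd(sqlContext, rdd)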



2016-01-30 10:04 GMT+09:00 syepes <sy...@gmail.com>:

> Well, looking at the src it looks like it's not implemented:
>
> https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/util/TextFile.scala#L34-L36