Posted to user@spark.apache.org by Roberto Coluccio <ro...@gmail.com> on 2015/07/17 11:29:08 UTC

Spark 1.3.1 + Hive: write output to CSV with header on S3

Hello community,

I'm currently using Spark 1.3.1 with Hive support to write processed
data to an external Hive table backed by S3. I'm specifying the
delimiter manually, but I'd like to know whether there is any "clean"
way to write in CSV format:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._

hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS table_name(field1 STRING, field2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '" + path_on_s3 + "'")

hiveContext.sql(<an INSERT OVERWRITE query to write into the above table>)
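
For illustration, the elided statement would take roughly this shape; source_table here is a hypothetical table holding the processed data:

hiveContext.sql("INSERT OVERWRITE TABLE table_name SELECT field1, field2 FROM source_table")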


I also need the table's header row to be written at the top of each
output file. I tried:


hiveContext.sql("set hive.cli.print.header=true")


But it didn't work.


Any hint?


Thank you.


Best regards,

Roberto
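
For reference: hive.cli.print.header only controls whether the Hive CLI prints column headers when displaying query results on the console; it has no effect on the data files produced by INSERT OVERWRITE. On Spark 1.3.x, one common way to get CSV output with a header line in each part file was the Databricks spark-csv package. A minimal sketch, assuming com.databricks:spark-csv_2.10 is on the classpath and using a hypothetical source_table for the processed data:

import org.apache.spark.sql.SaveMode

// Build the DataFrame that would otherwise feed the INSERT OVERWRITE query
val df = hiveContext.sql("SELECT field1, field2 FROM source_table")

// With "header" -> "true", spark-csv writes a header line at the top of
// every part file it produces under the target path
df.save("com.databricks.spark.csv", SaveMode.Overwrite,
  Map("path" -> path_on_s3, "header" -> "true", "delimiter" -> ","))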

Re: Spark 1.3.1 + Hive: write output to CSV with header on S3

Posted by Michael Armbrust <mi...@databricks.com>.
Using a hive-site.xml file on the classpath.
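
For illustration, a minimal hive-site.xml sketch that points HiveContext at a remote metastore; the hostname and port below are assumptions, not values from this thread:

<configuration>
  <property>
    <!-- Thrift URI of the remote Hive metastore (placeholder host/port) -->
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>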

On Fri, Jul 17, 2015 at 8:37 AM, spark user <sp...@yahoo.com.invalid>
wrote:

> Hi Roberto,
>
> I have a question regarding HiveContext.
>
> When you create a HiveContext, where do you define the Hive connection
> properties? Suppose Hive is not on the local machine and I need to
> connect; how will HiveContext know the database info, such as the URL,
> username, and password?
>
> String username = "";
> String password = "";
>
> String url = "jdbc:hive2://quickstart.cloudera:10000/default";

Re: Spark 1.3.1 + Hive: write output to CSV with header on S3

Posted by spark user <sp...@yahoo.com.INVALID>.
Hi Roberto,

I have a question regarding HiveContext.

When you create a HiveContext, where do you define the Hive connection properties? Suppose Hive is not on the local machine and I need to connect; how will HiveContext know the database info, such as the URL, username, and password?

String username = "";
String password = "";

String url = "jdbc:hive2://quickstart.cloudera:10000/default";
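
For contrast, a minimal sketch of how this works without JDBC: with a hive-site.xml on the classpath (as Michael notes above), HiveContext picks up the metastore connection details itself, so no URL, username, or password is passed in code; the table name below is hypothetical:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Connection details come from hive-site.xml, not from a JDBC URL
val df = hiveContext.sql("SELECT * FROM some_table")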

