Posted to user@hive.apache.org by Divya Gehlot <di...@gmail.com> on 2015/12/11 06:53:32 UTC
org.apache.spark.SparkException: Task failed while writing rows.
Spark output data to Hive table
Hi,
I am using HDP 2.3.2 with Spark 1.4.1 and am trying to insert data into a Hive
table using HiveContext.
Below is the sample code:
spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

// Sample code
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Read the input file; each line is expected to be a "name,age" pair
val people = sc.textFile("/user/spark/people.txt")
val schemaString = "name age"
val schema =
  StructType(
    schemaString.split(" ").map(fieldName =>
      StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

// Create the Hive context (the implicits import has to come after it exists)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._

// Apply the schema and save the DataFrame as an ORC-backed Hive table
val df = hiveContext.createDataFrame(rowRDD, schema)
val options = Map("path" ->
  "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")
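For reference, I am assuming people.txt follows the format used in the Spark
SQL examples, one comma-separated name,age pair per line, e.g.:

Michael, 29
Andy, 30
Justin, 19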
When I run this, I get the error below:
org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:191)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
	at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:182)
	... 8 more
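Looking at the trace, the "Caused by" frame points at <console>:29, which is
the map closure that builds Row(p(0), p(1).trim), so my guess is that some
line in people.txt does not contain a comma (an empty trailing line would be
enough) and p(1) is then out of bounds. A defensive variant of that parse,
just a sketch assuming every valid line really is a name,age pair, would be:

// Reuses `people` and the Row import from the snippet above.
// Skip any line that does not split into at least two fields,
// instead of throwing ArrayIndexOutOfBoundsException on p(1).
val rowRDD = people
  .map(_.split(","))
  .filter(fields => fields.length >= 2)
  .map(p => Row(p(0).trim, p(1).trim))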
Is it a configuration issue?
When I googled it, I found that an environment variable named HIVE_CONF_DIR
should be set in spark-env.sh. Then I checked spark-env.sh in HDP 2.3.2 and
could not find any environment variable named HIVE_CONF_DIR.
Do I need to add the above-mentioned variable for Spark to insert its output
data into Hive tables?
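(For what it is worth, my assumption, which I have not verified on this
cluster, is that on HDP the Hive client configuration lives under
/etc/hive/conf, so setting the variable would mean adding a line like
export HIVE_CONF_DIR=/etc/hive/conf to spark-env.sh.)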
Would really appreciate pointers.
Thanks,
Divya