Posted to user@hive.apache.org by Divya Gehlot <di...@gmail.com> on 2015/12/11 06:53:32 UTC

org.apache.spark.SparkException: Task failed while writing rows + Spark output data to Hive table

Hi,

I am using HDP 2.3.2 with Spark 1.4.1 and am trying to insert data into a Hive
table using HiveContext.

Below is the sample code


spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

// Sample code
import org.apache.spark.sql.SQLContext
import sqlContext.implicits._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val people = sc.textFile("/user/spark/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType, StructField, StringType};
val schema =
  StructType(
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
// Create hive context
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Apply the schema to the RDD
val df = hiveContext.createDataFrame(rowRDD, schema);
val options = Map("path" -> "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")

I am getting the error below:


org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:191)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
	at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:182)
	... 8 more

Is it a configuration issue?
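
In case it turns out to be the data rather than the configuration, below is a more
defensive version of my parsing step that I could try (just an untested sketch on my
side; it assumes a valid line has exactly two comma-separated fields and simply drops
everything else):

// Untested sketch: keep only lines that split into at least two fields,
// so that the p(1) access can never go out of bounds.
val rowRDD = people
  .map(_.split(","))
  .filter(p => p.length >= 2)
  .map(p => Row(p(0).trim, p(1).trim))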

When I googled it, I found that an environment variable named HIVE_CONF_DIR
should be set in spark-env.sh.

Then I checked spark-env.sh in HDP 2.3.2, but I could not find the environment
variable HIVE_CONF_DIR.

Do I need to add the above-mentioned variable in order to insert Spark output data
into Hive tables?
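
If it is needed, I assume the entry would look something like this in conf/spark-env.sh
(I am assuming the Hive client configs live under /etc/hive/conf on HDP, which I have
not verified):

# conf/spark-env.sh -- assumption: HDP keeps the Hive client configs in /etc/hive/conf
export HIVE_CONF_DIR=/etc/hive/conf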

Would really appreciate pointers.

Thanks,

Divya