Posted to user@spark.apache.org by Akhilesh Pathodia <pa...@gmail.com> on 2016/01/23 17:59:34 UTC

Spark not writing data in Hive format

Hi,

I am trying to write data from Spark to a Hive partitioned table. The job is
running without any error, but it is not writing the data to the correct
location.
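
For context, the write call is roughly of this shape (a simplified sketch, not
the actual job code; apart from the CASE_LOGS table name taken from the log
below, the source path, columns and partition column are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("case-logs-streaming"))
val hiveContext = new HiveContext(sc)

// DataFrame built from each streaming batch; source and schema are placeholders.
val caseLogsDF = hiveContext.read.json("/tmp/case_logs_batch")

// Writing the partitioned DataFrame straight into the metastore.
caseLogsDF.write
  .format("parquet")
  .mode(SaveMode.Append)
  .partitionBy("event_date")   // placeholder partition column
  .saveAsTable("CASE_LOGS")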

job-executor-0] parquet.ParquetRelation (Logging.scala:logInfo(59)) - Listing file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_000001/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs on driver

2016-01-23 07:58:53,223 INFO  [streaming-job-executor-0] parquet.ParquetRelation (Logging.scala:logInfo(59)) - Listing file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_000001/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs on driver

2016-01-23 07:58:53,276 WARN  [streaming-job-executor-0] hive.HiveContext$$anon$1 (Logging.scala:logWarning(71)) - Persisting partitioned data source relation `CASE_LOGS` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_000001/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs

2016-01-23 07:58:53,454 INFO  [streaming-job-executor-0] log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=create_table_with_environment_context from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>

654 INFO  [JobScheduler] scheduler.JobScheduler (Logging.scala:logInfo(59)) - Finished job streaming job 1453564710000 ms.0 from job set of time 1453564710000 ms


It is writing the data in the Spark SQL specific format instead of the Hive
format. Can anybody tell me how to get rid of this issue?
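
Is the fix to pre-create the table with Hive DDL and use insertInto instead of
saveAsTable, something like the sketch below (column and partition names are
placeholders; only the CASE_LOGS table name comes from the log above)? Or is
there a better way?

// Create a Hive-compatible table up front so the metastore entry is plain Parquet.
hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS CASE_LOGS (
    |  case_id STRING,
    |  message STRING
    |)
    |PARTITIONED BY (event_date STRING)
    |STORED AS PARQUET""".stripMargin)

// Allow INSERT to create partitions dynamically.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// Append each batch; column order must match the table definition,
// with the partition column (event_date) last.
caseLogsDF.write
  .mode(SaveMode.Append)
  .insertInto("CASE_LOGS")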

Spark version - 1.5.0
CDH 5.5.1

Thanks,
Akhilesh Pathodia