Posted to issues@spark.apache.org by "Sasha Ovsankin (JIRA)" <ji...@apache.org> on 2016/04/26 23:57:13 UTC
[jira] [Commented] (SPARK-14927) DataFrame.saveAsTable creates RDD partitions but not Hive partitions
[ https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259014#comment-15259014 ]
Sasha Ovsankin commented on SPARK-14927:
----------------------------------------
The workaround seems to be not letting `saveAsTable` create the table, but instead creating the partitioned table yourself before writing to it, like so:
{code}
// Assumes hc is a HiveContext; the implicits enable .toDF on local Seqs
import org.apache.spark.sql.SaveMode
import hc.implicits._

// Pre-create the partitioned Hive table so saveAsTable appends into it
// instead of creating an unpartitioned one
hc.sql("create external table tmp.partitiontest1(val string) partitioned by (year int)")
Seq(2012 -> "a", 2013 -> "b", 2014 -> "c").toDF("year", "val")
  .write
  .partitionBy("year")
  .mode(SaveMode.Append)
  .saveAsTable("tmp.partitiontest1")
{code}
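To confirm that the workaround actually registered Hive partitions (assuming the same `hc` HiveContext as above), the metastore can be queried directly; a quick sketch, not re-verified against every 1.x version:

{code}
// If the workaround succeeded, this should list one Hive partition per
// distinct year value: year=2012, year=2013, year=2014
hc.sql("show partitions tmp.partitiontest1").show()
{code}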
> DataFrame.saveAsTable creates RDD partitions but not Hive partitions
> ---------------------------------------------------------------------
>
> Key: SPARK-14927
> URL: https://issues.apache.org/jira/browse/SPARK-14927
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.1
> Environment: Mac OS X 10.11.4 local
> Reporter: Sasha Ovsankin
>
> This is a followup to http://stackoverflow.com/questions/31341498/save-spark-dataframe-as-dynamic-partitioned-table-in-hive . I tried the suggestions in the answers but couldn't make them work in Spark 1.6.1.
> I am trying to create partitions programmatically from a `DataFrame`. Here is the relevant code (adapted from a Spark test):
> hc.setConf("hive.metastore.warehouse.dir", "tmp/tests")
> // hc.setConf("hive.exec.dynamic.partition", "true")
> // hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
> hc.sql("create database if not exists tmp")
> hc.sql("drop table if exists tmp.partitiontest1")
> Seq(2012 -> "a").toDF("year", "val")
> .write
> .partitionBy("year")
> .mode(SaveMode.Append)
> .saveAsTable("tmp.partitiontest1")
> hc.sql("show partitions tmp.partitiontest1").show
> Full file is here: https://gist.github.com/SashaOv/7c65f03a51c7e8f9c9e018cd42aa4c4a
> I get the error that the table is not partitioned:
> ======================
> HIVE FAILURE OUTPUT
> ======================
> SET hive.support.sql11.reserved.keywords=false
> SET hive.metastore.warehouse.dir=tmp/tests
> OK
> OK
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Table tmp.partitiontest1 is not a partitioned table
> ======================
> It looks like the root cause is that `org.apache.spark.sql.hive.HiveMetastoreCatalog.newSparkSQLSpecificMetastoreTable` always creates table with empty partitions.
> Any help to move this forward is appreciated.
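For reference, another route that is often suggested for this situation is to enable Hive dynamic partitioning and append with `insertInto` against the pre-created partitioned table, rather than `saveAsTable`. This is a sketch only (the commented-out confs from the repro above would need to be active), and `insertInto` matches columns by position, so the partition column has to come last:

{code}
import org.apache.spark.sql.SaveMode
import hc.implicits._

// Dynamic partitioning must be on for the insert to create new partitions
hc.setConf("hive.exec.dynamic.partition", "true")
hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

hc.sql("create table if not exists tmp.partitiontest1(val string) partitioned by (year int)")

// insertInto writes into the existing Hive table and registers Hive
// partitions; data columns first, partition column (year) last
Seq(2012 -> "a").toDF("year", "val")
  .select("val", "year")
  .write
  .mode(SaveMode.Append)
  .insertInto("tmp.partitiontest1")
{code}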
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org