Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:13:36 UTC

[jira] [Resolved] (SPARK-18833) Changing partition location using the 'ALTER TABLE .. SET LOCATION' command via beeline doesn't get reflected in Spark

     [ https://issues.apache.org/jira/browse/SPARK-18833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-18833.
----------------------------------
    Resolution: Incomplete

> Changing partition location using the 'ALTER TABLE .. SET LOCATION' command via beeline doesn't get reflected in Spark
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18833
>                 URL: https://issues.apache.org/jira/browse/SPARK-18833
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>            Reporter: Salil Surendran
>            Priority: Major
>              Labels: bulk-closed
>
> Change the partition location of a table with the 'ALTER TABLE .. SET LOCATION' command via beeline. Afterwards, spark-shell returns no rows for that table, even though the data can still be read via beeline. To reproduce, do the following:
> === At Hive side: ===
> hive> CREATE EXTERNAL TABLE testA (id STRING, name STRING) PARTITIONED BY (idP STRING) STORED AS PARQUET LOCATION '/user/root/A/' ;
> hive> CREATE EXTERNAL TABLE testB (id STRING, name STRING) PARTITIONED BY (idP STRING) STORED AS PARQUET LOCATION '/user/root/B/' ;
> hive> CREATE EXTERNAL TABLE testC (id STRING, name STRING) PARTITIONED BY (idP STRING) STORED AS PARQUET LOCATION '/user/root/C/' ;
> hive> insert into table testA PARTITION (idP='1') values ('1',"test"),('2',"test2");
> hive> ALTER TABLE testB ADD IF NOT EXISTS PARTITION(idP='1');
> hive> ALTER TABLE testB PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/';
> hive> select * from testA;
> OK
> 1 test 1
> 2 test2 1
> hive> select * from testB;
> OK
> 1 test 1
> 2 test2 1
> Conclusion: on the Hive side, changing the partition location to the directory containing the Parquet files works; testB returns the same rows as testA.
> === At Spark side: ===
> scala> import org.apache.spark.sql.hive.HiveContext
> scala> val hiveContext = new HiveContext(sc)
> scala> hiveContext.refreshTable("testB")
> scala> hiveContext.sql("select * from testB").count
> res2: Long = 0
> scala> hiveContext.sql("ALTER TABLE testC ADD IF NOT EXISTS PARTITION(idP='1')")
> res3: org.apache.spark.sql.DataFrame = [result: string]
> scala> hiveContext.sql("ALTER TABLE testC PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/' ")
> res4: org.apache.spark.sql.DataFrame = [result: string]
> scala> hiveContext.sql("select * from testC").count
> res6: Long = 0
> scala> hiveContext.refreshTable("testC")
> scala> hiveContext.sql("select * from testC").count
> res8: Long = 0 
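A quick way to narrow down where the mismatch sits is to compare the partition location each engine reports. This is a sketch only, assuming the standard HiveQL `DESCRIBE FORMATTED ... PARTITION` syntax; run the same statement once in beeline and once through hiveContext.sql(...) in spark-shell:

```
-- Ask the metastore for the effective location of the moved partition.
DESCRIBE FORMATTED testB PARTITION (idP='1');
-- If the Location: line printed by beeline points at /user/root/A/idp=1/
-- but the one printed via spark-shell does not, Spark is working from a
-- stale copy of the partition metadata rather than the updated metastore entry.
```

If both engines report the same location yet Spark still counts zero rows, the problem is more likely in how Spark lists the partition's files than in the metastore update itself.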



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org