Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2019/11/02 01:13:00 UTC

[jira] [Commented] (SPARK-29719) Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex

    [ https://issues.apache.org/jira/browse/SPARK-29719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965183#comment-16965183 ] 

Yuming Wang commented on SPARK-29719:
-------------------------------------

You should refresh {{my_table}}. A similar issue: https://github.com/apache/spark/pull/22721
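
For reference, a minimal sketch of that workaround (table and partition names taken from the reproduction below). Refreshing the table invalidates the cached file listing, so the second read picks up files written in the meantime:

{code:java}
// Invalidate the cached file listing / metadata for the table.
// Either the catalog API or the equivalent SQL statement works.
spark.catalog.refreshTable("my_table")
// or: spark.sql("REFRESH TABLE my_table")

// Re-reading the partition now lists the files again and sees the new data.
val df2 = spark.table("my_table").filter("date=20191101")
{code}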

> Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-29719
>                 URL: https://issues.apache.org/jira/browse/SPARK-29719
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Alexander Bessonov
>            Priority: Major
>
> Spark converts Hive tables backed by Parquet and ORC into internal logical relations that cache the file locations of the underlying data. That cache is not invalidated when the partitioned table is re-read later, so files added to a partition in the meantime may be ignored.
>  
>  
> {code:java}
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>     .master("yarn")
>     .enableHiveSupport()
>     .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
>     .getOrCreate()
> val df1 = spark.table("my_table").filter("date=20191101")
> // Do something with `df1`
> // External process writes to the partition
> val df2 = spark.table("my_table").filter("date=20191101")
> // Do something with `df2`. Data in `df1` and `df2` should be different, but is equal.
> {code}


