Posted to issues@spark.apache.org by "Alexander Bessonov (Jira)" <ji...@apache.org> on 2019/11/01 21:12:00 UTC

[jira] [Created] (SPARK-29719) Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex

Alexander Bessonov created SPARK-29719:
------------------------------------------

             Summary: Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex
                 Key: SPARK-29719
                 URL: https://issues.apache.org/jira/browse/SPARK-29719
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Alexander Bessonov


Spark attempts to convert Hive tables backed by Parquet and ORC into internal logical relations that cache the file locations of the underlying data. That cache is not invalidated when a partitioned table is re-read later. By the time the table is re-read, new files may have been written to a partition, and those files are ignored.

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
    .master("yarn")
    .enableHiveSupport()
    .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
    .getOrCreate()

val df1 = spark.table("my_table").filter("date=20191101")
// Do something with `df1`
// External process writes new files to the partition
val df2 = spark.table("my_table").filter("date=20191101")
// Do something with `df2`. Data in `df1` and `df2` should differ, but is equal.
{code}
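A possible workaround (a sketch, not verified against the affected version; it assumes the standard {{Catalog}} API is available and requires a live Spark session with Hive support) is to explicitly refresh the table between reads, which drops the cached metadata and file listing:

{code:java}
// Workaround sketch: invalidate the cached file listing for the table
// before re-reading it, so the second read lists the partition again.
spark.catalog.refreshTable("my_table")

// Equivalent SQL form:
// spark.sql("REFRESH TABLE my_table")

val df2 = spark.table("my_table").filter("date=20191101")
// `df2` should now include files written after `df1` was created.
{code}

This only works around the symptom; the underlying issue is that the converted relation's InMemoryFileIndex is reused across reads without invalidation.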



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org