Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/09/30 09:46:32 UTC

[jira] Created: (HIVE-860) Persistent distributed cache

Persistent distributed cache
----------------------------

                 Key: HIVE-860
                 URL: https://issues.apache.org/jira/browse/HIVE-860
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


DistributedCache entries are shared across multiple jobs if the HDFS file name is the same.

We need to make sure Hive puts the same file into the same location every time and does not overwrite it if the file content is the same.

We can aim for one of two results:
A1. Files added with the same name, timestamp, and md5 in the same session will have a single copy in the distributed cache.
A2. Files added with the same name, timestamp, and md5, in any session, will have a single copy in the distributed cache.

A2 gives a bigger sharing benefit, but it raises the question of when Hive should clean the files up in HDFS.
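
One possible way to get "same file, same location" is to derive the HDFS path from the file's name, timestamp, and md5. The sketch below is only an illustration of that idea, not Hive's actual implementation: the class name CachePathSketch, the root directory, and the path layout are all assumptions. Passing a session id gives the A1 behavior (one copy per session); passing null gives A2 (one copy shared across sessions).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: none of these names come from Hive itself.
public class CachePathSketch {

    /** Returns the hex-encoded MD5 digest of the given bytes. */
    public static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a deterministic cache location from (name, timestamp, md5).
     * sessionId != null : scoped to one session (A1).
     * sessionId == null : shared across sessions (A2).
     * The layout root/scope/md5-timestamp/name is an assumed convention.
     */
    public static String cachePath(String root, String sessionId,
                                   String fileName, long timestamp, String md5) {
        String scope = (sessionId == null) ? "shared" : "session-" + sessionId;
        return root + "/" + scope + "/" + md5 + "-" + timestamp + "/" + fileName;
    }

    public static void main(String[] args) {
        byte[] content = "example file content".getBytes(StandardCharsets.UTF_8);
        String md5 = md5Hex(content);
        long timestamp = 1254300000000L; // assumed modification time in ms

        // A1: the same file added twice in session "s1" maps to one path.
        System.out.println(cachePath("/tmp/hive-cache", "s1", "udf.jar", timestamp, md5));
        // A2: the same file added from any session maps to one shared path.
        System.out.println(cachePath("/tmp/hive-cache", null, "udf.jar", timestamp, md5));
    }
}
```

With a path like this, the upload step would only need to check whether the target already exists and skip the copy if it does. Under A2, the shared directory has no session to tie its lifetime to, which is exactly the cleanup question raised above.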


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.