Posted to issues@drill.apache.org by "Parth Chandra (JIRA)" <ji...@apache.org> on 2015/12/03 09:32:11 UTC

[jira] [Commented] (DRILL-4154) Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some scenarios

    [ https://issues.apache.org/jira/browse/DRILL-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037480#comment-15037480 ] 

Parth Chandra commented on DRILL-4154:
--------------------------------------

[~rkins] After many hours of trying to reproduce this, the only way I am able to get the metadata cache file to look like the one in 'broken-cache.txt' is if the cache file gets created without the migration tool having been run on the parquet files. The data files you attached do not have the appropriate version number, and in that case the parquet code prevents us from reading the stats for binary columns.
There is an issue with the migration tool in that, at least on a local file system, the directory's timestamp does not get updated after the parquet files are rewritten. This should be fixed. (Note that I have yet to try this on a DFS.)
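The fix implied here is for the migration tool to explicitly bump the directory's modification time after rewriting the files in place, since rewriting a file's contents does not necessarily update its parent directory's timestamp. A minimal sketch of that idea (a hypothetical helper using `java.nio.file`, not the upgrade tool's actual code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class TouchDirectory {
    // Hypothetical helper: after rewriting parquet files in place, explicitly
    // update the parent directory's mtime so that consumers which compare the
    // directory timestamp against the cache file's notice the change.
    static void touch(Path dir) throws IOException {
        Files.setLastModifiedTime(dir, FileTime.fromMillis(System.currentTimeMillis()));
    }
}
```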

For the second issue, it is likely that when you copied the cache file, the directory timestamp was also updated. I have sometimes seen that, in such a case, the directory's timestamp can be a few microseconds newer than that of the copied cache file. We then consider the cache file stale and recreate it. This behaviour is safe, and the situation is unlikely to occur in practice, since metadata cache files are rarely copied.
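The staleness check described above boils down to comparing the table directory's modification time against the cache file's. A sketch of that comparison (a hypothetical standalone helper; Drill's real check lives in its metadata code and uses the Hadoop FileSystem API, not `java.nio`):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class CacheStalenessCheck {
    // Hypothetical helper: the cache is treated as stale whenever the
    // directory was modified after the cache file was written. A copy that
    // bumps the directory mtime by even a few microseconds past the cache
    // file's therefore forces a (safe) rebuild of the cache.
    static boolean isCacheStale(Path dir, Path cacheFile) throws IOException {
        FileTime dirTime = Files.getLastModifiedTime(dir);
        FileTime cacheTime = Files.getLastModifiedTime(cacheFile);
        return dirTime.compareTo(cacheTime) > 0; // directory newer => stale
    }
}
```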

> Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some scenarios
> -------------------------------------------------------------------------------------
>
>                 Key: DRILL-4154
>                 URL: https://issues.apache.org/jira/browse/DRILL-4154
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: broken-cache.txt, fewtypes_varcharpartition.tar.tgz, old-cache.txt
>
>
> git.commit.id.abbrev=46c47a2
> I copied the data along with the cache file onto maprfs and ran the upgrade tool (https://github.com/parthchandra/drill-upgrade). I then ran the metadata_caching suite from the functional tests (concurrency 10) without the datagen phase. I see 3 test failures, and when I looked at the cache file it appears to contain wrong information for the varchar column.
> Sample from the cache :
> {code}
>       {
>         "name" : [ "varchar_col" ]
>       }, {
>         "name" : [ "float_col" ],
>         "mxValue" : 68797.22,
>         "nulls" : 0
>       }
> {code}
> When I followed the same steps but, instead of running the suites, executed the "REFRESH TABLE METADATA" command (or any query on that folder), the cache file was created properly.
> I have attached the required data and cache files. Let me know if you need anything.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)