Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2017/08/11 08:46:00 UTC
[jira] [Updated] (DRILL-3867) Store relative paths in metadata file

     [ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-3867:
-----------------------------------
    Description: 
git.commit.id.abbrev=cf4f745
git.commit.time=29.09.2015 @ 23\:19\:52 UTC

The following steps reproduce the issue:

1. Create the cache file
{code}
0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
+-------+-------------------------------------------------------------------------------------+
|  ok   |                                       summary                                       |
+-------+-------------------------------------------------------------------------------------+
| true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
+-------+-------------------------------------------------------------------------------------+
1 row selected (1.558 seconds)
{code}

2. Move the directory
{code}
hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
{code}

3. Run a query against the moved directory
{code}
0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.


[Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
{code}
This is expected, because the cache file stores absolute file paths, which still point to the old location after the move.

*Summary description of the fix:*

In Drill 1.11 and later, the Parquet metadata files store the paths to the Parquet files as relative paths instead of absolute paths. You can move partitioned Parquet directories from one location in the distributed file system to another without issuing the REFRESH TABLE METADATA command to rebuild the Parquet metadata files; the metadata remains valid in the new location.
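
To illustrate why relative paths survive a move, here is a minimal Python sketch. It is not Drill's implementation: the helper names `to_relative` and `resolve` are hypothetical, and the paths are the ones from the reproduction steps above. The idea is only that a path stored relative to the table root can be resolved against whatever root the table has at query time.

```python
import posixpath


def to_relative(table_root, absolute_paths):
    """Hypothetical helper: express each file path relative to the table root."""
    return [posixpath.relpath(path, table_root) for path in absolute_paths]


def resolve(table_root, relative_paths):
    """Hypothetical helper: rebuild full paths against the current table root."""
    return [posixpath.join(table_root, path) for path in relative_paths]


# Paths taken from the reproduction steps in this issue.
old_root = "/drill/testdata/metadata_caching/lineitem"
new_root = "/drill/lineitem"

# What a relative-path cache would hold for the file named in the error message.
cached = to_relative(old_root, [old_root + "/2006/1"])

# After the directory is moved, the same cache entries resolve under the new root.
resolved = resolve(new_root, cached)
print(cached, resolved)
```

With absolute paths, the cached entry would keep pointing at the old root after the move, which is the FileNotFoundException shown in step 3.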

Note

Reverting from Drill 1.11 to an earlier version is not recommended, because the earlier version will misinterpret the Parquet metadata files created by Drill 1.11. If you do revert, remove the Parquet metadata files and run the REFRESH TABLE METADATA command to rebuild them in the older format.


  was:
git.commit.id.abbrev=cf4f745
git.commit.time=29.09.2015 @ 23\:19\:52 UTC

The below sequence of steps reproduces the issue

1. Create the cache file
{code}
0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
+-------+-------------------------------------------------------------------------------------+
|  ok   |                                       summary                                       |
+-------+-------------------------------------------------------------------------------------+
| true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
+-------+-------------------------------------------------------------------------------------+
1 row selected (1.558 seconds)
{code}

2. Move the directory
{code}
hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
{code}

3. Now run a query on top of it
{code}
0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.


[Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
{code}

This is obvious given the fact that we are storing absolute file paths in the cache file


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.11.0


