Posted to commits@drill.apache.org by kr...@apache.org on 2015/12/15 03:29:00 UTC

drill git commit: reorg/correct Parquet migration

Repository: drill
Updated Branches:
  refs/heads/gh-pages 7c9401a32 -> d75db7472


reorg/correct Parquet migration


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/d75db747
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/d75db747
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/d75db747

Branch: refs/heads/gh-pages
Commit: d75db7472284ad46444a01dc1e8f793af7daced7
Parents: 7c9401a
Author: Kris Hahn <kr...@apache.org>
Authored: Mon Dec 14 18:26:23 2015 -0800
Committer: Kris Hahn <kr...@apache.org>
Committed: Mon Dec 14 18:28:06 2015 -0800

----------------------------------------------------------------------
 _data/docs.json                                 | 179 ++++++++-----------
 _docs/install/010-install-drill-introduction.md |   4 +-
 _docs/install/020-migrating-parquet-data.md     |  59 ++++++
 .../010-partition-pruning-introduction.md       |   4 +-
 .../020-migrating-partitioned-data.md           |  58 ------
 5 files changed, 136 insertions(+), 168 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/d75db747/_data/docs.json
----------------------------------------------------------------------
diff --git a/_data/docs.json b/_data/docs.json
index cd815a3..01502b4 100644
--- a/_data/docs.json
+++ b/_data/docs.json
@@ -3582,8 +3582,8 @@
             "next_title": "Optimizing Parquet Metadata Reading", 
             "next_url": "/docs/optimizing-parquet-metadata-reading/", 
             "parent": "Partition Pruning", 
-            "previous_title": "Migrating Partitioned Data", 
-            "previous_url": "/docs/migrating-partitioned-data/", 
+            "previous_title": "Partition Pruning Introduction", 
+            "previous_url": "/docs/partition-pruning-introduction/", 
             "relative_path": "_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md", 
             "title": "How to Partition Data", 
             "url": "/docs/how-to-partition-data/"
@@ -3659,8 +3659,8 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Installing Drill in Embedded Mode", 
-                    "next_url": "/docs/installing-drill-in-embedded-mode/", 
+                    "next_title": "Migrating Parquet Data", 
+                    "next_url": "/docs/migrating-parquet-data/", 
                     "parent": "Install Drill", 
                     "previous_title": "Install Drill", 
                     "previous_url": "/docs/install-drill/", 
@@ -3675,6 +3675,23 @@
                             "url": "/docs/install-drill/"
                         }
                     ], 
+                    "children": [], 
+                    "next_title": "Installing Drill in Embedded Mode", 
+                    "next_url": "/docs/installing-drill-in-embedded-mode/", 
+                    "parent": "Install Drill", 
+                    "previous_title": "Install Drill Introduction", 
+                    "previous_url": "/docs/install-drill-introduction/", 
+                    "relative_path": "_docs/install/020-migrating-parquet-data.md", 
+                    "title": "Migrating Parquet Data", 
+                    "url": "/docs/migrating-parquet-data/"
+                }, 
+                {
+                    "breadcrumbs": [
+                        {
+                            "title": "Install Drill", 
+                            "url": "/docs/install-drill/"
+                        }
+                    ], 
                     "children": [
                         {
                             "breadcrumbs": [
@@ -3785,8 +3802,8 @@
                     "next_title": "Embedded Mode Prerequisites", 
                     "next_url": "/docs/embedded-mode-prerequisites/", 
                     "parent": "Install Drill", 
-                    "previous_title": "Install Drill Introduction", 
-                    "previous_url": "/docs/install-drill-introduction/", 
+                    "previous_title": "Migrating Parquet Data", 
+                    "previous_url": "/docs/migrating-parquet-data/", 
                     "relative_path": "_docs/install/030-installing-drill-in-embedded-mode.md", 
                     "title": "Installing Drill in Embedded Mode", 
                     "url": "/docs/installing-drill-in-embedded-mode/"
@@ -3907,8 +3924,8 @@
                 }
             ], 
             "children": [], 
-            "next_title": "Installing Drill in Embedded Mode", 
-            "next_url": "/docs/installing-drill-in-embedded-mode/", 
+            "next_title": "Migrating Parquet Data", 
+            "next_url": "/docs/migrating-parquet-data/", 
             "parent": "Install Drill", 
             "previous_title": "Install Drill", 
             "previous_url": "/docs/install-drill/", 
@@ -4114,8 +4131,8 @@
             "next_title": "Embedded Mode Prerequisites", 
             "next_url": "/docs/embedded-mode-prerequisites/", 
             "parent": "Install Drill", 
-            "previous_title": "Install Drill Introduction", 
-            "previous_url": "/docs/install-drill-introduction/", 
+            "previous_title": "Migrating Parquet Data", 
+            "previous_url": "/docs/migrating-parquet-data/", 
             "relative_path": "_docs/install/030-installing-drill-in-embedded-mode.md", 
             "title": "Installing Drill in Embedded Mode", 
             "url": "/docs/installing-drill-in-embedded-mode/"
@@ -4885,26 +4902,22 @@
             "title": "Math and Trig", 
             "url": "/docs/math-and-trig/"
         }, 
-        "Migrating Partitioned Data": {
+        "Migrating Parquet Data": {
             "breadcrumbs": [
                 {
-                    "title": "Partition Pruning", 
-                    "url": "/docs/partition-pruning/"
-                }, 
-                {
-                    "title": "Performance Tuning", 
-                    "url": "/docs/performance-tuning/"
+                    "title": "Install Drill", 
+                    "url": "/docs/install-drill/"
                 }
             ], 
             "children": [], 
-            "next_title": "How to Partition Data", 
-            "next_url": "/docs/how-to-partition-data/", 
-            "parent": "Partition Pruning", 
-            "previous_title": "Partition Pruning Introduction", 
-            "previous_url": "/docs/partition-pruning-introduction/", 
-            "relative_path": "_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md", 
-            "title": "Migrating Partitioned Data", 
-            "url": "/docs/migrating-partitioned-data/"
+            "next_title": "Installing Drill in Embedded Mode", 
+            "next_url": "/docs/installing-drill-in-embedded-mode/", 
+            "parent": "Install Drill", 
+            "previous_title": "Install Drill Introduction", 
+            "previous_url": "/docs/install-drill-introduction/", 
+            "relative_path": "_docs/install/020-migrating-parquet-data.md", 
+            "title": "Migrating Parquet Data", 
+            "url": "/docs/migrating-parquet-data/"
         }, 
         "Modify logback.xml": {
             "breadcrumbs": [
@@ -5854,8 +5867,8 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Migrating Partitioned Data", 
-                    "next_url": "/docs/migrating-partitioned-data/", 
+                    "next_title": "How to Partition Data", 
+                    "next_url": "/docs/how-to-partition-data/", 
                     "parent": "Partition Pruning", 
                     "previous_title": "Partition Pruning", 
                     "previous_url": "/docs/partition-pruning/", 
@@ -5875,32 +5888,11 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "How to Partition Data", 
-                    "next_url": "/docs/how-to-partition-data/", 
-                    "parent": "Partition Pruning", 
-                    "previous_title": "Partition Pruning Introduction", 
-                    "previous_url": "/docs/partition-pruning-introduction/", 
-                    "relative_path": "_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md", 
-                    "title": "Migrating Partitioned Data", 
-                    "url": "/docs/migrating-partitioned-data/"
-                }, 
-                {
-                    "breadcrumbs": [
-                        {
-                            "title": "Partition Pruning", 
-                            "url": "/docs/partition-pruning/"
-                        }, 
-                        {
-                            "title": "Performance Tuning", 
-                            "url": "/docs/performance-tuning/"
-                        }
-                    ], 
-                    "children": [], 
                     "next_title": "Optimizing Parquet Metadata Reading", 
                     "next_url": "/docs/optimizing-parquet-metadata-reading/", 
                     "parent": "Partition Pruning", 
-                    "previous_title": "Migrating Partitioned Data", 
-                    "previous_url": "/docs/migrating-partitioned-data/", 
+                    "previous_title": "Partition Pruning Introduction", 
+                    "previous_url": "/docs/partition-pruning-introduction/", 
                     "relative_path": "_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md", 
                     "title": "How to Partition Data", 
                     "url": "/docs/how-to-partition-data/"
@@ -5927,8 +5919,8 @@
                 }
             ], 
             "children": [], 
-            "next_title": "Migrating Partitioned Data", 
-            "next_url": "/docs/migrating-partitioned-data/", 
+            "next_title": "How to Partition Data", 
+            "next_url": "/docs/how-to-partition-data/", 
             "parent": "Partition Pruning", 
             "previous_title": "Partition Pruning", 
             "previous_url": "/docs/partition-pruning/", 
@@ -5993,8 +5985,8 @@
                                 }
                             ], 
                             "children": [], 
-                            "next_title": "Migrating Partitioned Data", 
-                            "next_url": "/docs/migrating-partitioned-data/", 
+                            "next_title": "How to Partition Data", 
+                            "next_url": "/docs/how-to-partition-data/", 
                             "parent": "Partition Pruning", 
                             "previous_title": "Partition Pruning", 
                             "previous_url": "/docs/partition-pruning/", 
@@ -6014,32 +6006,11 @@
                                 }
                             ], 
                             "children": [], 
-                            "next_title": "How to Partition Data", 
-                            "next_url": "/docs/how-to-partition-data/", 
-                            "parent": "Partition Pruning", 
-                            "previous_title": "Partition Pruning Introduction", 
-                            "previous_url": "/docs/partition-pruning-introduction/", 
-                            "relative_path": "_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md", 
-                            "title": "Migrating Partitioned Data", 
-                            "url": "/docs/migrating-partitioned-data/"
-                        }, 
-                        {
-                            "breadcrumbs": [
-                                {
-                                    "title": "Partition Pruning", 
-                                    "url": "/docs/partition-pruning/"
-                                }, 
-                                {
-                                    "title": "Performance Tuning", 
-                                    "url": "/docs/performance-tuning/"
-                                }
-                            ], 
-                            "children": [], 
                             "next_title": "Optimizing Parquet Metadata Reading", 
                             "next_url": "/docs/optimizing-parquet-metadata-reading/", 
                             "parent": "Partition Pruning", 
-                            "previous_title": "Migrating Partitioned Data", 
-                            "previous_url": "/docs/migrating-partitioned-data/", 
+                            "previous_title": "Partition Pruning Introduction", 
+                            "previous_url": "/docs/partition-pruning-introduction/", 
                             "relative_path": "_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md", 
                             "title": "How to Partition Data", 
                             "url": "/docs/how-to-partition-data/"
@@ -12490,8 +12461,8 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Installing Drill in Embedded Mode", 
-                    "next_url": "/docs/installing-drill-in-embedded-mode/", 
+                    "next_title": "Migrating Parquet Data", 
+                    "next_url": "/docs/migrating-parquet-data/", 
                     "parent": "Install Drill", 
                     "previous_title": "Install Drill", 
                     "previous_url": "/docs/install-drill/", 
@@ -12506,6 +12477,23 @@
                             "url": "/docs/install-drill/"
                         }
                     ], 
+                    "children": [], 
+                    "next_title": "Installing Drill in Embedded Mode", 
+                    "next_url": "/docs/installing-drill-in-embedded-mode/", 
+                    "parent": "Install Drill", 
+                    "previous_title": "Install Drill Introduction", 
+                    "previous_url": "/docs/install-drill-introduction/", 
+                    "relative_path": "_docs/install/020-migrating-parquet-data.md", 
+                    "title": "Migrating Parquet Data", 
+                    "url": "/docs/migrating-parquet-data/"
+                }, 
+                {
+                    "breadcrumbs": [
+                        {
+                            "title": "Install Drill", 
+                            "url": "/docs/install-drill/"
+                        }
+                    ], 
                     "children": [
                         {
                             "breadcrumbs": [
@@ -12616,8 +12604,8 @@
                     "next_title": "Embedded Mode Prerequisites", 
                     "next_url": "/docs/embedded-mode-prerequisites/", 
                     "parent": "Install Drill", 
-                    "previous_title": "Install Drill Introduction", 
-                    "previous_url": "/docs/install-drill-introduction/", 
+                    "previous_title": "Migrating Parquet Data", 
+                    "previous_url": "/docs/migrating-parquet-data/", 
                     "relative_path": "_docs/install/030-installing-drill-in-embedded-mode.md", 
                     "title": "Installing Drill in Embedded Mode", 
                     "url": "/docs/installing-drill-in-embedded-mode/"
@@ -14329,8 +14317,8 @@
                                 }
                             ], 
                             "children": [], 
-                            "next_title": "Migrating Partitioned Data", 
-                            "next_url": "/docs/migrating-partitioned-data/", 
+                            "next_title": "How to Partition Data", 
+                            "next_url": "/docs/how-to-partition-data/", 
                             "parent": "Partition Pruning", 
                             "previous_title": "Partition Pruning", 
                             "previous_url": "/docs/partition-pruning/", 
@@ -14350,32 +14338,11 @@
                                 }
                             ], 
                             "children": [], 
-                            "next_title": "How to Partition Data", 
-                            "next_url": "/docs/how-to-partition-data/", 
-                            "parent": "Partition Pruning", 
-                            "previous_title": "Partition Pruning Introduction", 
-                            "previous_url": "/docs/partition-pruning-introduction/", 
-                            "relative_path": "_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md", 
-                            "title": "Migrating Partitioned Data", 
-                            "url": "/docs/migrating-partitioned-data/"
-                        }, 
-                        {
-                            "breadcrumbs": [
-                                {
-                                    "title": "Partition Pruning", 
-                                    "url": "/docs/partition-pruning/"
-                                }, 
-                                {
-                                    "title": "Performance Tuning", 
-                                    "url": "/docs/performance-tuning/"
-                                }
-                            ], 
-                            "children": [], 
                             "next_title": "Optimizing Parquet Metadata Reading", 
                             "next_url": "/docs/optimizing-parquet-metadata-reading/", 
                             "parent": "Partition Pruning", 
-                            "previous_title": "Migrating Partitioned Data", 
-                            "previous_url": "/docs/migrating-partitioned-data/", 
+                            "previous_title": "Partition Pruning Introduction", 
+                            "previous_url": "/docs/partition-pruning-introduction/", 
                             "relative_path": "_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md", 
                             "title": "How to Partition Data", 
                             "url": "/docs/how-to-partition-data/"

http://git-wip-us.apache.org/repos/asf/drill/blob/d75db747/_docs/install/010-install-drill-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/install/010-install-drill-introduction.md b/_docs/install/010-install-drill-introduction.md
index 0ca6fe5..e0bf802 100644
--- a/_docs/install/010-install-drill-introduction.md
+++ b/_docs/install/010-install-drill-introduction.md
@@ -3,8 +3,8 @@ title: "Install Drill Introduction"
 parent: "Install Drill"
 ---
 
+If you installed Drill 1.2 or earlier and generated Parquet files, you need to [migrate the files to Drill 1.3]({{site.baseurl}}/docs/migrating-parquet-data) as explained in the next section.
 
 You can install Drill in either embedded mode or distributed mode. Installing
 Drill in embedded mode does not require any configuration. To use Drill in a
-clustered Hadoop environment, install Drill in distributed mode. You need to perform some configuration after installing Drill in distributed mode. After you complete these tasks, connect Drill to your Hive, HBase, or distributed file system
-data sources, and run queries on them.
\ No newline at end of file
+clustered Hadoop environment, install Drill in distributed mode. You need to perform some configuration after installing Drill in distributed mode. After you complete these tasks, connect Drill to your Hive, HBase, or distributed file system data sources, and run queries on them.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/d75db747/_docs/install/020-migrating-parquet-data.md
----------------------------------------------------------------------
diff --git a/_docs/install/020-migrating-parquet-data.md b/_docs/install/020-migrating-parquet-data.md
new file mode 100755
index 0000000..f7eb72e
--- /dev/null
+++ b/_docs/install/020-migrating-parquet-data.md
@@ -0,0 +1,59 @@
+---
+title: "Migrating Parquet Data"
+parent: "Install Drill"
+--- 
+
+You must migrate Parquet data that you generated using Drill 1.2 or earlier before using the data in Drill 1.3. The data must be marked as Drill-generated. Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to migrate the data as described in ["How to Migrate Data"]({{site.baseurl}}/docs/migrating-parquet-data/#how-to-migrate-data).
+
+{% include startimportant.html %} Run the upgrade tool only on Drill-generated Parquet files. {% include endimportant.html %}
+
+<!-- as described in [DRILL-4070](https://issues.apache.org/jira/browse/DRILL-4070).  -->
+
+## Why Migrate Drill Data
+Drill 1.3 uses the latest Apache Parquet library when generating and partitioning Parquet files, whereas Drill 1.2 and earlier use a version of the previous Parquet library created by the Drill team. The Drill team fixed a bug in the previous library to accurately process Parquet files generated by other tools, such as Impala and Hive. Apache Parquet fixed the bug in the latest library, making it suitable for use in Drill 1.3. Drill now uses the same Apache Parquet library as Impala, Hive, and other software. You need to run the upgrade tool on Parquet files generated by Drill 1.2 and earlier that used the previous library.
+
+The upgrade tool simply inserts a version number in the metadata to mark the file as a Drill file. 
+
+<!-- The bug fix eliminated the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files. Querying Drill-generated Parquet files, regardless of the Drill version, yields accurate results. Drill-generated Parquet files, regardless of the Drill release, contain accurate metadata. -->
+
+
+## Preparing for the Migration
+Set aside sufficient time for the migration. In a test by Drill developers, it took 32 minutes to upgrade 1 TB of data in 840 files and 370 minutes to upgrade 100 GB of data in 200k files. Although the size of the files is a factor in the upgrade time, the number of files is the most significant factor.
+
+System administrators can write a shell script to run the upgrade tool simultaneously on multiple sub-directories.
+
+Back up the data to be migrated and create one or more `temp` directories as described in the next section before migrating the data.
+
+## How to Migrate Data
+The `temp` directory or directories hold a recovery copy of the file or files currently being modified, for use in the event of a system failure. Inspecting the `temp` directory can also indicate the success or failure of an unattended migration.
+
+To migrate Parquet data that you generated in Drill 1.2 or earlier for use in Drill 1.3, follow these steps:
+
+{% include startimportant.html %} Run the upgrade tool only on Drill-generated Parquet files. {% include endimportant.html %}
+
+1. Back up the data to be migrated.  
+2. Create one or more `temp` directories, depending on how you plan to run the upgrade tool, on the same file system as the data.  
+   For example, if the data is on HDFS, create the temp directory on HDFS.
+   Create distinct temp directories when you run the upgrade tool simultaneously on multiple directories, because different directories can have files with the same names.  
+3. Download the upgrade tool from [GitHub](https://github.com/parthchandra/drill-upgrade).  
+4. If you use [Parquet metadata caching]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/#how-to-trigger-generation-of-the-parquet-metadata-cache-file):  
+   * Delete the cache file you generated from all directories and subdirectories where you plan to run the upgrade tool.  
+   * Run REFRESH TABLE METADATA on all the folders where a cache file previously existed.  
+5. Run the upgrade tool as shown in the following example:    
+
+             java -Dlog.path=/<your path>/drill-upgrade/upgrade.log -cp drill-upgrade-1.0-jar-with-dependencies.jar org.apache.drill.upgrade.Upgrade_12_13 --tempDir=maprfs:///drill/upgrade-temp maprfs:///drill/testdata/
+
+## Checking the Success of the Migration
+If you perform an unattended migration, check that the temp directory or directories are empty. Empty directories indicate success.
+
+## Handling of Migration Failure
+
+If a network connection goes down, or if a user cancels the operation, the file that was being processed at the time of cancellation could be corrupted. To recover from such a situation, perform the following steps:
+
+1. Copy the file back from the temp directory to your directory of Parquet files. 
+2. Re-run the upgrade tool.
+
+The tool skips the files that it has already processed and only updates the remaining files.
+
+
+
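
The new page suggests running the upgrade tool simultaneously on multiple sub-directories, each with its own temp directory. A minimal sketch of such a script follows. It is not part of this commit: the function names, the `UPGRADE_JAR` and `LOG_DIR` variables, the per-directory log file, and the layout of one temp directory per sub-directory are assumptions made for illustration. Only the `java` command line mirrors the example in step 5, and the temp directories are assumed to exist already, as step 2 requires.

```shell
#!/bin/sh
# Hypothetical settings; adjust to where the jar and logs actually live.
UPGRADE_JAR="${UPGRADE_JAR:-drill-upgrade-1.0-jar-with-dependencies.jar}"
LOG_DIR="${LOG_DIR:-/tmp/drill-upgrade}"

# Derive a distinct temp directory for one data directory, so that files
# with the same name in different directories cannot collide.
temp_dir_for() {
    temp_root=$1
    data_dir=$2
    printf '%s/%s\n' "${temp_root%/}" "$(basename "$data_dir")"
}

# Print the upgrade command for one data directory and its temp directory.
# The command shape follows the example in the migration page; writing one
# log file per directory is an assumption of this sketch.
upgrade_cmd() {
    data_dir=$1
    temp_dir=$2
    name=$(basename "$data_dir")
    printf 'java -Dlog.path=%s/%s.log -cp %s org.apache.drill.upgrade.Upgrade_12_13 --tempDir=%s %s\n' \
        "$LOG_DIR" "$name" "$UPGRADE_JAR" "$temp_dir" "$data_dir"
}

# Run the tool on every data directory listed after the temp root, in
# parallel, then wait for all runs to finish. Paths containing spaces are
# not supported, because the printed command is word-split when executed.
upgrade_all() {
    temp_root=$1
    shift
    for dir in "$@"; do
        $(upgrade_cmd "$dir" "$(temp_dir_for "$temp_root" "$dir")") &
    done
    wait
}
```

Calling `upgrade_cmd` on its own prints each command, which allows a review before anything runs; `upgrade_all maprfs:///drill/upgrade-temp maprfs:///drill/testdata/a maprfs:///drill/testdata/b` would then start one run per directory.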

http://git-wip-us.apache.org/repos/asf/drill/blob/d75db747/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
index 0271bec..8c76c3e 100755
--- a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
+++ b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
@@ -7,8 +7,8 @@ Partition pruning is a performance optimization that limits the number of files
 
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
 
-## Using Partitioned Drill 1.1-1.2 Data
-Before using partitioned Drill 1.1-1.2 data in Drill 1.3, you need to migrate the data. Migrate Parquet data as described in ["Migrating Partitioned Data"]({{site.baseurl}}/docs/migrating-partitioned-data/). 
+## Using Partitioned Drill Data
+Before using Parquet data created by Drill 1.2 or earlier in Drill 1.3, you need to migrate the data. Migrate Parquet data as described in ["Migrating Parquet Data"]({{site.baseurl}}/docs/migrating-parquet-data/). 
 
 {% include startimportant.html %}Migrate only Parquet files that Drill generated.{% include endimportant.html %}
 

http://git-wip-us.apache.org/repos/asf/drill/blob/d75db747/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md b/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
deleted file mode 100755
index 728f32e..0000000
--- a/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-title: "Migrating Partitioned Data"
-parent: "Partition Pruning"
---- 
-
-Migrating Parquet data that you partitioned and generated using Drill 1.1 and 1.2 is mandatory before using the data in Drill 1.3. The data must be marked as Drill-generated. Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to migrate Parquet data that you partitioned and generated in Drill 1.1 or 1.2. 
-
-{% include startimportant.html %} Run the upgrade tool only on Drill-generated Parquet files. {% include endimportant.html %}
-
-<!-- as described in [DRILL-4070](https://issues.apache.org/jira/browse/DRILL-4070).  -->
-
-## Why Migrate Drill 1.1-1.2 Data
-Parquet data partitioning became available in Drill 1.1 with the introduction of the PARTITION BY clause of the CTAS command. Drill 1.3 uses the latest (as of the 1.3 release date) Apache Parquet Library when generating and partitioning Parquet files, whereas Drill 1.1 and 1.2 use a version of a previous Parquet Library created by the Drill team. The Drill team fixed a bug in the previous Library to accurately process Parquet files generated by other tools, such as Impala and Hive. Apache Parquet fixed the bug in the latest Library, making it suitable for use in Drill 1.3. Drill now uses the same Apache Parquet Library as Impala, Hive, and other software. You need to run the upgrade tool on Parquet files generated by Drill 1.1 and 1.2 that used the previous Library. 
-
-The upgrade tool simply inserts a version number in the metadata to mark the file as a Drill file. 
-
-<!-- The bug fix eliminated the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files. Querying Drill-generated Parquet files, regardless of the Drill version, yields accurate results. Drill-generated Parquet files, regardless of the Drill release, contain accurate metadata. -->
-
-
-## Preparing for the Migration
-Set aside sufficient time for the migration. In a test by Drill developers, it took 32 minutes to upgrade 1TB data in 840 files and 370 minutes to upgrade 100 GB data in 200k files. Although the size of files is a factor in the upgrade time, the number of files is the most significant factor.
-
-System administrators can write a shell script to run the upgrade tool simultaneously on multiple sub-directories.
-
-Back up the data to be migrated and create one or more `temp` directories as described in the next section before migrating the data.
-
-## How to Migrate Data
-Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to modify one file at a time. The `temp` directory or directories hold a copy for recovery of the file(s) currently being modified in the event of a system failure. Inspecting the `temp` directory can also indicate the success or failure of an unattended migration.
-
-To migrate Parquet data for use in Drill 1.3 that you partitioned and generated in Drill 1.1 or 1.2, follow these steps:
-
-{% include startimportant.html %} Run the upgrade tool only on Drill-generated Parquet files. {% include endimportant.html %}
-
-1. Back up the data to be migrated.  
-2. Create one or more temp directories, depending on how you plan to run the upgrade tool, on the same file system as the data.  
-   For example, if the data is on HDFS, create the temp directory on HDFS.
-   Create distinct temp directories when you run the upgrade tool simultaneously on multiple directories as different directories can have files with same names.  
-3. Access the upgrade tool at TBD.  
-4. If you use [Parquet metadata caching]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/#how-to-trigger-generation-of-the-parquet-metadata-cache-file):  
-   * Delete the cache file you generated from all directories and subdirectories where you plan to run the upgrade tool.  
-   * Run REFRESH TABLE METADATA on all the folders where a cache file previously existed.  
-5. Run the upgrade tool as shown in the following example:    
-        java -Dlog.path=/<your path>/drill-upgrade/upgrade.log -cp drill-upgrade-1.0-jar-with-dependencies.jar org.apache.drill.upgrade.Upgrade_12_13 --tempDir=maprfs:///drill/upgrade-temp maprfs:///drill/testdata/
-
-## Checking the Success of the Migration
-If you perform an unattended migration, check that the temp directory or directories are empty. Empty directories indicate success.
-
-## Handling of Migration Failure
-
-If a network connection goes down, or if a user cancels the operation, the file that was being processed at the time of cancellation could be corrupted. To recover from such a situation, perform the following steps:
-
-1. Copy the file back from the temp directory to your directory of Parquet files. 
-2. Re-run the upgrade tool.
-
-The tool skips the files that it has already processed and only updates the remaining files.
-
-
-
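
The "Checking the Success of the Migration" section relocated by this commit says that empty temp directories indicate success after an unattended run. A minimal sketch of that check follows, assuming the temp directories are reachable as ordinary paths (for example, on a local or mounted file system). For `maprfs://` or `hdfs://` URIs, the file system's own listing command would be needed instead. The function names are hypothetical and not part of the upgrade tool.

```shell
#!/bin/sh
# Succeeds only when the path is a directory that contains no entries.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Print every temp directory that is missing or still holds files, and
# return non-zero if any such directory was found.
report_leftovers() {
    rc=0
    for d in "$@"; do
        if ! dir_is_empty "$d"; then
            printf 'needs attention: %s\n' "$d"
            rc=1
        fi
    done
    return "$rc"
}
```

A leftover file in a temp directory is the recovery copy described under "Handling of Migration Failure", so a reported directory is the place to look before re-running the tool.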