Posted to dev@drill.apache.org by jaltekruse <gi...@git.apache.org> on 2016/01/27 19:56:09 UTC
[GitHub] drill pull request: DRILL-4203: fix dates written into parquet fil...
GitHub user jaltekruse opened a pull request:
https://github.com/apache/drill/pull/341
DRILL-4203: fix dates written into parquet files to conform to parquet format spec
This branch includes an update of the version number to 1.5.0. This is required because we need a hard release boundary to signal that parquet files written from this point on are not corrupted. Without this change, the fixed files written by the writer would still be considered corrupt (as all files generated by earlier commits carrying the version 1.5.0-SNAPSHOT are in fact corrupted). This commit can be removed or amended when the changes are merged, but this patch should be immediately followed by a change of the version number, to avoid the risk of generating files that have corrected date values but a version number that tells the reader to still shift the dates.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jaltekruse/incubator-drill 4203-parquet-dates-bug-squash2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/341.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #341
----
commit 3cbbe1c418ec8e802144f6cba1d88ede9de7f930
Author: Jason Altekruse <al...@gmail.com>
Date: 2015-12-31T16:22:04Z
DRILL-4203: Fix date values written in parquet files created by Drill
Drill was writing non-standard dates into parquet files for all releases
before 1.5.0. The values have been read back correctly by Drill, but
external tools like Spark reading the files will see corrupted values for
all dates that have been written by Drill.
This change corrects the behavior of the Drill parquet writer to correctly
store dates in the format given in the parquet specification.
To maintain compatibility with old files, the parquet reader code has
been updated to check for the old format and automatically shift the
corrupted values into corrected ones.
The test cases included here should ensure that all files produced by
historical versions of Drill will continue to return the same values they
had in previous releases. For compatibility with external tools, any old
files with corrupted dates can be re-written using the CREATE TABLE AS
command (as the writer will now only produce the specification-compliant
values, even when reading out of older corrupt files).
While the old behavior was a consistent shift into a range unlikely to be
used in a modern database (over 10,000 years in the future), the shifted
values are still valid dates. In the case where these may have been written
into files intentionally, and we cannot be certain from the metadata if Drill
produced the files, an option is included to turn off the auto-correction.
Use of this option is assumed to be extremely unlikely, but it is included
for completeness.
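As a rough illustration of the reader-side correction described in this commit message, here is a minimal Python sketch. It is not the Drill code. The shift constant is an assumption (twice the Julian day number of the Unix epoch, which is consistent with "over 10,000 years in the future" but is not stated in this message), and the names correct_date, days_to_date, file_is_corrupt and auto_correct are hypothetical.

```python
from datetime import date, timedelta

# ASSUMPTION: the size of the shift is not given in this message. Twice the
# Julian day number of the Unix epoch (about 13,000 years) is used here only
# as a plausible stand-in.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
ASSUMED_SHIFT_DAYS = 2 * JULIAN_DAY_OF_UNIX_EPOCH


def correct_date(stored_days, file_is_corrupt, auto_correct=True,
                 shift_days=ASSUMED_SHIFT_DAYS):
    """Return days since 1970-01-01 as the parquet spec defines a DATE.

    stored_days      raw integer read from the file
    file_is_corrupt  True if the file was judged to be in the old format
    auto_correct     hypothetical stand-in for the option that turns the
                     correction off
    """
    if file_is_corrupt and auto_correct:
        # Old files hold values shifted into the far future; shift them back.
        return stored_days - shift_days
    return stored_days


def days_to_date(days):
    """Convert spec-compliant days since the Unix epoch to a calendar date."""
    return date(1970, 1, 1) + timedelta(days=days)


if __name__ == "__main__":
    good = 16800                        # 2015-12-31 as days since the epoch
    bad = good + ASSUMED_SHIFT_DAYS     # what an old writer would have stored
    print(days_to_date(correct_date(bad, file_is_corrupt=True)))
```

With auto_correct set to False the stored value is returned unchanged, which corresponds to the case where far-future dates were written intentionally.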
commit 9a3f3b8a3d599d3e8981c7b987f229809db8eec4
Author: Jason Altekruse <al...@gmail.com>
Date: 2016-01-27T18:20:01Z
Fix DrillVersionInfo to make it provide a valid version number even during
the unit tests.
This is now a build-time generated class, rather than one that looks on the
classpath for META-INF files.
This pattern for file generation with parameters passed from the POM files
was borrowed from parquet-mr.
commit fb4bc2271c625dd25729575fc77f117b2c1d0a72
Author: Jason Altekruse <al...@gmail.com>
Date: 2016-01-26T04:19:24Z
Changing version of Drill to 1.5.0
This isn't actually the 1.5.0 release, but the primary condition used
to identify if corrected dates are stored in a parquet file is the
Drill version included in the metadata. This version number is retrieved
from the META-INF in the drill jar. This version number change is needed
to make some of the regression tests pass, otherwise the 1.5.0-SNAPSHOT
version will make the tests assume that the files are corrupt (as all
commits before this one were writing corrupt dates).
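To make the role of the version string concrete, here is a minimal Python sketch of a version-based check. It is an illustration only: the function name, the parsing rules and the handling of a missing version are assumptions, not the checks the Drill reader actually performs.

```python
FIXED_VERSION = (1, 5, 0)  # first version treated as writing correct dates


def written_before_fix(drill_version):
    """Guess whether a file's dates predate the fix, from its version string.

    Returns True (dates need shifting), False (dates are correct), or None
    when no Drill version is present and the version alone cannot decide.
    Hypothetical helper; assumes versions look like "1.4.0" or
    "1.5.0-SNAPSHOT".
    """
    if not drill_version:
        return None
    base, _, qualifier = drill_version.partition("-")
    numbers = tuple(int(part) for part in base.split("."))
    if numbers < FIXED_VERSION:
        return True
    if numbers == FIXED_VERSION and qualifier == "SNAPSHOT":
        # 1.5.0-SNAPSHOT builds were still writing shifted dates.
        return True
    return False


if __name__ == "__main__":
    for version in ("1.4.0", "1.5.0-SNAPSHOT", "1.5.0", None):
        print(version, written_before_fix(version))
```

Under this rule a build that still reports 1.5.0-SNAPSHOT is indistinguishable from the earlier corrupt writers, which is why the version bump accompanies the fix.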
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] drill pull request: DRILL-4203: fix dates written into parquet fil...
Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on the pull request:
https://github.com/apache/drill/pull/341#issuecomment-175793221
@adeneche @parthchandra @StevenMPhillips Can one of you please review this change?
---
[GitHub] drill pull request: DRILL-4203: fix dates written into parquet fil...
Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on the pull request:
https://github.com/apache/drill/pull/341#issuecomment-175920847
Did you see my comment above? I am using the version number to signal that the dates are fixed now and in all future versions. All other commits with a version of 1.5.0-SNAPSHOT have produced invalid date values, so I want to make sure we correct dates in files that have the current version string written into their footers.
---
[GitHub] drill pull request #341: DRILL-4203: fix dates written into parquet files to...
Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse closed the pull request at:
https://github.com/apache/drill/pull/341
---
[GitHub] drill issue #341: DRILL-4203: fix dates written into parquet files to confor...
Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on the issue:
https://github.com/apache/drill/pull/341
This has been replaced by
https://github.com/apache/drill/pull/595
---
[GitHub] drill pull request: DRILL-4203: fix dates written into parquet fil...
Posted by adeneche <gi...@git.apache.org>.
Github user adeneche commented on the pull request:
https://github.com/apache/drill/pull/341#issuecomment-175917437
@jaltekruse why the version change?
---