You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Vitalii Diravka <vi...@gmail.com> on 2016/11/03 10:50:31 UTC

Re: isDateCorrect field in ParquetTableMetadata

I have opened a PR <https://github.com/apache/drill/pull/644> for the
DRILL-4980 <https://issues.apache.org/jira/browse/DRILL-4980>.

@Jinfeng I removed is.date.correct flag from the writer.
And I left the checking of it in detectCorruptDates() method for the reason
if somebody has already generated the parquet files with this property.

@Jason I have added the drill version checking as you suggested to do.

@Paul I have added parquet-writer.version property to parquet metadata
footer.
If you have other thoughts about variables names, please let me know I will
try to respond as fast as possible.
Kind regards
Vitalii

2016-10-31 7:07 GMT+02:00 Paul Rogers <pr...@maprtech.com>:

> Choosing a good property name should solve the confusion issue. Perhaps
> drill.writer as the name.
>
> The writer version is not needed if we feel that we’ll never again change
> the writer or introduce bugs. Since that is hard to predict, adding a
> writer version is very low cost insurance. Indeed, including a writer
> version is a very common technique. The question we should answer is why we
> don’t need to use this technique here…
>
> One can imagine that, as we evolve from 1.x to the 2.1 (or later) format,
> we may do so in fits and starts, perhaps based on community contributions.
> A version will help us know what capabilities were supported in the writer
> that wrote a particular file.
>
> Thanks,
>
> - Paul
>
> > On Oct 28, 2016, at 3:03 PM, Jason Altekruse <ja...@dremio.com> wrote:
> >
> > The only worry I have about declaring a writer version is possible
> > confusion with the Parquet format version itself. The format is already
> > defined through version 2.1 or something like that, but we are currently
> > only writing files based on the 1.x version of the format.
> >
> > My preferred solution to this problem would be to just make point
> releases
> > for problems like this (like in this case we could have made a 1.8.1
> > release, and then all of the 1.8.0-SNAPSHOT would all known to be bad and
> > everything after would be 1.8.1-SNAPSHOT and could have been known to be
> > correct).
> >
> > I'm open to to hearing other opinions on this, I just generally feel like
> > these bugs should be rare, and fixing them should be done with a lot of
> > care (and in this case I missed a few things). I don't think it would be
> > crazy to say that we should only merge these kinds of patches if we are
> > willing to say the fix is ready for a release.
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Fri, Oct 28, 2016 at 2:52 PM, Vitalii Diravka <
> vitalii.diravka@gmail.com>
> > wrote:
> >
> >> Jinfeng,
> >>
> >> isDateCorrect will be false in the code when isDateCorrect property is
> >> absent in the parquet metadata.
> >>
> >> Anyway I am going to implement the mentioned approach with the
> >> parquet-writer.version instead of isDateCorrect property.
> >>
>
>