Posted to dev@impala.apache.org by Joseph McDonnell <jo...@cloudera.com> on 2017/03/07 00:09:59 UTC

IMPALA-4624 requires reload of data for local PlannerTest runs

IMPALA-4624 changes Impala's Parquet writer to fill in the column chunk
metadata's encoding_stats field. This changed the file size for some of the
Parquet files used in our tests, resulting in a diff in
PlannerTest::testPredicatePropagation. Since the log file is now updated to
the new file sizes, any run of PlannerTest with the old tables will hit
this diff. The two tables involved are tpch_parquet.region and
tpch_parquet.nation. Here is a workaround short of a full data reload:

use tpch_parquet;
insert overwrite table nation select * from nation;
insert overwrite table region select * from region;

It is possible that other tests that are not run as part of the normal test
suite may have a similar issue.

Thanks,
Joe

Re: IMPALA-4624 requires reload of data for local PlannerTest runs

Posted by Alex Behm <al...@cloudera.com>.
That makes sense. Sounds like the existing regex needs some polish.
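A minimal sketch of what that polish could look like, in Java since the filter lives in TestUtils.java. It assumes, hypothetically, that the filter is a regex substitution applied to plan text before comparison. The class name UnitAwareSizeFilter and the <SIZE> placeholder are illustrative, not the actual FileSizeFilter. The idea is to mask the unit suffix together with the number, so that size=900B and size=1.1KB normalize to the same string:

```java
import java.util.regex.Pattern;

public class UnitAwareSizeFilter {
  // Hypothetical pattern: "size=" followed by an integer or decimal number and an
  // optional unit suffix (B, KB, MB, GB, TB). Number and unit are masked together.
  private static final Pattern SIZE =
      Pattern.compile("size=\\d+(\\.\\d+)?(B|KB|MB|GB|TB)?");

  public static String normalize(String line) {
    return SIZE.matcher(line).replaceAll("size=<SIZE>");
  }

  public static void main(String[] args) {
    String before = normalize("size=900B");
    String after = normalize("size=1.1KB");
    System.out.println(before);               // size=<SIZE>
    System.out.println(after);                // size=<SIZE>
    System.out.println(before.equals(after)); // true
  }
}
```

The trade-off is that masking the unit also hides order-of-magnitude size changes, which is only acceptable if the planner tests are not meant to check file sizes at all.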

On Mon, Mar 6, 2017 at 4:58 PM, Joseph McDonnell <jo...@cloudera.com>
wrote:

> What happened is that a size changed from size=900B to size=1.1KB. It looks
> like the change from B to KB is the problem rather than the number.

Re: IMPALA-4624 requires reload of data for local PlannerTest runs

Posted by Joseph McDonnell <jo...@cloudera.com>.
What happened is that a size changed from size=900B to size=1.1KB. It looks
like the change from B to KB is the problem rather than the number.
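To make the failure mode concrete, here is a small Java sketch. It assumes, hypothetically, a filter that masks only the numeric part of size=<number><unit> and leaves the unit suffix in place. The class name DigitOnlySizeFilter and the <N> placeholder are illustrative, not taken from TestUtils.java. Under that assumption the number is hidden, but the B to KB change still produces a diff:

```java
import java.util.regex.Pattern;

public class DigitOnlySizeFilter {
  // Hypothetical filter: masks only the number after "size=" (integer or decimal),
  // leaving the unit suffix (B, KB, MB, ...) untouched.
  private static final Pattern NUMBER_ONLY = Pattern.compile("(size=)\\d+(\\.\\d+)?");

  public static String normalize(String line) {
    // "$1" re-inserts the captured "size=" prefix; only the digits are replaced.
    return NUMBER_ONLY.matcher(line).replaceAll("$1<N>");
  }

  public static void main(String[] args) {
    String before = normalize("size=900B");
    String after = normalize("size=1.1KB");
    System.out.println(before);               // size=<N>B
    System.out.println(after);                // size=<N>KB
    System.out.println(before.equals(after)); // false: the unit change survives masking
  }
}
```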

On Mon, Mar 6, 2017 at 4:42 PM, Alex Behm <al...@cloudera.com> wrote:

> The regex is baked into the test validation code: TestUtils.java, look
> at FileSizeFilter.

Re: IMPALA-4624 requires reload of data for local PlannerTest runs

Posted by Alex Behm <al...@cloudera.com>.
The regex is baked into the test validation code: TestUtils.java, look
at FileSizeFilter.

On Mon, Mar 6, 2017 at 4:36 PM, Joseph McDonnell <jo...@cloudera.com>
wrote:

> Looking through the PlannerTest/*.test files, I don't see any regexes for
> the file sizes.

Re: IMPALA-4624 requires reload of data for local PlannerTest runs

Posted by Joseph McDonnell <jo...@cloudera.com>.
Looking through the PlannerTest/*.test files, I don't see any regexes for
the file sizes.

On Mon, Mar 6, 2017 at 4:22 PM, Daniel Hecht <dh...@cloudera.com> wrote:

> I thought we had replaced all file sizes in the planner tests' expected
> results with regex '\d+' so they wouldn't be file size sensitive. Maybe we
> just missed some places?

Re: IMPALA-4624 requires reload of data for local PlannerTest runs

Posted by Daniel Hecht <dh...@cloudera.com>.
I thought we had replaced all file sizes in the planner tests' expected
results with regex '\d+' so they wouldn't be file size sensitive. Maybe we
just missed some places?

On Mon, Mar 6, 2017 at 4:09 PM, Joseph McDonnell <jo...@cloudera.com>
wrote:

> IMPALA-4624 changes Impala's Parquet writer to fill in the column chunk
> metadata's encoding_stats field. This changed the file size for some of the
> Parquet files used in our tests, resulting in a diff in
> PlannerTest::testPredicatePropagation. Since the log file is now updated to
> the new file sizes, any run of PlannerTest with the old tables will hit
> this diff. The two tables involved are tpch_parquet.region and
> tpch_parquet.nation. Here is a workaround short of a full data reload:
>
> use tpch_parquet;
> insert overwrite table nation select * from nation;
> insert overwrite table region select * from region;
>
> It is possible that other tests that are not run as part of the normal test
> suite may have a similar issue.
>
> Thanks,
> Joe
>