Posted to dev@impala.apache.org by Joe McDonnell <jo...@cloudera.com> on 2017/12/13 21:14:39 UTC

Test data directory layout change (IMPALA-6052)

I just uploaded a preview of a code change for IMPALA-6052, which changes
the HDFS directory locations for Impala test data:
https://gerrit.cloudera.org/#/c/8260/
A summary of the change is below my signature.

This change would require all developers to reload test data, so I wanted
to start a discussion about the timing of this change. In particular, does
a change like this belong in a point release (2.12)? Are there any concerns
about going forward with this change for 2.12?

Thanks,
Joe

Test tables will now be organized into database directories rather than
being at the top level of /test-warehouse. The new format matches the
default placement of a table when LOCATION is not specified.

e.g.
Table: functional.alltypes
Old: /test-warehouse/alltypes
New: /test-warehouse/functional.db/alltypes

Table: functional_parquet.alltypes
Old: /test-warehouse/alltypes_parquet
New: /test-warehouse/functional_parquet.db/alltypes

Before this change, /test-warehouse has 900+ subdirectories. After the
change, it has about 60. This should make it easier to navigate our HDFS
directories.
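
For illustration only (this is not code from the change itself): a minimal Python sketch of the new layout rule described above, where a table lives at <warehouse root>/<database>.db/<table>. The function name default_table_location and its warehouse_root parameter are hypothetical, chosen just for this example.

```python
import posixpath


def default_table_location(database, table, warehouse_root="/test-warehouse"):
    """Build the HDFS path <warehouse_root>/<database>.db/<table>.

    Hypothetical helper for illustration; it only mirrors the layout
    described in this message.
    """
    # posixpath is used because HDFS paths always use forward slashes.
    return posixpath.join(warehouse_root, database + ".db", table)


if __name__ == "__main__":
    # The two example tables from this message.
    for db, tbl in [("functional", "alltypes"), ("functional_parquet", "alltypes")]:
        print(db + "." + tbl, "->", default_table_location(db, tbl))
```

With this rule the table name on disk no longer has to encode the file format, since the database directory already distinguishes functional from functional_parquet.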

Re: Test data directory layout change (IMPALA-6052)

Posted by Joe McDonnell <jo...@cloudera.com>.
I will test that case with the new code.

Thanks,
Joe

On Wed, Dec 13, 2017 at 2:21 PM, Jim Apple <jb...@cloudera.com> wrote:

> I think it's fine for this to be in 2.12, not 3.0.
>
> While you are working on this, can you test it with
> bin/single_node_perf_run.py, too? As of today, I believe that works
> with scale factors > 1.

Re: Test data directory layout change (IMPALA-6052)

Posted by Jim Apple <jb...@cloudera.com>.
I think it's fine for this to be in 2.12, not 3.0.

While you are working on this, can you test it with
bin/single_node_perf_run.py, too? As of today, I believe that works
with scale factors > 1.
