Posted to dev@hive.apache.org by Stamatis Zampetakis <za...@gmail.com> on 2020/07/31 17:38:46 UTC

Re: Hive TPC-DS metastore dumps in Postgres

There is now a PR [1] with various improvements over the last update. Feel
free to check it out and let me know what you think.

Best,
Stamatis

[1] https://github.com/apache/hive/pull/1347

On Mon, Jun 22, 2020 at 5:32 PM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hey guys,
>
> I put up a small project on GitHub [1] with Hive metastore dumps from
> tpcds10tb/tpcds30tb (+partitioning) and some scripts to quickly spin up a
> dockerized Postgres with those loaded.
>
> Personally, I find it useful for checking the plans of TPC-DS queries with
> the usual qtest mechanism (no external tools, no need to tap into a real
> cluster) while having beefy stats + partitioning info at hand. The driver and
> other changes needed to run these tests are located in [2].
>
> I am sharing it here in case it might be of use to somebody else.
>
> The two main commands that you will need if you wanna try this out:
> docker build --tag postgres-tpcds-metastore:1.0 .
> mvn test -Dtest=TestTezPerfDBCliDriver -Dtest.output.overwrite=true -Dtest.metastore.db=postgres.tpcds
>
> Small caveat: Currently in [2] the dockerized Postgres is restarted for
> every query, which makes things slow. This will be fixed later on.
>
> Best,
> Stamatis
>
> [1] https://github.com/zabetak/hive-postgres-metastore
> [2] https://github.com/zabetak/hive/tree/qtest_postgres_driver
>
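
For convenience, the two commands quoted above could be wrapped in a small
script along these lines. This is only a sketch: the directory variables
(METASTORE_REPO, HIVE_TEST_DIR), their default values, and the DRY_RUN switch
are assumptions made for illustration and are not part of either repository.
The email does not say which directory each command should be run from, so
adjust the paths to your own checkouts.

```shell
#!/bin/sh
# Sketch only. The paths below are assumptions, not taken from the email:
#   METASTORE_REPO - local checkout of the metastore dump project [1]
#   HIVE_TEST_DIR  - directory in the Hive checkout [2] from which the
#                    test driver is run (not stated in the email)
set -eu

METASTORE_REPO="${METASTORE_REPO:-hive-postgres-metastore}"
HIVE_TEST_DIR="${HIVE_TEST_DIR:-hive}"
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Run a command inside a directory, or just print it in dry-run mode.
run_in() {
  dir=$1
  shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# Build the Postgres image with the TPC-DS metastore dumps loaded.
build_image() {
  run_in "$METASTORE_REPO" docker build --tag postgres-tpcds-metastore:1.0 .
}

# Run the TPC-DS plan tests against the dockerized metastore.
run_perf_tests() {
  run_in "$HIVE_TEST_DIR" mvn test -Dtest=TestTezPerfDBCliDriver \
    -Dtest.output.overwrite=true -Dtest.metastore.db=postgres.tpcds
}

build_image
run_perf_tests
```

With the defaults, the script only echoes the two commands, which makes it
easy to check the paths before running anything for real with DRY_RUN=0.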