Posted to user@hive.apache.org by "Luong, Dickson" <di...@condenast.com> on 2018/08/14 19:39:16 UTC

External Table Creation is slow/hangs

I have a dataset on S3 in partitioned folders. I'm trying to create an
external Hive table pointing to the location of that data. The table schema
is set up so that the partition columns match how the folders are laid out
on S3.
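
For illustration only, a statement of the kind described above might look
like the sketch below. The table name, columns, partition keys, file format
and bucket path are all hypothetical; none of them come from the original
message, and the folder layout is assumed to follow the key=value
convention.

```sql
-- Hypothetical names throughout: events, event_id, payload, dt, region,
-- and the s3://example-bucket/events/ path are placeholders.
-- Assumed layout: s3://example-bucket/events/dt=2018-08-14/region=us/...
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  event_id STRING,
  payload  STRING
)
PARTITIONED BY (dt STRING, region STRING)
STORED AS PARQUET  -- assumption: the actual file format is not stated
LOCATION 's3://example-bucket/events/';

-- CREATE EXTERNAL TABLE does not register existing partition folders.
-- They are usually added in a separate step, for example:
MSCK REPAIR TABLE events;
```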

I've done this quite a few times successfully, but when the dataset is large
the table creation query is either extremely slow or it hangs (we can't
tell which).

I've followed some of the tips in
https://hortonworks.github.io/hdp-aws/s3-hive/index.html#general-performance-tips
by configuring some of the parameters involving file permission and file
size checks to adjust for S3 but still no luck.

We're using EMR 5.12.1, which ships Hive 2.3.2. The table creation query
does not show up in the Tez UI. It does show up in the HiveServer UI as
running, but we're not sure whether it is actually running or has hung
(most likely the latter).

Our (very roundabout) workaround so far is to copy all the files in that
master folder to another directory, delete the originals, create the
external table while the directory is empty, and then transfer the files
back. We need to keep the original directory name because other processes
depend on it, so we can't simply start in a fresh directory. This whole
method is obviously not ideal.

Any tips / solutions to this problem we've been tackling would be greatly
appreciated.

Dickson

Re: External Table Creation is slow/hangs

Posted by Furcy Pin <pi...@gmail.com>.
Hi,

I can't tell for sure where your problem is coming from, but from what you
said, I guess that the Hive Metastore is performing some list or scan
operation on the files, and that operation is taking a very long time.

Setting *hive.stats.autogather* to false might help.
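
As a minimal sketch of that suggestion, assuming the property can be changed
at session level in your setup, the statements below would be run in the
same session before the CREATE statement. Whether this actually avoids the
slowdown is not confirmed here.

```sql
-- Turn off automatic statistics gathering for the current session only.
SET hive.stats.autogather=false;

-- Print the current value to check that the session picked up the change.
SET hive.stats.autogather;
```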

Also, beware that some configuration parameters that apply to the Metastore
cannot be changed via a SET operation; they require you to change the
configuration file of your Metastore service and restart it. Maybe that's
why some of the conf changes you tried had no effect.

Also, don't hesitate to provide more details about what type of query you
run (e.g. is your table partitioned?) and what configuration tweaks you
have already tried.

Hope this helps,

Furcy
