Posted to user@impala.apache.org by Dimitris Tsirogiannis <dt...@cloudera.com> on 2018/04/02 22:28:53 UTC

Re: Difference between LOAD DATA and refresh

Hi Antoni,

I apologize for the extremely delayed response. LOAD DATA still requires
the catalog to update the table's metadata, so it is also susceptible to
IMPALA-5058 if that update takes a long time. How long does refreshing a
partition usually take for you? That said, IMPALA-5058 is fixed in 5.15,
so you may want to consider upgrading your system if that's possible.

Dimitris

On Mon, Jan 8, 2018 at 8:47 AM, Antoni Ivanov <ai...@vmware.com> wrote:

> Hi,
>
> We are wondering if we can reduce the impact of
> https://issues.apache.org/jira/browse/IMPALA-5058
>
> Currently we write data with INSERT statements from Spark and then run
> REFRESH on partition x.
>
> We are now thinking of using the LOAD DATA statement directly instead.
>
> I imagine LOAD DATA doesn't need to communicate with the Hive metastore
> database (it would only update the HDFS block locations). Is that
> correct?
>
>
> Thanks,
>
> Antoni
>