Posted to user@impala.apache.org by Cliff Resnick <cr...@gmail.com> on 2018/12/16 03:16:51 UTC

using Kudu binary column in Impala

We're doing some testing storing HyperLogLog synopses in Kudu. It works
well in Spark, but the hope is to also query through Impala with a UDF.
Spark would remain as the writer, with Impala read-only. To work with
Impala I'm wondering if it's best to define the HLL data as Kudu string
type with plain encoding, or perhaps it's possible to keep it as binary but
declare it as string in an external table definition? I presume the latter
is not possible since Kudu's generated external table script does not do
this. Please forgive me for not conducting my own experimentation but I
figured someone here has run up against this before, and if so please let
me know!

-Cliff

Re: using Kudu binary column in Impala

Posted by Tim Armstrong <ta...@cloudera.com>.
We don't support Kudu binary columns in Impala; see
https://issues.apache.org/jira/browse/IMPALA-5323. Using a string column
should work fine, at least with Impala/Kudu. We use strings internally in
Impala for storing HLL intermediates for stats computation.
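
A minimal sketch of the string approach from the writer side, assuming
(this is not stated in the thread) that the sketch bytes are Base64-encoded
before being written, so the Kudu string column only ever holds ASCII text.
The function names and the sample bytes are hypothetical; a UDF reading the
column would have to decode the value the same way.

```python
import base64


def encode_sketch(raw: bytes) -> str:
    """Encode raw HLL sketch bytes as ASCII text for a string column."""
    return base64.b64encode(raw).decode("ascii")


def decode_sketch(text: str) -> bytes:
    """Recover the raw sketch bytes from the stored string value."""
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(text.encode("ascii"), validate=True)


# Hypothetical sketch bytes, standing in for real HLL registers.
registers = bytes([0, 3, 1, 0, 5, 2, 0, 0, 1, 4, 0, 2, 0, 0, 7, 1])
stored = encode_sketch(registers)

# The round trip is lossless, so the reader sees the writer's exact bytes.
assert decode_sketch(stored) == registers
```

Base64 grows the stored value by roughly a third; whether that overhead is
acceptable depends on the sketch size and row count.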

On Sat, Dec 15, 2018 at 7:17 PM Cliff Resnick <cr...@gmail.com> wrote:

> We're doing some testing storing HyperLogLog synopses in Kudu. It works
> well in Spark, but the hope is to also query through Impala with a UDF.
> Spark would remain as the writer, with Impala read-only. To work with
> Impala I'm wondering if it's best to define the HLL data as Kudu string
> type with plain encoding, or perhaps it's possible to keep it as binary but
> declare it as string in an external table definition? I presume the latter
> is not possible since Kudu's generated external table script does not do
> this. Please forgive me for not conducting my own experimentation but I
> figured someone here has run up against this before, and if so please let
> me know!
>
> -Cliff
