Posted to dev@impala.apache.org by Tim Armstrong <ta...@cloudera.com> on 2020/05/07 18:40:39 UTC

Re: Impala 4.0 breaking changes

I'm going to revive this thread. I thought of a few more defaults that we
might want to change. These are default changes we (putting on Cloudera hat
temporarily) have made for some new production deployments and have been
happy with.

Query result spooling has a bunch of advantages for resource consumption
and fetch speed. It uses a bounded amount of memory and scratch space, but
I think it's overall a better default. We've been using it in production
for a while now and haven't had any issues.
https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html

I think we should also switch the default file format to parquet, because
it's more correct (default text has some issues with escaping) and because
it's more performant.
https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html

We could also consider creating insert_only transactional tables by default:
https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html
The pros and cons here are more complex - we get more consistent behaviour
by default, but there can be perf/scalability consequences.

Any objections or thoughts on these?

On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <ta...@cloudera.com>
wrote:

> I think ARM support can ship in whatever release it's ready in, since
> it's not a breaking change.
>
> On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zh...@hotmail.com> wrote:
>
>> Thanks
>> I will work hard on this ^_^
>>
>> ________________________________
>> From: Jim Apple <ap...@jbapple.com>
>> Sent: March 19, 2020 10:21
>> To: dev@impala.apache.org <de...@impala.apache.org>
>> Subject: Re: Impala 4.0 breaking changes
>>
>> I agree. I don’t know how far we are from having arm64 support, though,
>> and we might not get there for a 4.0 release, I’d guess. But that doesn’t
>> mean it couldn’t arrive in time for 4.1 or 4.7 or 5.55 or whatever.
>>
>> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <jo...@cloudera.com>
>> wrote:
>>
>> > Patches to add support for arm64 are definitely welcome in any release.
>> >
>> > Thanks,
>> > Joe
>> >
>> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zh...@hotmail.com> wrote:
>> >
>> > > Hi
>> > >
>> > > Could we add support for arm64?
>> > >
>> > > Thanks
>> > > Zhao Renhai
>> > >
>> > > ________________________________
>> > > From: Joe McDonnell <jo...@cloudera.com>
>> > > Sent: March 17, 2020 1:07
>> > > To: dev@impala.apache.org <de...@impala.apache.org>
>> > > Subject: Impala 4.0 breaking changes
>> > >
>> > > Now that Impala 3.4 is branched and master is Impala 4.0, we need to
>> > > decide what breaking changes will happen in Impala 4.0. I have provided
>> > > a series of proposals below. I welcome feedback on them. Other proposals
>> > > are also welcome.
>> > >
>> > > Thanks,
>> > > Joe
>> > >
>> > > Proposal 0: Hadoop component versions
>> > >
>> > > Switch to CDP versions of components by default. This means that Impala
>> > > will use Hive 3+ (which is already essentially Hive 4 and may be renamed
>> > > to Hive 4).
>> > > Remove support for CDH versions of components.
>> > > This was already discussed in the original thread for Impala 4, so this
>> > > is not new.
>> > >
>> > > Proposal 1: OS support
>> > >
>> > > Drop support for CentOS 6, Ubuntu 14, and Debian (all versions).
>> > > Retain support for Ubuntu 16, Ubuntu 18, CentOS 7, and SLES 12.
>> > > CentOS 7 development will be focused on newer CentOS 7 versions such as
>> > > 7.6 and 7.7.
>> > > Add support for CentOS 8.
>> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
>> > >
>> > > Proposal 2: Python support
>> > >
>> > > Drop support for Python 2.6
>> > > Add support for Python 3 over time.
>> > >
>> > > Proposal 3: Impala-lzo
>> > >
>> > > Drop support for Impala-lzo/hadoop-lzo
>> > >
>> > > Proposal 4: Clients
>> > >
>> > > Deprecate the beeswax protocol. This means that it can be removed in the
>> > > next major version, but it would not be removed in Impala 4. Current
>> > > users of beeswax would need to start migrating to HS2.
>> > >
>> > > Proposal 5: Sentry
>> > >
>> > > Drop support for Sentry in favor of Ranger.
>> > >
>> > > Proposal 6: Metadata
>> > >
>> > > Metadata V2 will become the default. Metadata V1 will be deprecated.
>> > >
>> > > Thanks,
>> > > Joe
>> > >
>> >
>>
>
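
For reference, the three default changes proposed at the top of this message correspond to per-session query options. The sketch below is only illustrative: the option names are taken from the linked documentation pages, while `PROPOSED_DEFAULTS` and `render_set_statements` are hypothetical names and not part of Impala or any client library. It renders the `SET` statements a user could issue to try the proposed behaviour before any default changes.

```python
# Hypothetical helper, not part of Impala: renders SET statements for the
# query options discussed in this thread. The values reflect the proposals
# made here, not the defaults of any particular release.
PROPOSED_DEFAULTS = {
    "SPOOL_QUERY_RESULTS": "true",
    "DEFAULT_FILE_FORMAT": "PARQUET",
    "DEFAULT_TRANSACTIONAL_TYPE": "INSERT_ONLY",
}


def render_set_statements(options):
    """Return one 'SET name=value;' statement per option, in insertion order."""
    return ["SET {}={};".format(name, value) for name, value in options.items()]


if __name__ == "__main__":
    for statement in render_set_statements(PROPOSED_DEFAULTS):
        print(statement)
```

Running the statements in a session (for example in impala-shell) would apply the options to that session only, which makes it possible to evaluate each proposal independently.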

Re: Impala 4.0 breaking changes

Posted by Tim Armstrong <ta...@cloudera.com>.
I marked that as a blocker. Did you have a specific approach in mind? E.g.
changing the behaviour, having it controlled by a flag, etc?

On Sat, May 30, 2020 at 7:52 PM Shant Hovsepian <sh...@superdupershant.com>
wrote:

> Here's another one regarding support of ordinals in HAVING clauses.
>
>
> https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L54
>
> https://issues.apache.org/jira/browse/IMPALA-7844
>
> -Shant
>
>
> On Fri, May 8, 2020 at 10:35 AM Sahil Takiar <ta...@gmail.com>
> wrote:
>
> > > Another aspect is that ACID-inserts are probably faster, especially on
> > > object stores like S3.
> >
> >
> > Note that
> > https://impala.apache.org/docs/build/html/topics/impala_s3_skip_insert_staging.html
> > allows direct writes to S3 (no staging directory), although this does not
> > work for INSERT OVERWRITE queries.
> >
> > On Fri, May 8, 2020 at 1:44 AM Zoltán Borók-Nagy <bo...@apache.org>
> > wrote:
> >
> > > About transactional tables:
> > > If there's an ACID base directory in the table (due to compaction or
> > > INSERT OVERWRITE), then files at table/partition-root level will be
> > > ignored.
> > > So in that case Spark would need to do ACID-aware inserts.
> > >
> > > Another aspect is that ACID-inserts are probably faster, especially on
> > > object stores like S3.
> > > The reason for this is that we don't need to create a staging directory
> > > and move (which is a copy on S3) files to their final location.
> > > However, read amplification is definitely greater for ACID tables.
> > >
> > > Btw, do we want to achieve consistent default behavior with an upstream
> > > Hive version?
> > >
> > > That said, I think creating non-transactional tables is a good default.
> > > Especially because Impala will probably support Hudi and Iceberg in the
> > > future, so it's probably better to let the users choose explicitly.
> > >
> > > - Zoltan
> > >
> > >
> > > On Thu, May 7, 2020 at 11:46 PM Tim Armstrong <tarmstrong@cloudera.com>
> > > wrote:
> > >
> > > > That's a pretty good argument against defaulting to transactional
> > > > tables. You are right that it doesn't work out of the box with most
> > > > other engines: writing files into the base directory of the
> > > > table/partition will not work as intended, afaik.
> > > >
> > > > On Thu, May 7, 2020 at 1:10 PM Shant Hovsepian <sh...@cloudera.com>
> > > > wrote:
> > > >
> > > > > How compatible with other engines is the insert-only transaction type?
> > > > >
> > > > > Very often data is loaded with Spark, especially for cases with
> > > > > complex types where it's the only option. Will landing Parquet files
> > > > > in the table path just work even if we don't get consistent inserts,
> > > > > or does Spark need to be aware of the table format in either case?
> > > > >
> > > > > -Shant
> > > > >
> > > > > On Thu, May 7, 2020 at 3:09 PM Sahil Takiar <takiar.sahil@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 on query results spooling, I've been thinking about enabling it
> > > > > > by default recently since it seems to be relatively stable.
> > > > > >

Re: Impala 4.0 breaking changes

Posted by Shant Hovsepian <sh...@superdupershant.com>.
Here's another one regarding support of ordinals in HAVING clauses.

https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L54

https://issues.apache.org/jira/browse/IMPALA-7844

-Shant



Re: Impala 4.0 breaking changes

Posted by Sahil Takiar <ta...@gmail.com>.
> Another aspect is that ACID-inserts are probably faster, especially on
> object stores like S3.


Note that
https://impala.apache.org/docs/build/html/topics/impala_s3_skip_insert_staging.html
allows direct writes to S3 (no staging directory), although this does not work
for INSERT OVERWRITE queries.



-- 
Sahil Takiar
Software Engineer
takiar.sahil@gmail.com | (510) 673-0309

Re: Impala 4.0 breaking changes

Posted by Zoltán Borók-Nagy <bo...@apache.org>.
About transactional tables:
If there's an ACID base directory in the table (due to compaction or INSERT
OVERWRITE), then files at table/partition-root level will be ignored.
So in that case Spark would need to do ACID-aware inserts.
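
To make the rule above concrete, here is a minimal Python sketch. It is a
simplified model for illustration only, not Impala's or Hive's actual code:
the function name visible_files and the example directory names are
hypothetical, and it deliberately ignores write-ID validity and the choice
among multiple base or delta directories.

```python
# Simplified model of the rule above; not Impala's or Hive's actual code.
def visible_files(paths):
    """Return the files a reader would consider for one table/partition.

    `paths` are relative to the table or partition root, for example
    "part-0.parq" (root level) or "base_0000005/part-0.parq".
    """
    # A base directory exists if any file lives under a "base_*" directory.
    has_base = any(
        "/" in p and p.split("/", 1)[0].startswith("base_") for p in paths
    )
    if not has_base:
        # No base directory: root-level files are still considered.
        return sorted(paths)
    # With a base directory present, root-level files are ignored.
    return sorted(p for p in paths if "/" in p)
```

In this model, a Parquet file that Spark drops directly into the table root
stops being visible as soon as a base directory appears.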

Another aspect is that ACID inserts are probably faster, especially on
object stores like S3.
The reason for this is that we don't need to create a staging directory and
move (which is a copy on S3) files to their final location.
However, read amplification is definitely greater for ACID tables.

Btw, do we want to achieve consistent default behavior with an upstream
Hive version?

That said, I think creating non-transactional tables is a good default.
Especially because Impala will probably support Hudi and Iceberg in the
future, so it's probably better to let the users choose explicitly.

- Zoltan


On Thu, May 7, 2020 at 11:46 PM Tim Armstrong <ta...@cloudera.com>
wrote:

> That's a pretty good argument against defaulting to transactional tables.
> You are right that it doesn't work out of the box with most other engines -
> writing files into the base directory of the table/partition will not work
> as intended afaik.
>
> On Thu, May 7, 2020 at 1:10 PM Shant Hovsepian <sh...@cloudera.com> wrote:
>
> > How compatible with other engines is the insert-only transaction type?
> >
> > Very often data is loaded with Spark, especially for cases with complex
> > types where it's the only option. Will landing Parquet files in the table
> > path just work even if we don't get consistent inserts or does Spark need
> > to be aware of the table format in either case?
> >
> > -Shant
> >
> > On Thu, May 7, 2020 at 3:09 PM Sahil Takiar <ta...@gmail.com>
> > wrote:
> >
> > > +1 on query results spooling, I've been thinking about enabling it by
> > > default recently since it seems to be relatively stable.
> > >
> > > On Thu, May 7, 2020 at 11:41 AM Tim Armstrong <tarmstrong@cloudera.com
> >
> > > wrote:
> > >
> > > > I'm going to revive this thread. I thought of a few more defaults
> that
> > we
> > > > might want to change. These are default changes we (putting on
> Cloudera
> > > hat
> > > > temporarily) have made for some new production deployments and have
> > been
> > > > happy with.
> > > >
> > > > Query result spooling has a bunch of advantages for resource
> > consumption
> > > > and fetch speed. It uses a bounded amount of memory and scratch
> space,
> > > but
> > > > I think it's overall a better default. We've been using it in
> > production
> > > > for a while now and haven't had any issues.
> > > >
> > > >
> > >
> >
> https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html
> > > >
> > > > I think we should also switch the default file format to parquet,
> > because
> > > > it's more correct (default text has some issues with escaping) and
> > > because
> > > > it's more performant.
> > > >
> > > >
> > >
> >
> https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html
> > > >
> > > > We could also consider creating insert_only transactional tables by
> > > default
> > > > -
> > > >
> > > >
> > >
> >
> https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html
> > > > .
> > > > The pros and cons here are more complex - we get more consistent
> > > behaviour
> > > > by default, but there can be perf/scalability consequences.
> > > >
> > > > Any objections or thoughts on these?
> > > >
> > > > On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <
> tarmstrong@cloudera.com
> > >
> > > > wrote:
> > > >
> > > > > > I think ARM support can ship in whatever release it's ready in,
> > since
> > > > > it's not a breaking change.
> > > > >
> > > > > On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zh...@hotmail.com>
> wrote:
> > > > >
> > > > >> Thanks
> > > > >> I will work hard on this ^_^
> > > > >>
> > > > >> ________________________________
> > > > > >> From: Jim Apple <ap...@jbapple.com>
> > > > > >> Sent: March 19, 2020 10:21
> > > > > >> To: dev@impala.apache.org <de...@impala.apache.org>
> > > > > >> Subject: Re: Impala 4.0 breaking changes
> > > > >>
> > > > >> I agree. I don’t know how far we are from having arm64 support,
> > > though,
> > > > >> and
> > > > >> we might not get there for a 4.0 release, I’d guess. But that
> > doesn’t
> > > > mean
> > > > >> it couldn’t arrive by the time for 4.1 or 4.7 or 5.55 or whatever.
> > > > >>
> > > > >> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <
> > > > joemcdonnell@cloudera.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Patches to add support for arm64 are definitely welcome in any
> > > > release.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Joe
> > > > >> >
> > > > >> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zh...@hotmail.com>
> > > wrote:
> > > > >> >
> > > > >> > > Hi
> > > > >> > >
> > > > > >> > > Could we add support for arm64?
> > > > >> > >
> > > > >> > > Thanks
> > > > >> > > Zhao Renhai
> > > > >> > >
> > > > >> > > ________________________________
> > > > > >> > > From: Joe McDonnell <jo...@cloudera.com>
> > > > > >> > > Sent: March 17, 2020 1:07
> > > > > >> > > To: dev@impala.apache.org <de...@impala.apache.org>
> > > > > >> > > Subject: Impala 4.0 breaking changes
> > > > >> > >
> > > > >> > > Now that Impala 3.4 is branched and master is Impala 4.0, we
> > need
> > > to
> > > > >> > decide
> > > > >> > > what breaking changes will happen in Impala 4.0. I have
> > provided a
> > > > >> series
> > > > >> > > of proposals below. I welcome feedback on them. Other
> proposals
> > > are
> > > > >> also
> > > > >> > > welcome.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Joe
> > > > >> > >
> > > > >> > > Proposal 0: Hadoop component versions
> > > > >> > >
> > > > >> > > Switch to CDP versions of components by default. This means
> that
> > > > >> Impala
> > > > >> > > will use Hive 3+ (which is already essentially Hive 4 and may
> > > change
> > > > >> > names
> > > > >> > > to being Hive 4).
> > > > >> > > Remove support for CDH versions of components.
> > > > >> > > This was already discussed in the original thread for Impala
> 4,
> > so
> > > > >> this
> > > > >> > is
> > > > >> > > not new.
> > > > >> > >
> > > > >> > > Proposal 1: OS support
> > > > >> > >
> > > > >> > > Drop support for Centos 6, Ubuntu 14, and Debian (all
> versions)
> > > > >> > > Retain support for Ubuntu 16, Ubuntu 18, Centos 7, and SLES 12
> > > > >> > > Centos 7 development will be focused on newer Centos 7
> versions
> > > such
> > > > >> as
> > > > >> > 7.6
> > > > >> > > and 7.7.
> > > > >> > > Add support for Centos 8
> > > > >> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > > > >> > >
> > > > >> > > Proposal 2: Python support
> > > > >> > >
> > > > >> > > Drop support for Python 2.6
> > > > >> > > Add support for Python 3 over time.
> > > > >> > >
> > > > >> > > Proposal 3: Impala-lzo
> > > > >> > >
> > > > >> > > Drop support for Impala-lzo/hadoop-lzo
> > > > >> > >
> > > > >> > > Proposal 4: Clients
> > > > >> > >
> > > > >> > > Deprecate beeswax protocol. This means that it can be removed
> in
> > > the
> > > > >> next
> > > > >> > > major version number, but it would not be removed in Impala 4.
> > > > Current
> > > > >> > > users of beeswax would need to start migrating to HS2.
> > > > >> > >
> > > > >> > > Proposal 5: Sentry
> > > > >> > >
> > > > >> > > Drop support for Sentry in favor of Ranger.
> > > > >> > >
> > > > >> > > Proposal 6: Metadata
> > > > >> > >
> > > > >> > > Metadata V2 will become the default. Metadata V1 will be
> > > deprecated.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Joe
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sahil Takiar
> > > Software Engineer
> > > takiar.sahil@gmail.com | (510) 673-0309
> > >
> >
>

Re: Impala 4.0 breaking changes

Posted by Tim Armstrong <ta...@cloudera.com>.
That's a pretty good argument against defaulting to transactional tables.
You are right that it doesn't work out of the box with most other engines -
writing files into the base directory of the table/partition will not work
as intended afaik.

On Thu, May 7, 2020 at 1:10 PM Shant Hovsepian <sh...@cloudera.com> wrote:

> How compatible with other engines is the insert-only transaction type?
>
> Very often data is loaded with Spark, especially for cases with complex
> types where it's the only option. Will landing Parquet files in the table
> path just work even if we don't get consistent inserts or does Spark need
> to be aware of the table format in either case?
>
> -Shant
>
> On Thu, May 7, 2020 at 3:09 PM Sahil Takiar <ta...@gmail.com>
> wrote:
>
> > +1 on query results spooling, I've been thinking about enabling it by
> > default recently since it seems to be relatively stable.
> >
> > On Thu, May 7, 2020 at 11:41 AM Tim Armstrong <ta...@cloudera.com>
> > wrote:
> >
> > > I'm going to revive this thread. I thought of a few more defaults that
> we
> > > might want to change. These are default changes we (putting on Cloudera
> > hat
> > > temporarily) have made for some new production deployments and have
> been
> > > happy with.
> > >
> > > Query result spooling has a bunch of advantages for resource
> consumption
> > > and fetch speed. It uses a bounded amount of memory and scratch space,
> > but
> > > I think it's overall a better default. We've been using it in
> production
> > > for a while now and haven't had any issues.
> > >
> > >
> >
> https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html
> > >
> > > I think we should also switch the default file format to parquet,
> because
> > > it's more correct (default text has some issues with escaping) and
> > because
> > > it's more performant.
> > >
> > >
> >
> https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html
> > >
> > > We could also consider creating insert_only transactional tables by
> > default
> > > -
> > >
> > >
> >
> https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html
> > > .
> > > The pros and cons here are more complex - we get more consistent
> > behaviour
> > > by default, but there can be perf/scalability consequences.
> > >
> > > Any objections or thoughts on these?
> > >
> > > On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <tarmstrong@cloudera.com
> >
> > > wrote:
> > >
> > > > > I think ARM support can ship in whatever release it's ready in,
> since
> > > > it's not a breaking change.
> > > >
> > > > On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zh...@hotmail.com> wrote:
> > > >
> > > >> Thanks
> > > >> I will work hard on this ^_^
> > > >>
> > > >> ________________________________
> > > > >> From: Jim Apple <ap...@jbapple.com>
> > > > >> Sent: March 19, 2020 10:21
> > > > >> To: dev@impala.apache.org <de...@impala.apache.org>
> > > > >> Subject: Re: Impala 4.0 breaking changes
> > > >>
> > > >> I agree. I don’t know how far we are from having arm64 support,
> > though,
> > > >> and
> > > >> we might not get there for a 4.0 release, I’d guess. But that
> doesn’t
> > > mean
> > > >> it couldn’t arrive by the time for 4.1 or 4.7 or 5.55 or whatever.
> > > >>
> > > >> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <
> > > joemcdonnell@cloudera.com>
> > > >> wrote:
> > > >>
> > > >> > Patches to add support for arm64 are definitely welcome in any
> > > release.
> > > >> >
> > > >> > Thanks,
> > > >> > Joe
> > > >> >
> > > >> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zh...@hotmail.com>
> > wrote:
> > > >> >
> > > >> > > Hi
> > > >> > >
> > > > >> > > Could we add support for arm64?
> > > >> > >
> > > >> > > Thanks
> > > >> > > Zhao Renhai
> > > >> > >
> > > >> > > ________________________________
> > > > >> > > From: Joe McDonnell <jo...@cloudera.com>
> > > > >> > > Sent: March 17, 2020 1:07
> > > > >> > > To: dev@impala.apache.org <de...@impala.apache.org>
> > > > >> > > Subject: Impala 4.0 breaking changes
> > > >> > >
> > > >> > > Now that Impala 3.4 is branched and master is Impala 4.0, we
> need
> > to
> > > >> > decide
> > > >> > > what breaking changes will happen in Impala 4.0. I have
> provided a
> > > >> series
> > > >> > > of proposals below. I welcome feedback on them. Other proposals
> > are
> > > >> also
> > > >> > > welcome.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Joe
> > > >> > >
> > > >> > > Proposal 0: Hadoop component versions
> > > >> > >
> > > >> > > Switch to CDP versions of components by default. This means that
> > > >> Impala
> > > >> > > will use Hive 3+ (which is already essentially Hive 4 and may
> > change
> > > >> > names
> > > >> > > to being Hive 4).
> > > >> > > Remove support for CDH versions of components.
> > > >> > > This was already discussed in the original thread for Impala 4,
> so
> > > >> this
> > > >> > is
> > > >> > > not new.
> > > >> > >
> > > >> > > Proposal 1: OS support
> > > >> > >
> > > >> > > Drop support for Centos 6, Ubuntu 14, and Debian (all versions)
> > > >> > > Retain support for Ubuntu 16, Ubuntu 18, Centos 7, and SLES 12
> > > >> > > Centos 7 development will be focused on newer Centos 7 versions
> > such
> > > >> as
> > > >> > 7.6
> > > >> > > and 7.7.
> > > >> > > Add support for Centos 8
> > > >> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > > >> > >
> > > >> > > Proposal 2: Python support
> > > >> > >
> > > >> > > Drop support for Python 2.6
> > > >> > > Add support for Python 3 over time.
> > > >> > >
> > > >> > > Proposal 3: Impala-lzo
> > > >> > >
> > > >> > > Drop support for Impala-lzo/hadoop-lzo
> > > >> > >
> > > >> > > Proposal 4: Clients
> > > >> > >
> > > >> > > Deprecate beeswax protocol. This means that it can be removed in
> > the
> > > >> next
> > > >> > > major version number, but it would not be removed in Impala 4.
> > > Current
> > > >> > > users of beeswax would need to start migrating to HS2.
> > > >> > >
> > > >> > > Proposal 5: Sentry
> > > >> > >
> > > >> > > Drop support for Sentry in favor of Ranger.
> > > >> > >
> > > >> > > Proposal 6: Metadata
> > > >> > >
> > > >> > > Metadata V2 will become the default. Metadata V1 will be
> > deprecated.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Joe
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
> >
> > --
> > Sahil Takiar
> > Software Engineer
> > takiar.sahil@gmail.com | (510) 673-0309
> >
>

Re: Impala 4.0 breaking changes

Posted by Shant Hovsepian <sh...@cloudera.com>.
How compatible with other engines is the insert-only transaction type?

Very often data is loaded with Spark, especially for cases with complex
types where it's the only option. Will landing Parquet files in the table
path just work even if we don't get consistent inserts or does Spark need
to be aware of the table format in either case?

-Shant

On Thu, May 7, 2020 at 3:09 PM Sahil Takiar <ta...@gmail.com> wrote:

> +1 on query results spooling, I've been thinking about enabling it by
> default recently since it seems to be relatively stable.
>
> On Thu, May 7, 2020 at 11:41 AM Tim Armstrong <ta...@cloudera.com>
> wrote:
>
> > I'm going to revive this thread. I thought of a few more defaults that we
> > might want to change. These are default changes we (putting on Cloudera
> hat
> > temporarily) have made for some new production deployments and have been
> > happy with.
> >
> > Query result spooling has a bunch of advantages for resource consumption
> > and fetch speed. It uses a bounded amount of memory and scratch space,
> but
> > I think it's overall a better default. We've been using it in production
> > for a while now and haven't had any issues.
> >
> >
> https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html
> >
> > I think we should also switch the default file format to parquet, because
> > it's more correct (default text has some issues with escaping) and
> because
> > it's more performant.
> >
> >
> https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html
> >
> > We could also consider creating insert_only transactional tables by
> default
> > -
> >
> >
> https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html
> > .
> > The pros and cons here are more complex - we get more consistent
> behaviour
> > by default, but there can be perf/scalability consequences.
> >
> > Any objections or thoughts on these?
> >
> > On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <ta...@cloudera.com>
> > wrote:
> >
> > > > I think ARM support can ship in whatever release it's ready in, since
> > > it's not a breaking change.
> > >
> > > On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zh...@hotmail.com> wrote:
> > >
> > >> Thanks
> > >> I will work hard on this ^_^
> > >>
> > >> ________________________________
> > > >> From: Jim Apple <ap...@jbapple.com>
> > > >> Sent: March 19, 2020 10:21
> > > >> To: dev@impala.apache.org <de...@impala.apache.org>
> > > >> Subject: Re: Impala 4.0 breaking changes
> > >>
> > >> I agree. I don’t know how far we are from having arm64 support,
> though,
> > >> and
> > >> we might not get there for a 4.0 release, I’d guess. But that doesn’t
> > mean
> > >> it couldn’t arrive by the time for 4.1 or 4.7 or 5.55 or whatever.
> > >>
> > >> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <
> > joemcdonnell@cloudera.com>
> > >> wrote:
> > >>
> > >> > Patches to add support for arm64 are definitely welcome in any
> > release.
> > >> >
> > >> > Thanks,
> > >> > Joe
> > >> >
> > >> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zh...@hotmail.com>
> wrote:
> > >> >
> > >> > > Hi
> > >> > >
> > > >> > > Could we add support for arm64?
> > >> > >
> > >> > > Thanks
> > >> > > Zhao Renhai
> > >> > >
> > >> > > ________________________________
> > > >> > > From: Joe McDonnell <jo...@cloudera.com>
> > > >> > > Sent: March 17, 2020 1:07
> > > >> > > To: dev@impala.apache.org <de...@impala.apache.org>
> > > >> > > Subject: Impala 4.0 breaking changes
> > >> > >
> > >> > > Now that Impala 3.4 is branched and master is Impala 4.0, we need
> to
> > >> > decide
> > >> > > what breaking changes will happen in Impala 4.0. I have provided a
> > >> series
> > >> > > of proposals below. I welcome feedback on them. Other proposals
> are
> > >> also
> > >> > > welcome.
> > >> > >
> > >> > > Thanks,
> > >> > > Joe
> > >> > >
> > >> > > Proposal 0: Hadoop component versions
> > >> > >
> > >> > > Switch to CDP versions of components by default. This means that
> > >> Impala
> > >> > > will use Hive 3+ (which is already essentially Hive 4 and may
> change
> > >> > names
> > >> > > to being Hive 4).
> > >> > > Remove support for CDH versions of components.
> > >> > > This was already discussed in the original thread for Impala 4, so
> > >> this
> > >> > is
> > >> > > not new.
> > >> > >
> > >> > > Proposal 1: OS support
> > >> > >
> > >> > > Drop support for Centos 6, Ubuntu 14, and Debian (all versions)
> > >> > > Retain support for Ubuntu 16, Ubuntu 18, Centos 7, and SLES 12
> > >> > > Centos 7 development will be focused on newer Centos 7 versions
> such
> > >> as
> > >> > 7.6
> > >> > > and 7.7.
> > >> > > Add support for Centos 8
> > >> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > >> > >
> > >> > > Proposal 2: Python support
> > >> > >
> > >> > > Drop support for Python 2.6
> > >> > > Add support for Python 3 over time.
> > >> > >
> > >> > > Proposal 3: Impala-lzo
> > >> > >
> > >> > > Drop support for Impala-lzo/hadoop-lzo
> > >> > >
> > >> > > Proposal 4: Clients
> > >> > >
> > >> > > Deprecate beeswax protocol. This means that it can be removed in
> the
> > >> next
> > >> > > major version number, but it would not be removed in Impala 4.
> > Current
> > >> > > users of beeswax would need to start migrating to HS2.
> > >> > >
> > >> > > Proposal 5: Sentry
> > >> > >
> > >> > > Drop support for Sentry in favor of Ranger.
> > >> > >
> > >> > > Proposal 6: Metadata
> > >> > >
> > >> > > Metadata V2 will become the default. Metadata V1 will be
> deprecated.
> > >> > >
> > >> > > Thanks,
> > >> > > Joe
> > >> > >
> > >> >
> > >>
> > >
> >
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sahil@gmail.com | (510) 673-0309
>

Re: Impala 4.0 breaking changes

Posted by Sahil Takiar <ta...@gmail.com>.
+1 on query results spooling, I've been thinking about enabling it by
default recently since it seems to be relatively stable.

On Thu, May 7, 2020 at 11:41 AM Tim Armstrong <ta...@cloudera.com>
wrote:

> I'm going to revive this thread. I thought of a few more defaults that we
> might want to change. These are default changes we (putting on Cloudera hat
> temporarily) have made for some new production deployments and have been
> happy with.
>
> Query result spooling has a bunch of advantages for resource consumption
> and fetch speed. It uses a bounded amount of memory and scratch space, but
> I think it's overall a better default. We've been using it in production
> for a while now and haven't had any issues.
>
> https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html
>
> I think we should also switch the default file format to parquet, because
> it's more correct (default text has some issues with escaping) and because
> it's more performant.
>
> https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html
>
> We could also consider creating insert_only transactional tables by default
> -
>
> https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html
> .
> The pros and cons here are more complex - we get more consistent behaviour
> by default, but there can be perf/scalability consequences.
>
> Any objections or thoughts on these?
>
> On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <ta...@cloudera.com>
> wrote:
>
> > I think ARM support can ship in whatever release it's ready in, since
> > it's not a breaking change.
> >
> > On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zh...@hotmail.com> wrote:
> >
> >> Thanks
> >> I will work hard on this ^_^
> >>
> >> ________________________________
> >> From: Jim Apple <ap...@jbapple.com>
> >> Sent: March 19, 2020 10:21
> >> To: dev@impala.apache.org <de...@impala.apache.org>
> >> Subject: Re: Impala 4.0 breaking changes
> >>
> >> I agree. I don’t know how far we are from having arm64 support, though,
> >> and
> >> we might not get there for a 4.0 release, I’d guess. But that doesn’t
> mean
> >> it couldn’t arrive by the time for 4.1 or 4.7 or 5.55 or whatever.
> >>
> >> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <
> joemcdonnell@cloudera.com>
> >> wrote:
> >>
> >> > Patches to add support for arm64 are definitely welcome in any
> release.
> >> >
> >> > Thanks,
> >> > Joe
> >> >
> >> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zh...@hotmail.com> wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > Could we add support for arm64?
> >> > >
> >> > > Thanks
> >> > > Zhao Renhai
> >> > >
> >> > > ________________________________
> >> > > From: Joe McDonnell <jo...@cloudera.com>
> >> > > Sent: March 17, 2020 1:07
> >> > > To: dev@impala.apache.org <de...@impala.apache.org>
> >> > > Subject: Impala 4.0 breaking changes
> >> > >
> >> > > Now that Impala 3.4 is branched and master is Impala 4.0, we need to
> >> > decide
> >> > > what breaking changes will happen in Impala 4.0. I have provided a
> >> series
> >> > > of proposals below. I welcome feedback on them. Other proposals are
> >> also
> >> > > welcome.
> >> > >
> >> > > Thanks,
> >> > > Joe
> >> > >
> >> > > Proposal 0: Hadoop component versions
> >> > >
> >> > > Switch to CDP versions of components by default. This means that
> >> Impala
> >> > > will use Hive 3+ (which is already essentially Hive 4 and may change
> >> > names
> >> > > to being Hive 4).
> >> > > Remove support for CDH versions of components.
> >> > > This was already discussed in the original thread for Impala 4, so
> >> this
> >> > is
> >> > > not new.
> >> > >
> >> > > Proposal 1: OS support
> >> > >
> >> > > Drop support for Centos 6, Ubuntu 14, and Debian (all versions)
> >> > > Retain support for Ubuntu 16, Ubuntu 18, Centos 7, and SLES 12
> >> > > Centos 7 development will be focused on newer Centos 7 versions such
> >> as
> >> > 7.6
> >> > > and 7.7.
> >> > > Add support for Centos 8
> >> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
> >> > >
> >> > > Proposal 2: Python support
> >> > >
> >> > > Drop support for Python 2.6
> >> > > Add support for Python 3 over time.
> >> > >
> >> > > Proposal 3: Impala-lzo
> >> > >
> >> > > Drop support for Impala-lzo/hadoop-lzo
> >> > >
> >> > > Proposal 4: Clients
> >> > >
> >> > > Deprecate beeswax protocol. This means that it can be removed in the
> >> next
> >> > > major version number, but it would not be removed in Impala 4.
> Current
> >> > > users of beeswax would need to start migrating to HS2.
> >> > >
> >> > > Proposal 5: Sentry
> >> > >
> >> > > Drop support for Sentry in favor of Ranger.
> >> > >
> >> > > Proposal 6: Metadata
> >> > >
> >> > > Metadata V2 will become the default. Metadata V1 will be deprecated.
> >> > >
> >> > > Thanks,
> >> > > Joe
> >> > >
> >> >
> >>
> >
>


-- 
Sahil Takiar
Software Engineer
takiar.sahil@gmail.com | (510) 673-0309