Posted to dev@impala.apache.org by Tim Armstrong <ta...@cloudera.com> on 2020/06/01 16:46:56 UTC

Re: Impala 4.0 breaking changes

I marked that as a blocker. Did you have a specific approach in mind? E.g.
changing the behaviour, having it controlled by a flag, etc?

On Sat, May 30, 2020 at 7:52 PM Shant Hovsepian <sh...@superdupershant.com>
wrote:

> Here's another one regarding support of ordinals in HAVING clauses.
>
>
> https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java#L54
>
> https://issues.apache.org/jira/browse/IMPALA-7844
>
> -Shant
>
>
> On Fri, May 8, 2020 at 10:35 AM Sahil Takiar <ta...@gmail.com>
> wrote:
>
> > Another aspect is that ACID-inserts are probably faster, especially on
> > object stores like S3.
> >
> >
> > Note that
> > https://impala.apache.org/docs/build/html/topics/impala_s3_skip_insert_staging.html
> > allows direct writes to S3 (no staging directory), although this does not
> > work for INSERT OVERWRITE queries.
> >
> > On Fri, May 8, 2020 at 1:44 AM Zoltán Borók-Nagy <bo...@apache.org>
> > wrote:
> >
> > > About transactional tables:
> > > If there's an ACID base directory in the table (due to compaction or
> > > INSERT OVERWRITE), then files at table/partition-root level will be
> > > ignored.
> > > So in that case Spark would need to do ACID-aware inserts.
> > >
> > > Another aspect is that ACID-inserts are probably faster, especially on
> > > object stores like S3.
> > > The reason for this is that we don't need to create a staging directory
> > > and move files (which is a copy on S3) to their final location.
> > > However, read amplification is definitely greater for ACID tables.
> > >
> > > Btw, do we want to achieve consistent default behavior with an upstream
> > > Hive version?
> > >
> > > That said, I think creating non-transactional tables is a good default.
> > > Especially because Impala will probably support Hudi and Iceberg in the
> > > future, so it's probably better to let the users choose explicitly.
> > >
> > > - Zoltan
> > >
> > >
> > > On Thu, May 7, 2020 at 11:46 PM Tim Armstrong <tarmstrong@cloudera.com>
> > > wrote:
> > >
> > > > That's a pretty good argument against defaulting to transactional
> > > > tables. You are right that it doesn't work out of the box with most
> > > > other engines - writing files into the base directory of the
> > > > table/partition will not work as intended afaik.
> > > >
> > > > On Thu, May 7, 2020 at 1:10 PM Shant Hovsepian <sh...@cloudera.com>
> > > wrote:
> > > >
> > > > > > How compatible is the insert-only transactional type with other
> > > > > > engines?
> > > > > >
> > > > > > Very often data is loaded with Spark, especially for cases with
> > > > > > complex types where it's the only option. Will landing Parquet files
> > > > > > in the table path just work, even if we don't get consistent inserts,
> > > > > > or does Spark need to be aware of the table format in either case?
> > > > >
> > > > > -Shant
> > > > >
> > > > > > On Thu, May 7, 2020 at 3:09 PM Sahil Takiar <takiar.sahil@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > > +1 on query results spooling, I've been thinking about enabling it
> > > > > > > by default recently since it seems to be relatively stable.
> > > > > >
> > > > > > > On Thu, May 7, 2020 at 11:41 AM Tim Armstrong <tarmstrong@cloudera.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > I'm going to revive this thread. I thought of a few more defaults
> > > > > > > > that we might want to change. These are default changes we (putting
> > > > > > > > on Cloudera hat temporarily) have made for some new production
> > > > > > > > deployments and have been happy with.
> > > > > > >
> > > > > > > > Query result spooling has a bunch of advantages for resource
> > > > > > > > consumption and fetch speed. It uses a bounded amount of memory and
> > > > > > > > scratch space, but I think it's overall a better default. We've been
> > > > > > > > using it in production for a while now and haven't had any issues.
> > > > > > > >
> > > > > > > > https://impala.apache.org/docs/build/html/topics/impala_spool_query_results.html
> > > > > > >
> > > > > > > > I think we should also switch the default file format to Parquet,
> > > > > > > > because it's more correct (default text has some issues with
> > > > > > > > escaping) and because it's more performant.
> > > > > > > >
> > > > > > > > https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html
> > > > > > >
> > > > > > > > We could also consider creating insert_only transactional tables by
> > > > > > > > default:
> > > > > > > > https://impala.apache.org/docs/build/html/topics/impala_default_transactional_type.html
> > > > > > > > The pros and cons here are more complex - we get more consistent
> > > > > > > > behaviour by default, but there can be perf/scalability consequences.
> > > > > > >
> > > > > > > Any objections or thoughts on these?
> > > > > > >
> > > > > > > > On Thu, Mar 19, 2020 at 4:44 PM Tim Armstrong <tarmstrong@cloudera.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > > I think ARM support can ship in whatever release it's ready in,
> > > > > > > > > since it's not a breaking change.
> > > > > > > >
> > > > > > > > > On Wed, Mar 18, 2020 at 9:43 PM 赵 仁海 <zhaorenhai@hotmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thanks
> > > > > > > >> I will work hard on this ^_^
> > > > > > > >>
> > > > > > > >> ________________________________
> > > > > > > > >> From: Jim Apple <ap...@jbapple.com>
> > > > > > > > >> Sent: March 19, 2020 10:21
> > > > > > > > >> To: dev@impala.apache.org <de...@impala.apache.org>
> > > > > > > > >> Subject: Re: Impala 4.0 breaking changes
> > > > > > > >>
> > > > > > > > >> I agree. I don’t know how far we are from having arm64 support,
> > > > > > > > >> though, and we might not get there for a 4.0 release, I’d guess.
> > > > > > > > >> But that doesn’t mean it couldn’t arrive by the time for 4.1 or
> > > > > > > > >> 4.7 or 5.55 or whatever.
> > > > > > > >>
> > > > > > > > >> On Wed, Mar 18, 2020 at 6:32 PM Joe McDonnell <joemcdonnell@cloudera.com>
> > > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > > >> > Patches to add support for arm64 are definitely welcome in any
> > > > > > > > >> > release.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Joe
> > > > > > > >> >
> > > > > > > > >> > On Mon, Mar 16, 2020 at 6:11 PM 赵 仁海 <zhaorenhai@hotmail.com>
> > > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi
> > > > > > > >> > >
> > > > > > > > >> > > Could we add support for arm64?
> > > > > > > >> > >
> > > > > > > >> > > Thanks
> > > > > > > >> > > Zhao Renhai
> > > > > > > >> > >
> > > > > > > >> > > ________________________________
> > > > > > > > >> > > From: Joe McDonnell <jo...@cloudera.com>
> > > > > > > > >> > > Sent: March 17, 2020 1:07
> > > > > > > > >> > > To: dev@impala.apache.org <de...@impala.apache.org>
> > > > > > > > >> > > Subject: Impala 4.0 breaking changes
> > > > > > > >> > >
> > > > > > > > >> > > Now that Impala 3.4 is branched and master is Impala 4.0, we
> > > > > > > > >> > > need to decide what breaking changes will happen in Impala 4.0.
> > > > > > > > >> > > I have provided a series of proposals below. I welcome feedback
> > > > > > > > >> > > on them. Other proposals are also welcome.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Joe
> > > > > > > >> > >
> > > > > > > >> > > Proposal 0: Hadoop component versions
> > > > > > > >> > >
> > > > > > > > >> > > Switch to CDP versions of components by default. This means
> > > > > > > > >> > > that Impala will use Hive 3+ (which is already essentially
> > > > > > > > >> > > Hive 4 and may change names to being Hive 4).
> > > > > > > > >> > > Remove support for CDH versions of components.
> > > > > > > > >> > > This was already discussed in the original thread for Impala 4,
> > > > > > > > >> > > so this is not new.
> > > > > > > >> > >
> > > > > > > >> > > Proposal 1: OS support
> > > > > > > >> > >
> > > > > > > > >> > > Drop support for Centos 6, Ubuntu 14, and Debian (all versions)
> > > > > > > > >> > > Retain support for Ubuntu 16, Ubuntu 18, Centos 7, and SLES 12
> > > > > > > > >> > > Centos 7 development will be focused on newer Centos 7 versions
> > > > > > > > >> > > such as 7.6 and 7.7.
> > > > > > > > >> > > Add support for Centos 8
> > > > > > > > >> > > Move main development from Ubuntu 16 to Ubuntu 18 over time.
> > > > > > > >> > >
> > > > > > > >> > > Proposal 2: Python support
> > > > > > > >> > >
> > > > > > > >> > > Drop support for Python 2.6
> > > > > > > >> > > Add support for Python 3 over time.
> > > > > > > >> > >
> > > > > > > >> > > Proposal 3: Impala-lzo
> > > > > > > >> > >
> > > > > > > >> > > Drop support for Impala-lzo/hadoop-lzo
> > > > > > > >> > >
> > > > > > > >> > > Proposal 4: Clients
> > > > > > > >> > >
> > > > > > > > >> > > Deprecate beeswax protocol. This means that it can be removed
> > > > > > > > >> > > in the next major version number, but it would not be removed
> > > > > > > > >> > > in Impala 4. Current users of beeswax would need to start
> > > > > > > > >> > > migrating to HS2.
> > > > > > > >> > >
> > > > > > > >> > > Proposal 5: Sentry
> > > > > > > >> > >
> > > > > > > >> > > Drop support for Sentry in favor of Ranger.
> > > > > > > >> > >
> > > > > > > >> > > Proposal 6: Metadata
> > > > > > > >> > >
> > > > > > > > >> > > Metadata V2 will become the default. Metadata V1 will be
> > > > > > > > >> > > deprecated.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Joe
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sahil Takiar
> > > > > > Software Engineer
> > > > > > takiar.sahil@gmail.com | (510) 673-0309
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sahil Takiar
> > Software Engineer
> > takiar.sahil@gmail.com | (510) 673-0309
> >
>
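
For anyone who wants to try the defaults proposed in this thread on an
existing deployment before they change, here is a minimal sketch that renders
them as per-session SET statements. The option names are inferred from the
documentation links quoted above (SPOOL_QUERY_RESULTS, DEFAULT_FILE_FORMAT,
DEFAULT_TRANSACTIONAL_TYPE). The values are the ones proposed in the thread,
not confirmed Impala 4.0 defaults, and the insert_only default in particular
was argued against above. The helper name render_set_statements is
hypothetical and only for illustration.

```python
# Defaults proposed in the thread, expressed as Impala query options.
# Assumption: option names are taken from the linked docs pages; values are
# the thread's proposals, not the defaults of any released Impala version.
PROPOSED_DEFAULTS = {
    "SPOOL_QUERY_RESULTS": "true",
    "DEFAULT_FILE_FORMAT": "parquet",
    # Contested in the thread: other engines may not handle such tables.
    "DEFAULT_TRANSACTIONAL_TYPE": "insert_only",
}


def render_set_statements(options):
    """Return one 'SET name=value;' statement per option, sorted by name."""
    return [f"SET {name}={value};" for name, value in sorted(options.items())]


if __name__ == "__main__":
    # Print the statements so they can be pasted into a client session.
    for statement in render_set_statements(PROPOSED_DEFAULTS):
        print(statement)
```

Setting the options per session keeps the experiment reversible: nothing
changes for other users, and dropping the DEFAULT_TRANSACTIONAL_TYPE entry
from the dictionary leaves new tables non-transactional.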