Posted to dev@impala.apache.org by Jim Apple <jb...@apache.org> on 2019/05/20 00:03:07 UTC

Ubuntu 18.04 in pre-merge tests?

HEAD now passes on Ubuntu 18.04:

https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/

Thanks to the community members who have made this happen!

Should we add Ubuntu 18.04 to our pre-merge Jenkins job, replace 16.04 with
18.04 in our pre-merge Jenkins job, or neither?

I propose adding 18.04 for now (and so running both 16.04 and 18.04 on
merge) and removing 16.04 when it starts to become inconvenient.

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Todd Lipcon <to...@cloudera.com>.
On Sun, Jun 23, 2019 at 8:23 PM Jim Apple <ap...@jbapple.com> wrote:

> >
> > Generally I think precommit running on something closer to the oldest
> > supported OS is better than running on the newest, since it's more likely
> > that new OSes are backward-compatible. Otherwise it's very easy to
> > introduce code that uses features not available on el7, for example.
> >
>
> I find that argument compelling. Do you think we should switch the
> pre-commit job to CentOS 7?
>

Seems reasonable to me, if someone wants to volunteer the time to do it :)
Perhaps others have some reason why it wouldn't work well, though.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Jim Apple <ap...@jbapple.com>.
>
> Generally I think precommit running on something closer to the oldest
> supported OS is better than running on the newest, since it's more likely
> that new OSes are backward-compatible. Otherwise it's very easy to
> introduce code that uses features not available on el7, for example.
>

I find that argument compelling. Do you think we should switch the
pre-commit job to CentOS 7?

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Laszlo Gaal <la...@cloudera.com>.
Thanks for the pointer, Todd!

On Wed, Jun 19, 2019 at 11:53 PM Todd Lipcon <to...@cloudera.com> wrote:

> This same issue was reported a month or two ago for Kudu on Fedora 29. I
> think Alexey Serbin had started to look into it. Alexey, did we figure out
> what was going on here?
>
> -Todd
>
> On Wed, Jun 19, 2019 at 6:00 AM Laszlo Gaal <la...@cloudera.com>
> wrote:
>
> > Having looked at the failing build Jim quoted above, the failure seems to
> > come from the security area.
> > This is from the Kudu master's log, from the startup sequence (see
> >
> >
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/cluster/cdh6-node-1/kudu/master/kudu-master.INFO/*view*/
> > ),
> > all this in the context of an Impala minicluster:
> >
> > I0612 04:12:56.129866  8515 sys_catalog.cc:424] T 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515 [sys.catalog]: configured and running, proceeding with master startup.
> > W0612 04:12:56.130080  8522 catalog_manager.cc:1113] T 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515: acquiring CA information for follower catalog manager: Not found: root CA entry not found
> > W0612 04:12:56.130123  8522 catalog_manager.cc:596] Not found: root CA entry not found: failed to prepare follower catalog manager, will retry
> > I0612 04:12:56.130151  8521 catalog_manager.cc:1055] Loading table and tablet metadata into memory...
> > I0612 04:12:56.130228  8521 catalog_manager.cc:1066] Initializing Kudu internal certificate authority...
> > W0612 04:12:56.167639  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50174: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.170145  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50176: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.172571  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50178: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.182530  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50180: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.185034  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50182: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.187453  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50184: expected TLS_HANDSHAKE step: SASL_INITIATE
> > I0612 04:12:56.197146  8521 catalog_manager.cc:950] Generated new certificate authority record
> > I0612 04:12:56.198005  8521 catalog_manager.cc:1075] Loading token signing keys...
> > W0612 04:12:56.293697  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50186: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.295320  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50188: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.296821  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50190: expected TLS_HANDSHAKE step: SASL_INITIATE
> > I0612 04:12:56.416918  8521 catalog_manager.cc:4292] T 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515: Generated new TSK 0
> > W0612 04:12:57.174684  8901 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50192: expected TLS_HANDSHAKE step: SASL_INITIATE
> > [and so on...]
> >
> > The same run has very similar messages in the tablet server logs as well:
> > I0612 04:12:56.289767  8396 rpc_server.cc:205] RPC server started. Bound to: 127.0.0.1:31202
> > I0612 04:12:56.289903  8396 webserver.cc:308] Webserver started at http://0.0.0.0:31302/ using document root /home/ubuntu/Impala/toolchain/cdh_components-1137441/kudu-1.10.0-cdh6.x-SNAPSHOT/release/bin/../lib/kudu/www and password file <none>
> > W0612 04:12:56.293773  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (0 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:12:56.296866  8897 heartbeater.cc:380] Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.
> > W0612 04:13:56.424613  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (62 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:14:56.556850  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (122 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:15:56.694403  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (182 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:16:56.826400  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (242 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:17:56.955927  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (302 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:18:57.103503  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (362 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:19:57.237712  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (422 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:20:57.393489  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (482 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:21:57.522513  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (542 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:22:57.652271  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (602 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:23:57.782537  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (662 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> > W0612 04:24:57.910481  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >
> >
> > On Mon, Jun 17, 2019 at 9:08 PM Todd Lipcon <to...@cloudera.com> wrote:
> >
> > > On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <ap...@jbapple.com> wrote:
> > >
> > > > My goal is to have Impala keep up with (what I perceive to be) the
> most
> > > > popular version of the most popular Linux distribution, for the
> purpose
> > > of
> > > > easing the workflow of developers, especially new developers.
> > > >
> > >
> > > Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop
> > Impala
> > > on a remote box running el7 because the dev environment is too
> > heavy-weight
> > > to realistically run on my laptop.
> > >
> > >
> > > >
> > > > 18.04 stopped being able to load data some time between June 9th and
> > > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/ and June
> > 12
> > > > and
> > > >
> > > >
> > >
> >
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/
> > > > .
> > > > I tried reproducing the June 9 run with the same git checkouts
> (Impala
> > > and
> > > > Impala-LZO) as #14 today, and data loading still failed.
> > > >
> > > > What RHEL 7 components did you have in mind that are closer to Ubuntu
> > > 16.04
> > > > than 18.04?
> > > >
> > >
> > > Stuff like libc, openssl, krb5, sasl, etc are pretty different
> > > version-wise. At least, I know when we made Kudu pass tests on Ubuntu
> 18,
> > > we dealt with issues mostly in those libraries, which aren't part of
> the
> > > toolchain (for security reasons we rely on OS-provided libs).
> > >
> > > Generally I think precommit running on something closer to the oldest
> > > supported OS is better than running on the newest, since it's more
> likely
> > > that new OSes are backward-compatible. Otherwise it's very easy to
> > > introduce code that uses features not available on el7, for example.
> > >
> > >
> > > >
> > > > On Wed, May 22, 2019 at 10:41 AM Todd Lipcon <to...@cloudera.com>
> > wrote:
> > > >
> > > > > On Mon, May 20, 2019 at 8:36 PM Jim Apple <ap...@jbapple.com>
> > wrote:
> > > > >
> > > > > > Maybe now would be a good time to implement Everblue jobs that
> ping
> > > > dev@
> > > > > > when they fail. Thoughts?
> > > > > >
> > > > >
> > > > > Mixed feelings on that. We already get many test runs per day of
> the
> > > > > "default" config because people are running precommit builds.
> Adding
> > an
> > > > > additional cron-based job to the mix that runs the same builds
> > doesn't
> > > > seem
> > > > > like it adds much unless it tests some other config (eg Ubuntu 18
> or
> > a
> > > > > longer suite of tests). One thing I could get on board with would
> be
> > > > > switching the precommit builds to run just "core" tests or some
> other
> > > > > faster subset, and defer the exhaustive/long runs to scheduled
> builds
> > > or
> > > > as
> > > > > an optional precommit for particularly invasive patches. I think
> that
> > > > would
> > > > > increase dev quality of life substantially (I find my productivity
> is
> > > > often
> > > > > hampered by only getting two shots at a precommit run per work
> day).
> > > > >
> > > > > I'm not against adding a cron-triggered full test/build on Ubuntu
> 18,
> > > but
> > > > > would like to know if someone plans to sign up to triage it when it
> > > > fails.
> > > > > My experience with other Apache communities is that collective
> > > ownership
> > > > > over test triage duty (i.e. "email the dev list on failure") doesn't
> > > work. I
> > > > > seem to recall we had such builds back in 2010 or so on Hadoop and
> > they
> > > > > just always got ignored. In various "day job" teams I've seen this
> > work
> > > > via
> > > > > a prescriptive rotation ("all team members take a triage/build-cop
> > > > shift")
> > > > > but that's not really compatible with the nature of Apache projects
> > > being
> > > > > volunteer communities.
> > > > >
> > > > > So, I think I'll put the question back to you: as a committer you
> can
> > > > spend
> > > > > your time as you like. If you think an Ubuntu 18 job running on a
> > > > schedule
> > > > > would be useful and are willing to sign up to triage failures, sounds
> > great
> > > > to
> > > > > me :) Personally I don't develop on Ubuntu 18 and in my day job
> it's
> > > not
> > > > a
> > > > > particularly important deployment platform, so I personally don't
> > think
> > > > > I'll spend much time triaging that build.
> > > > >
> > > > > Todd
> > > > >
> > > > >
> > > > > >
> > > > > > On Mon, May 20, 2019 at 9:09 AM Todd Lipcon <to...@cloudera.com>
> > > wrote:
> > > > > >
> > > > > > > Adding a build-only job for 18.04 makes sense to me. A full
> test
> > > run
> > > > on
> > > > > > > every precommit seems a bit expensive but doing one once a week
> > or
> > > > > > > something like that might be a good idea to prevent runtime
> > > > > regressions.
> > > > > > >
> > > > > > > As for switching the precommit from 16.04 to 18.04, I'd lean
> > > towards
> > > > > > > keeping to 16.04 due to it being closer in terms of component
> > > > versions
> > > > > to
> > > > > > > common enterprise distros like RHEL 7.
> > > > > > >
> > > > > > > -Todd
> > > > > > >
> > > > > > > On Sun, May 19, 2019 at 5:03 PM Jim Apple <jb...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > > HEAD now passes on Ubuntu 18.04:
> > > > > > > >
> > > > > > > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/
> > > > > > > >
> > > > > > > > Thanks to the community members who have made this happen!
> > > > > > > >
> > > > > > > > Should we add Ubuntu 18.04 to our pre-merge Jenkins job,
> > replace
> > > > > 16.04
> > > > > > > with
> > > > > > > > 18.04 in our pre-merge Jenkins job, or neither?
> > > > > > > >
> > > > > > > > > I propose adding 18.04 for now (and so running both 16.04 and
> > > 18.04
> > > > > on
> > > > > > > > merge) and removing 16.04 when it starts to become
> > inconvenient.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Todd Lipcon
> > > > > > > Software Engineer, Cloudera
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Todd Lipcon
> > > > > Software Engineer, Cloudera
> > > > >
> > > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Laszlo Gaal <la...@cloudera.com>.
Thanks a lot, Todd & Alexey!

What I missed was that Ubuntu 18.04 upgraded to OpenSSL 1.1.1 since I fixed
IMPALA-8563 (update the SSL cipher set used in Impala BE tests).
This answer on AskUbuntu[1] shows the upgrade was pretty recent.

Since the test that Jim linked installs the dependencies from scratch every
time it starts running, it is now pretty clear why this has happened only in
the recent runs.

Thanks again,

    - Laszlo

[1] https://askubuntu.com/a/1102966/592189
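
For readers who want to see what "disable TLS 1.3" looks like in practice, here is a minimal sketch using Python's standard ssl module. It is not the Kudu/krpc patch (that change is C++ and lives in the Gerrit review linked in this thread); it only illustrates the general idea of capping the negotiated protocol at TLS 1.2 when the system OpenSSL is 1.1.1 or newer. The function name make_context_capped_at_tls12 is made up for this example.

```python
import ssl


def make_context_capped_at_tls12(protocol: int = ssl.PROTOCOL_TLS_CLIENT) -> ssl.SSLContext:
    """Return an SSLContext that will not negotiate anything newer than TLS 1.2.

    Hypothetical helper for illustration only; not taken from Kudu or Impala.
    """
    ctx = ssl.SSLContext(protocol)
    # With OpenSSL 1.1.1+, TLS 1.3 is offered by default. Capping the maximum
    # version keeps the handshake on TLS 1.2 or older.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = make_context_capped_at_tls12()
    # Shows which OpenSSL the interpreter is linked against, and the cap in effect.
    print(ssl.OPENSSL_VERSION)
    print(ctx.maximum_version.name)
```

Capping the version is a workaround rather than a fix: it sidesteps the negotiation difference instead of teaching the RPC layer to handle TLS 1.3.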

On Thu, Jun 20, 2019 at 9:00 AM Todd Lipcon <to...@cloudera.com> wrote:

> Alexey and I looked at this today and realized the issue is with OpenSSL
> 1.1.1, which adds support for TLS 1.3. This breaks the TLS negotiation in
> the krpc library. Likely Impala's usage of krpc would also break in this
> environment when wire encryption is enabled. I put up a temporary fix
> (disable TLS 1.3) here: http://gerrit.cloudera.org:8080/13683
>
> Likely we need to cross-port this to Impala's krpc copy as well.
>
> -Todd
>
> On Wed, Jun 19, 2019 at 3:30 PM Alexey Serbin <as...@cloudera.com>
> wrote:
>
> > Yep, some time ago over a weekend I started trying to get a Fedora 29
> > machine, but I got stuck while trying to provision it.  I.e., the machine
> > was eventually provisioned, but I could not access it.  That was where I
> > left it.
> >
> > Having Ubuntu 18 as a target machine is better since at least it's easier
> > for me to create one.  I've provisioned one already and I'm starting a Kudu
> > build there in an attempt to take a look at the issue later tonight.
> >
> > I'll keep you posted on my findings.
> >
> >
> > Kind regards,
> >
> > Alexey
> >
> > On Wed, Jun 19, 2019 at 2:53 PM Todd Lipcon <to...@cloudera.com> wrote:
> >
> >> This same issue was reported a month or two ago for Kudu on Fedora 29. I
> >> think Alexey Serbin had started to look into it. Alexey, did we figure
> out
> >> what was going on here?
> >>
> >> -Todd
> >>
> >> On Wed, Jun 19, 2019 at 6:00 AM Laszlo Gaal <la...@cloudera.com>
> >> wrote:
> >>
> >>> Having looked at the failing build Jim quoted above, the failure seems
> to
> >>> come from the security area.
> >>> This is from the Kudu master's log, from the startup sequence (see
> >>>
> >>>
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/cluster/cdh6-node-1/kudu/master/kudu-master.INFO/*view*/
> >>> ),
> >>> all this in the context of an Impala minicluster:
> >>>
> >>> I0612 04:12:56.129866  8515 sys_catalog.cc:424] T 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515 [sys.catalog]: configured and running, proceeding with master startup.
> >>> W0612 04:12:56.130080  8522 catalog_manager.cc:1113] T 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515: acquiring CA information for follower catalog manager: Not found: root CA entry not found
> >>> W0612 04:12:56.130123  8522 catalog_manager.cc:596] Not found: root CA entry not found: failed to prepare follower catalog manager, will retry
> >>> I0612 04:12:56.130151  8521 catalog_manager.cc:1055] Loading table and tablet metadata into memory...
> >>> I0612 04:12:56.130228  8521 catalog_manager.cc:1066] Initializing Kudu internal certificate authority...
> >>> W0612 04:12:56.167639  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50174: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.170145  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50176: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.172571  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50178: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.182530  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50180: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.185034  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50182: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.187453  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50184: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> I0612 04:12:56.197146  8521 catalog_manager.cc:950] Generated new certificate authority record
> >>> I0612 04:12:56.198005  8521 catalog_manager.cc:1075] Loading token signing keys...
> >>> W0612 04:12:56.293697  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50186: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.295320  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50188: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.296821  8636 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50190: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> I0612 04:12:56.416918  8521 catalog_manager.cc:4292] T 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515: Generated new TSK 0
> >>> W0612 04:12:57.174684  8901 negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from 127.0.0.1:50192: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> [and so on...]
> >>>
> >>> The same run has very similar messages in the tablet server logs as well:
> >>> I0612 04:12:56.289767  8396 rpc_server.cc:205] RPC server started. Bound to: 127.0.0.1:31202
> >>> I0612 04:12:56.289903  8396 webserver.cc:308] Webserver started at http://0.0.0.0:31302/ using document root /home/ubuntu/Impala/toolchain/cdh_components-1137441/kudu-1.10.0-cdh6.x-SNAPSHOT/release/bin/../lib/kudu/www and password file <none>
> >>> W0612 04:12:56.293773  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (0 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:12:56.296866  8897 heartbeater.cc:380] Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.
> >>> W0612 04:13:56.424613  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (62 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:14:56.556850  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (122 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:15:56.694403  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (182 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:16:56.826400  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (242 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:17:56.955927  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (302 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:18:57.103503  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (362 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:19:57.237712  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (422 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:20:57.393489  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (482 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:21:57.522513  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (542 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:22:57.652271  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (602 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:23:57.782537  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (662 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>> W0612 04:24:57.910481  8897 heartbeater.cc:587] Failed to heartbeat to 127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to ping master at 127.0.0.1:7051: Client connection negotiation failed: client connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected TLS_HANDSHAKE step: SASL_INITIATE
> >>>
> >>>
> >>> On Mon, Jun 17, 2019 at 9:08 PM Todd Lipcon <to...@cloudera.com> wrote:
> >>>
> >>> > On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <ap...@jbapple.com>
> wrote:
> >>> >
> >>> > > My goal is to have Impala keep up with (what I perceive to be) the
> >>> most
> >>> > > popular version of the most popular Linux distribution, for the
> >>> purpose
> >>> > of
> >>> > > easing the workflow of developers, especially new developers.
> >>> > >
> >>> >
> >>> > Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop
> >>> Impala
> >>> > on a remote box running el7 because the dev environment is too
> >>> heavy-weight
> >>> > to realistically run on my laptop.
> >>> >
> >>> >
> >>> > >
> >>> > > 18.04 stopped being able to load data some time between June 9th
> and
> >>> > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/ and
> >>> June 12
> >>> > > and
> >>> > >
> >>> > >
> >>> >
> >>>
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/
> >>> > > .
> >>> > > I tried reproducing the June 9 run with the same git checkouts
> >>> (Impala
> >>> > and
> >>> > > Impala-LZO) as #14 today, and data loading still failed.
> >>> > >
> >>> > > What RHEL 7 components did you have in mind that are closer to
> Ubuntu
> >>> > 16.04
> >>> > > than 18.04?
> >>> > >
> >>> >
> >>> > Stuff like libc, openssl, krb5, sasl, etc are pretty different
> >>> > version-wise. At least, I know when we made Kudu pass tests on Ubuntu
> >>> 18,
> >>> > we dealt with issues mostly in those libraries, which aren't part of
> >>> the
> >>> > toolchain (for security reasons we rely on OS-provided libs).
> >>> >
> >>> > Generally I think precommit running on something closer to the oldest
> >>> > supported OS is better than running on the newest, since it's more
> >>> likely
> >>> > that new OSes are backward-compatible. Otherwise it's very easy to
> >>> > introduce code that uses features not available on el7, for example.
> >>> >
> >>> >
> >>> > >
> >>> > > On Wed, May 22, 2019 at 10:41 AM Todd Lipcon <to...@cloudera.com>
> >>> wrote:
> >>> > >
> >>> > > > On Mon, May 20, 2019 at 8:36 PM Jim Apple <ap...@jbapple.com>
> >>> wrote:
> >>> > > >
> >>> > > > > Maybe now would be a good time to implement Everblue jobs that
> >>> ping
> >>> > > dev@
> >>> > > > > when they fail. Thoughts?
> >>> > > > >
> >>> > > >
> >>> > > > Mixed feelings on that. We already get many test runs per day of
> >>> the
> >>> > > > "default" config because people are running precommit builds.
> >>> Adding an
> >>> > > > additional cron-based job to the mix that runs the same builds
> >>> doesn't
> >>> > > seem
> >>> > > > like it adds much unless it tests some other config (eg Ubuntu 18
> >>> or a
> >>> > > > longer suite of tests). One thing I could get on board with would
> >>> be
> >>> > > > switching the precommit builds to run just "core" tests or some
> >>> other
> >>> > > > faster subset, and defer the exhaustive/long runs to scheduled
> >>> builds
> >>> > or
> >>> > > as
> >>> > > > an optional precommit for particularly invasive patches. I think
> >>> that
> >>> > > would
> >>> > > > increase dev quality of life substantially (I find my
> productivity
> >>> is
> >>> > > often
> >>> > > > hampered by only getting two shots at a precommit run per work
> >>> day).
> >>> > > >
> >>> > > > I'm not against adding a cron-triggered full test/build on Ubuntu
> >>> 18,
> >>> > but
> >>> > > > would like to know if someone plans to sign up to triage it when
> it
> >>> > > fails.
> >>> > > > My experience with other Apache communities is that collective
> >>> > ownership
> >>> > > > > over test triage duty (ie "email the dev list on failure") doesn't
> >>> > work. I
> >>> > > > seem to recall we had such builds back in 2010 or so on Hadoop
> and
> >>> they
> >>> > > > just always got ignored. In various "day job" teams I've seen
> this
> >>> work
> >>> > > via
> >>> > > > a prescriptive rotation ("all team members take a
> triage/build-cop
> >>> > > shift")
> >>> > > > > but that's not really compatible with the nature of Apache
> projects
> >>> > being
> >>> > > > volunteer communities.
> >>> > > >
> >>> > > > So, I think I'll put the question back to you: as a committer you
> >>> can
> >>> > > spend
> >>> > > > your time as you like. If you think an Ubuntu 18 job running on a
> >>> > > schedule
> >>> > > > > would be useful and are willing to sign up to triage failures, sounds
> >>> great
> >>> > > to
> >>> > > > me :) Personally I don't develop on Ubuntu 18 and in my day job
> >>> it's
> >>> > not
> >>> > > a
> >>> > > > particularly important deployment platform, so I personally don't
> >>> think
> >>> > > > I'll spend much time triaging that build.
> >>> > > >
> >>> > > > Todd
> >>> > > >
> >>> > > >
> >>> > > > >
> >>> > > > > On Mon, May 20, 2019 at 9:09 AM Todd Lipcon <todd@cloudera.com
> >
> >>> > wrote:
> >>> > > > >
> >>> > > > > > Adding a build-only job for 18.04 makes sense to me. A full
> >>> test
> >>> > run
> >>> > > on
> >>> > > > > > every precommit seems a bit expensive but doing one once a
> >>> week or
> >>> > > > > > something like that might be a good idea to prevent runtime
> >>> > > > regressions.
> >>> > > > > >
> >>> > > > > > As for switching the precommit from 16.04 to 18.04, I'd lean
> >>> > towards
> >>> > > > > > keeping to 16.04 due to it being closer in terms of component
> >>> > > versions
> >>> > > > to
> >>> > > > > > common enterprise distros like RHEL 7.
> >>> > > > > >
> >>> > > > > > -Todd
> >>> > > > > >
> >>> > > > > > On Sun, May 19, 2019 at 5:03 PM Jim Apple <
> jbapple@apache.org>
> >>> > > wrote:
> >>> > > > > >
> >>> > > > > > > HEAD now passes on Ubuntu 18.04:
> >>> > > > > > >
> >>> > > > > > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/
> >>> > > > > > >
> >>> > > > > > > Thanks to the community members who have made this happen!
> >>> > > > > > >
> >>> > > > > > > Should we add Ubuntu 18.04 to our pre-merge Jenkins job,
> >>> replace
> >>> > > > 16.04
> >>> > > > > > with
> >>> > > > > > > 18.04 in our pre-merge Jenkins job, or neither?
> >>> > > > > > >
> >>> > > > > > > > I propose adding 18.04 for now (and so running both 16.04
> and
> >>> > 18.04
> >>> > > > on
> >>> > > > > > > merge) and removing 16.04 when it starts to become
> >>> inconvenient.
> >>> > > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > --
> >>> > > > > > Todd Lipcon
> >>> > > > > > Software Engineer, Cloudera
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > > Todd Lipcon
> >>> > > > Software Engineer, Cloudera
> >>> > > >
> >>> > >
> >>> >
> >>> >
> >>> > --
> >>> > Todd Lipcon
> >>> > Software Engineer, Cloudera
> >>> >
> >>>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Todd Lipcon <to...@cloudera.com>.
Alexey and I looked at this today and realized the issue is with OpenSSL
1.1.1, which adds support for TLS 1.3. This breaks the TLS negotiation in
the krpc library. Likely Impala's usage of krpc would also break in this
environment when wire encryption is enabled. I put up a temporary fix
(disable TLS 1.3) here: http://gerrit.cloudera.org:8080/13683

Likely we need to cross-port this to Impala's krpc copy as well.
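
For readers who want to see what "disable TLS 1.3" looks like in practice,
here is a minimal sketch using Python's standard ssl module. It is not the
Kudu/krpc patch linked above (that change is in C++ and is not reproduced
here); it only illustrates the general technique of capping the negotiated
protocol at TLS 1.2 when the process is linked against OpenSSL 1.1.1. The
helper name make_tls12_capped_context is hypothetical.

```python
import ssl


def make_tls12_capped_context() -> ssl.SSLContext:
    """Build a client-side TLS context that will not negotiate TLS 1.3.

    Hypothetical helper for illustration only; it is unrelated to the
    actual krpc negotiation code.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Cap the highest protocol version the handshake may select.
    # With OpenSSL 1.1.1 the default ceiling would otherwise include TLS 1.3.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls12_capped_context()
# Report which OpenSSL build the interpreter is linked against and whether
# that build supports TLS 1.3 at all.
print("OpenSSL:", ssl.OPENSSL_VERSION)
print("TLS 1.3 available in this build:", ssl.HAS_TLSv1_3)
print("Context maximum version:", ctx.maximum_version.name)
```

Capping the version is a workaround rather than a fix: it keeps an existing
TLS 1.2 handshake flow working, at the cost of not using TLS 1.3 until the
negotiation code is updated to handle it.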

-Todd

On Wed, Jun 19, 2019 at 3:30 PM Alexey Serbin <as...@cloudera.com> wrote:

> Yep, some time ago over a weekend I started with an attempt to get a
> Fedora29 machine, but I got stuck there while trying to provision such a
> thing.  I.e., the machine has been eventually provisioned, but I could not
> access it.  That was where I left it.
>
> Having Ubuntu18 as a target machine is better since at least it's easier
> to create one for me.  I've provisioned one already and I'm starting a Kudu
> build there in an attempt to take a look at the issue later tonight.
>
> I'll keep you posted on my findings.
>
>
> Kind regards,
>
> Alexey
>
> On Wed, Jun 19, 2019 at 2:53 PM Todd Lipcon <to...@cloudera.com> wrote:
>
>> This same issue was reported a month or two ago for Kudu on Fedora 29. I
>> think Alexey Serbin had started to look into it. Alexey, did we figure out
>> what was going on here?
>>
>> -Todd
>>
>> On Wed, Jun 19, 2019 at 6:00 AM Laszlo Gaal <la...@cloudera.com>
>> wrote:
>>
>>> Having looked at the failing build Jim quoted above, the failure seems to
>>> come from the security area.
>>> This is from the Kudu master's log, from the startup sequence (see
>>>
>>> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/cluster/cdh6-node-1/kudu/master/kudu-master.INFO/*view*/
>>> ),
>>> all this in the context of an Impala minicluster:
>>>
>>> I0612 04:12:56.129866  8515 sys_catalog.cc:424] T
>>> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515
>>> [sys.catalog]: configured and running, proceeding with master startup.
>>> W0612 04:12:56.130080  8522 catalog_manager.cc:1113] T
>>> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
>>> acquiring CA information for follower catalog manager: Not found: root CA
>>> entry not found
>>> W0612 04:12:56.130123  8522 catalog_manager.cc:596] Not found: root CA
>>> entry not found: failed to prepare follower catalog manager, will retry
>>> I0612 04:12:56.130151  8521 catalog_manager.cc:1055] Loading table and
>>> tablet metadata into memory...
>>> I0612 04:12:56.130228  8521 catalog_manager.cc:1066] Initializing Kudu
>>> internal certificate authority...
>>> W0612 04:12:56.167639  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50174: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.170145  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50176: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.172571  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50178: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.182530  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50180: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.185034  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50182: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.187453  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50184: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> I0612 04:12:56.197146  8521 catalog_manager.cc:950] Generated new
>>> certificate authority record
>>> I0612 04:12:56.198005  8521 catalog_manager.cc:1075] Loading token
>>> signing
>>> keys...
>>> W0612 04:12:56.293697  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50186: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.295320  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50188: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.296821  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50190: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> I0612 04:12:56.416918  8521 catalog_manager.cc:4292] T
>>> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
>>> Generated new TSK 0
>>> W0612 04:12:57.174684  8901 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50192: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> [and so on...]
>>>
>>> The same run has very similar messages in the tablet server logs as well:
>>> I0612 04:12:56.289767  8396 rpc_server.cc:205] RPC server started. Bound
>>> to:
>>> 127.0.0.1:31202
>>> I0612 04:12:56.289903  8396 webserver.cc:308] Webserver started at
>>> http://0.0.0.0:31302/ using document root
>>>
>>> /home/ubuntu/Impala/toolchain/cdh_components-1137441/kudu-1.10.0-cdh6.x-SNAPSHOT/release/bin/../lib/kudu/www
>>> and password file <none>
>>> W0612 04:12:56.293773  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (0 consecutive failures): Not authorized: Failed to ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.296866  8897 heartbeater.cc:380] Failed 3 heartbeats in a
>>> row: no longer allowing fast heartbeat attempts.
>>> W0612 04:13:56.424613  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (62 consecutive failures): Not authorized: Failed to ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:14:56.556850  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (122 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:15:56.694403  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (182 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:16:56.826400  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (242 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:17:56.955927  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (302 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:18:57.103503  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (362 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:19:57.237712  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (422 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:20:57.393489  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (482 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:21:57.522513  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (542 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:22:57.652271  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (602 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:23:57.782537  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (662 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:24:57.910481  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>>
>>>
>>> On Mon, Jun 17, 2019 at 9:08 PM Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>> > On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <ap...@jbapple.com> wrote:
>>> >
>>> > > My goal is to have Impala keep up with (what I perceive to be) the
>>> most
>>> > > popular version of the most popular Linux distribution, for the
>>> purpose
>>> > of
>>> > > easing the workflow of developers, especially new developers.
>>> > >
>>> >
>>> > Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop
>>> Impala
>>> > on a remote box running el7 because the dev environment is too
>>> heavy-weight
>>> > to realistically run on my laptop.
>>> >
>>> >
>>> > >
>>> > > 18.04 stopped being able to load data some time between June 9th and
>>> > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/ and
>>> June 12
>>> > > and
>>> > >
>>> > >
>>> >
>>> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/
>>> > > .
>>> > > I tried reproducing the June 9 run with the same git checkouts
>>> (Impala
>>> > and
>>> > > Impala-LZO) as #14 today, and data loading still failed.
>>> > >
>>> > > What RHEL 7 components did you have in mind that are closer to Ubuntu
>>> > 16.04
>>> > > than 18.04?
>>> > >
>>> >
>>> > Stuff like libc, openssl, krb5, sasl, etc are pretty different
>>> > version-wise. At least, I know when we made Kudu pass tests on Ubuntu
>>> 18,
>>> > we dealt with issues mostly in those libraries, which aren't part of
>>> the
>>> > toolchain (for security reasons we rely on OS-provided libs).
>>> >
>>> > Generally I think precommit running on something closer to the oldest
>>> > supported OS is better than running on the newest, since it's more
>>> likely
>>> > that new OSes are backward-compatible. Otherwise it's very easy to
>>> > introduce code that uses features not available on el7, for example.
>>> >
>>> >
>>> > >
>>> > > On Wed, May 22, 2019 at 10:41 AM Todd Lipcon <to...@cloudera.com>
>>> wrote:
>>> > >
>>> > > > On Mon, May 20, 2019 at 8:36 PM Jim Apple <ap...@jbapple.com>
>>> wrote:
>>> > > >
>>> > > > > Maybe now would be a good time to implement Everblue jobs that
>>> ping
>>> > > dev@
>>> > > > > when they fail. Thoughts?
>>> > > > >
>>> > > >
>>> > > > Mixed feelings on that. We already get many test runs per day of
>>> the
>>> > > > "default" config because people are running precommit builds.
>>> Adding an
>>> > > > additional cron-based job to the mix that runs the same builds
>>> doesn't
>>> > > seem
>>> > > > like it adds much unless it tests some other config (eg Ubuntu 18
>>> or a
>>> > > > longer suite of tests). One thing I could get on board with would
>>> be
>>> > > > switching the precommit builds to run just "core" tests or some
>>> other
>>> > > > faster subset, and defer the exhaustive/long runs to scheduled
>>> builds
>>> > or
>>> > > as
>>> > > > an optional precommit for particularly invasive patches. I think
>>> that
>>> > > would
>>> > > > increase dev quality of life substantially (I find my productivity
>>> is
>>> > > often
>>> > > > hampered by only getting two shots at a precommit run per work
>>> day).
>>> > > >
>>> > > > I'm not against adding a cron-triggered full test/build on Ubuntu
>>> 18,
>>> > but
>>> > > > would like to know if someone plans to sign up to triage it when it
>>> > > fails.
>>> > > > My experience with other Apache communities is that collective
>>> > ownership
>>> > > > > over test triage duty (ie "email the dev list on failure") doesn't
>>> > work. I
>>> > > > seem to recall we had such builds back in 2010 or so on Hadoop and
>>> they
>>> > > > just always got ignored. In various "day job" teams I've seen this
>>> work
>>> > > via
>>> > > > a prescriptive rotation ("all team members take a triage/build-cop
>>> > > shift")
>>> > > > > but that's not really compatible with the nature of Apache projects
>>> > being
>>> > > > volunteer communities.
>>> > > >
>>> > > > So, I think I'll put the question back to you: as a committer you
>>> can
>>> > > spend
>>> > > > your time as you like. If you think an Ubuntu 18 job running on a
>>> > > schedule
>>> > > > > would be useful and are willing to sign up to triage failures, sounds
>>> great
>>> > > to
>>> > > > me :) Personally I don't develop on Ubuntu 18 and in my day job
>>> it's
>>> > not
>>> > > a
>>> > > > particularly important deployment platform, so I personally don't
>>> think
>>> > > > I'll spend much time triaging that build.
>>> > > >
>>> > > > Todd
>>> > > >
>>> > > >
>>> > > > >
>>> > > > > On Mon, May 20, 2019 at 9:09 AM Todd Lipcon <to...@cloudera.com>
>>> > wrote:
>>> > > > >
>>> > > > > > Adding a build-only job for 18.04 makes sense to me. A full
>>> test
>>> > run
>>> > > on
>>> > > > > > every precommit seems a bit expensive but doing one once a
>>> week or
>>> > > > > > something like that might be a good idea to prevent runtime
>>> > > > regressions.
>>> > > > > >
>>> > > > > > As for switching the precommit from 16.04 to 18.04, I'd lean
>>> > towards
>>> > > > > > keeping to 16.04 due to it being closer in terms of component
>>> > > versions
>>> > > > to
>>> > > > > > common enterprise distros like RHEL 7.
>>> > > > > >
>>> > > > > > -Todd
>>> > > > > >
>>> > > > > > On Sun, May 19, 2019 at 5:03 PM Jim Apple <jb...@apache.org>
>>> > > wrote:
>>> > > > > >
>>> > > > > > > HEAD now passes on Ubuntu 18.04:
>>> > > > > > >
>>> > > > > > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/
>>> > > > > > >
>>> > > > > > > Thanks to the community members who have made this happen!
>>> > > > > > >
>>> > > > > > > Should we add Ubuntu 18.04 to our pre-merge Jenkins job,
>>> replace
>>> > > > 16.04
>>> > > > > > with
>>> > > > > > > 18.04 in our pre-merge Jenkins job, or neither?
>>> > > > > > >
>>> > > > > > > > I propose adding 18.04 for now (and so running both 16.04 and
>>> > 18.04
>>> > > > on
>>> > > > > > > merge) and removing 16.04 when it starts to become
>>> inconvenient.
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > --
>>> > > > > > Todd Lipcon
>>> > > > > > Software Engineer, Cloudera
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Todd Lipcon
>>> > > > Software Engineer, Cloudera
>>> > > >
>>> > >
>>> >
>>> >
>>> > --
>>> > Todd Lipcon
>>> > Software Engineer, Cloudera
>>> >
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Todd Lipcon <to...@cloudera.com>.
This same issue was reported a month or two ago for Kudu on Fedora 29. I
think Alexey Serbin had started to look into it. Alexey, did we figure out
what was going on here?

-Todd

On Wed, Jun 19, 2019 at 6:00 AM Laszlo Gaal <la...@cloudera.com>
wrote:

> Having looked at the failing build Jim quoted above, the failure seems to
> come from the security area.
> This is from the Kudu master's log, from the startup sequence (see
>
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/cluster/cdh6-node-1/kudu/master/kudu-master.INFO/*view*/
> ),
> all this in the context of an Impala minicluster:
>
> I0612 04:12:56.129866  8515 sys_catalog.cc:424] T
> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515
> [sys.catalog]: configured and running, proceeding with master startup.
> W0612 04:12:56.130080  8522 catalog_manager.cc:1113] T
> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
> acquiring CA information for follower catalog manager: Not found: root CA
> entry not found
> W0612 04:12:56.130123  8522 catalog_manager.cc:596] Not found: root CA
> entry not found: failed to prepare follower catalog manager, will retry
> I0612 04:12:56.130151  8521 catalog_manager.cc:1055] Loading table and
> tablet metadata into memory...
> I0612 04:12:56.130228  8521 catalog_manager.cc:1066] Initializing Kudu
> internal certificate authority...
> W0612 04:12:56.167639  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50174: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.170145  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50176: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.172571  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50178: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.182530  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50180: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.185034  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50182: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.187453  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50184: expected TLS_HANDSHAKE step: SASL_INITIATE
> I0612 04:12:56.197146  8521 catalog_manager.cc:950] Generated new
> certificate authority record
> I0612 04:12:56.198005  8521 catalog_manager.cc:1075] Loading token signing
> keys...
> W0612 04:12:56.293697  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50186: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.295320  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50188: expected TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.296821  8636 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50190: expected TLS_HANDSHAKE step: SASL_INITIATE
> I0612 04:12:56.416918  8521 catalog_manager.cc:4292] T
> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
> Generated new TSK 0
> W0612 04:12:57.174684  8901 negotiation.cc:320] Unauthorized connection
> attempt: Server connection negotiation failed: server connection from
> 127.0.0.1:50192: expected TLS_HANDSHAKE step: SASL_INITIATE
> [and so on...]
>
> The same run has very similar messages in the tablet server logs as well:
> I0612 04:12:56.289767  8396 rpc_server.cc:205] RPC server started. Bound to:
> 127.0.0.1:31202
> I0612 04:12:56.289903  8396 webserver.cc:308] Webserver started at
> http://0.0.0.0:31302/ using document root
>
> /home/ubuntu/Impala/toolchain/cdh_components-1137441/kudu-1.10.0-cdh6.x-SNAPSHOT/release/bin/../lib/kudu/www
> and password file <none>
> W0612 04:12:56.293773  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (0 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:12:56.296866  8897 heartbeater.cc:380] Failed 3 heartbeats in a
> row: no longer allowing fast heartbeat attempts.
> W0612 04:13:56.424613  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (62 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:14:56.556850  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (122 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:15:56.694403  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (182 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:16:56.826400  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (242 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:17:56.955927  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (302 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:18:57.103503  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (362 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:19:57.237712  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (422 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:20:57.393489  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (482 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:21:57.522513  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (542 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:22:57.652271  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (602 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:23:57.782537  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (662 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
> W0612 04:24:57.910481  8897 heartbeater.cc:587] Failed to heartbeat to
> 127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to ping
> master at 127.0.0.1:7051: Client connection negotiation failed: client
> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
> TLS_HANDSHAKE step: SASL_INITIATE
>
>
> On Mon, Jun 17, 2019 at 9:08 PM Todd Lipcon <to...@cloudera.com> wrote:
>
> > On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <ap...@jbapple.com> wrote:
> >
> > > My goal is to have Impala keep up with (what I perceive to be) the most
> > > popular version of the most popular Linux distribution, for the purpose
> > of
> > > easing the workflow of developers, especially new developers.
> > >
> >
> > Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop
> Impala
> > on a remote box running el7 because the dev environment is too
> heavy-weight
> > to realistically run on my laptop.
> >
> >
> > >
> > > 18.04 stopped being able to load data some time between June 9th and
> > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/ and June
> 12
> > > and
> > >
> > >
> >
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/
> > > .
> > > I tried reproducing the June 9 run with the same git checkouts (Impala
> > and
> > > Impala-LZO) as #14 today, and data loading still failed.
> > >
> > > What RHEL 7 components did you have in mind that are closer to Ubuntu
> > 16.04
> > > than 18.04?
> > >
> >
> > Stuff like libc, openssl, krb5, sasl, etc are pretty different
> > version-wise. At least, I know when we made Kudu pass tests on Ubuntu 18,
> > we dealt with issues mostly in those libraries, which aren't part of the
> > toolchain (for security reasons we rely on OS-provided libs).
> >
> > Generally I think precommit running on something closer to the oldest
> > supported OS is better than running on the newest, since it's more likely
> > that new OSes are backward-compatible. Otherwise it's very easy to
> > introduce code that uses features not available on el7, for example.


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Laszlo Gaal <la...@cloudera.com>.
Having looked at the failing build Jim quoted above, the failure seems to
come from the security area.
This is from the Kudu master's log, from the startup sequence, all in the
context of an Impala minicluster (see
https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/cluster/cdh6-node-1/kudu/master/kudu-master.INFO/*view*/):

I0612 04:12:56.129866  8515 sys_catalog.cc:424] T
00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515
[sys.catalog]: configured and running, proceeding with master startup.
W0612 04:12:56.130080  8522 catalog_manager.cc:1113] T
00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
acquiring CA information for follower catalog manager: Not found: root CA
entry not found
W0612 04:12:56.130123  8522 catalog_manager.cc:596] Not found: root CA
entry not found: failed to prepare follower catalog manager, will retry
I0612 04:12:56.130151  8521 catalog_manager.cc:1055] Loading table and
tablet metadata into memory...
I0612 04:12:56.130228  8521 catalog_manager.cc:1066] Initializing Kudu
internal certificate authority...
W0612 04:12:56.167639  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50174: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.170145  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50176: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.172571  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50178: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.182530  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50180: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.185034  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50182: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.187453  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50184: expected TLS_HANDSHAKE step: SASL_INITIATE
I0612 04:12:56.197146  8521 catalog_manager.cc:950] Generated new
certificate authority record
I0612 04:12:56.198005  8521 catalog_manager.cc:1075] Loading token signing
keys...
W0612 04:12:56.293697  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50186: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.295320  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50188: expected TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.296821  8636 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50190: expected TLS_HANDSHAKE step: SASL_INITIATE
I0612 04:12:56.416918  8521 catalog_manager.cc:4292] T
00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
Generated new TSK 0
W0612 04:12:57.174684  8901 negotiation.cc:320] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
127.0.0.1:50192: expected TLS_HANDSHAKE step: SASL_INITIATE
[and so on...]

The same run has very similar messages in the tablet server logs as well:
I0612 04:12:56.289767  8396 rpc_server.cc:205] RPC server started. Bound to:
127.0.0.1:31202
I0612 04:12:56.289903  8396 webserver.cc:308] Webserver started at
http://0.0.0.0:31302/ using document root
/home/ubuntu/Impala/toolchain/cdh_components-1137441/kudu-1.10.0-cdh6.x-SNAPSHOT/release/bin/../lib/kudu/www
and password file <none>
W0612 04:12:56.293773  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (0 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:12:56.296866  8897 heartbeater.cc:380] Failed 3 heartbeats in a
row: no longer allowing fast heartbeat attempts.
W0612 04:13:56.424613  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (62 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:14:56.556850  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (122 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:15:56.694403  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (182 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:16:56.826400  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (242 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:17:56.955927  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (302 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:18:57.103503  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (362 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:19:57.237712  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (422 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:20:57.393489  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (482 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:21:57.522513  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (542 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:22:57.652271  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (602 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:23:57.782537  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (662 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
W0612 04:24:57.910481  8897 heartbeater.cc:587] Failed to heartbeat to
127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to ping
master at 127.0.0.1:7051: Client connection negotiation failed: client
connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected
TLS_HANDSHAKE step: SASL_INITIATE
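
For anyone triaging similar runs, here is a minimal sketch for tallying
these warnings per source file. It assumes glog-style record headers like
the ones in the excerpts above and joins lines that were wrapped in
transit. The names parse_records and count_negotiation_failures are
hypothetical; this is not part of the Impala or Kudu tooling.

```python
import re
from collections import Counter

# glog-style record header: severity letter, MMDD, time, thread id, file:line]
# (assumed from the log excerpts; other deployments may format logs differently)
HEADER = re.compile(
    r"^(?P<sev>[IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+"
    r"\d+ (?P<src>[\w.\-]+):\d+\] ?(?P<msg>.*)$"
)


def parse_records(lines):
    """Join wrapped lines into (severity, source_file, message) records."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = HEADER.match(line)
        if m:
            # A new record starts here.
            records.append([m["sev"], m["src"], m["msg"]])
        elif records and line.strip():
            # Continuation of the previous record (wrapped by the mailer).
            records[-1][2] += " " + line.strip()
    return [tuple(r) for r in records]


def count_negotiation_failures(lines, needle="expected TLS_HANDSHAKE step"):
    """Count warning records containing `needle`, keyed by source file."""
    counts = Counter()
    for sev, src, msg in parse_records(lines):
        if sev == "W" and needle in msg:
            counts[src] += 1
    return counts
```

Passing an open log file (for example a downloaded kudu-master.INFO) to
count_negotiation_failures returns a Counter keyed by source file, which
makes it easier to see whether the failures come only from negotiation.cc
and heartbeater.cc or from elsewhere as well.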


On Mon, Jun 17, 2019 at 9:08 PM Todd Lipcon <to...@cloudera.com> wrote:

> On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <ap...@jbapple.com> wrote:
>
> > My goal is to have Impala keep up with (what I perceive to be) the most
> > popular version of the most popular Linux distribution, for the purpose of
> > easing the workflow of developers, especially new developers.
> >
>
> Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop Impala
> on a remote box running el7 because the dev environment is too heavy-weight
> to realistically run on my laptop.
>
>
> >
> > 18.04 stopped being able to load data some time between June 9th
> > (https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/) and June 12
> > (https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/).
> > I tried reproducing the June 9 run with the same git checkouts (Impala and
> > Impala-LZO) as #14 today, and data loading still failed.
> >
> > What RHEL 7 components did you have in mind that are closer to Ubuntu 16.04
> > than 18.04?
> >
>
> Stuff like libc, openssl, krb5, sasl, etc are pretty different
> version-wise. At least, I know when we made Kudu pass tests on Ubuntu 18,
> we dealt with issues mostly in those libraries, which aren't part of the
> toolchain (for security reasons we rely on OS-provided libs).
>
> Generally I think precommit running on something closer to the oldest
> supported OS is better than running on the newest, since it's more likely
> that new OSes are backward-compatible. Otherwise it's very easy to
> introduce code that uses features not available on el7, for example.
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Todd Lipcon <to...@cloudera.com>.
On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <ap...@jbapple.com> wrote:

> My goal is to have Impala keep up with (what I perceive to be) the most
> popular version of the most popular Linux distribution, for the purpose of
> easing the workflow of developers, especially new developers.
>

Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop Impala
on a remote box running el7 because the dev environment is too heavy-weight
to realistically run on my laptop.


>
> 18.04 stopped being able to load data some time between June 9th
> (https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/) and June 12
> (https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/).
> I tried reproducing the June 9 run with the same git checkouts (Impala and
> Impala-LZO) as #14 today, and data loading still failed.
>
> What RHEL 7 components did you have in mind that are closer to Ubuntu 16.04
> than 18.04?
>

Stuff like libc, openssl, krb5, sasl, etc are pretty different
version-wise. At least, I know when we made Kudu pass tests on Ubuntu 18,
we dealt with issues mostly in those libraries, which aren't part of the
toolchain (for security reasons we rely on OS-provided libs).

Generally I think precommit running on something closer to the oldest
supported OS is better than running on the newest, since it's more likely
that new OSes are backward-compatible. Otherwise it's very easy to
introduce code that uses features not available on el7, for example.




-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Jim Apple <ap...@jbapple.com>.
My goal is to have Impala keep up with (what I perceive to be) the most
popular version of the most popular Linux distribution, for the purpose of
easing the workflow of developers, especially new developers.

18.04 stopped being able to load data some time between June 9th
(https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/) and June 12
(https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/).
I tried reproducing the June 9 run with the same git checkouts (Impala and
Impala-LZO) as #14 today, and data loading still failed.

What RHEL 7 components did you have in mind that are closer to Ubuntu 16.04
than 18.04?


On Wed, May 22, 2019 at 10:41 AM Todd Lipcon <to...@cloudera.com> wrote:

> On Mon, May 20, 2019 at 8:36 PM Jim Apple <ap...@jbapple.com> wrote:
>
> > Maybe now would be a good time to implement Everblue jobs that ping dev@
> > when they fail. Thoughts?
> >
>
> Mixed feelings on that. We already get many test runs per day of the
> "default" config because people are running precommit builds. Adding an
> additional cron-based job to the mix that runs the same builds doesn't seem
> like it adds much unless it tests some other config (eg Ubuntu 18 or a
> longer suite of tests). One thing I could get on board with would be
> switching the precommit builds to run just "core" tests or some other
> faster subset, and defer the exhaustive/long runs to scheduled builds or as
> an optional precommit for particularly invasive patches. I think that would
> increase dev quality of life substantially (I find my productivity is often
> hampered by only getting two shots at a precommit run per work day).
>
> I'm not against adding a cron-triggered full test/build on Ubuntu 18, but
> would like to know if someone plans to sign up to triage it when it fails.
> My experience with other Apache communities is that collective ownership
> over test triage duty (i.e. "email the dev list on failure") doesn't work. I
> seem to recall we had such builds back in 2010 or so on Hadoop and they
> just always got ignored. In various "day job" teams I've seen this work via
> a prescriptive rotation ("all team members take a triage/build-cop shift")
> but that's not really compatible with the nature of Apache projects being
> volunteer communities.
>
> So, I think I'll put the question back to you: as a committer you can spend
> your time as you like. If you think an Ubuntu 18 job running on a schedule
> would be useful and are willing to sign up to triage failures, sounds great
> to me :) Personally I don't develop on Ubuntu 18 and in my day job it's not
> a particularly important deployment platform, so I don't think I'll spend
> much time triaging that build.
>
> Todd
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, May 20, 2019 at 8:36 PM Jim Apple <ap...@jbapple.com> wrote:

> Maybe now would be a good time to implement Everblue jobs that ping dev@
> when they fail. Thoughts?
>

Mixed feelings on that. We already get many test runs per day of the
"default" config because people are running precommit builds. Adding an
additional cron-based job to the mix that runs the same builds doesn't seem
like it adds much unless it tests some other config (e.g. Ubuntu 18 or a
longer suite of tests). One thing I could get on board with would be
switching the precommit builds to run just "core" tests or some other
faster subset, and deferring the exhaustive/long runs to scheduled builds or
as an optional precommit for particularly invasive patches. I think that would
increase dev quality of life substantially (I find my productivity is often
hampered by only getting two shots at a precommit run per work day).

I'm not against adding a cron-triggered full test/build on Ubuntu 18, but
would like to know if someone plans to sign up to triage it when it fails.
My experience with other Apache communities is that collective ownership
over test triage duty (i.e. "email the dev list on failure") doesn't work. I
seem to recall we had such builds back in 2010 or so on Hadoop and they
just always got ignored. In various "day job" teams I've seen this work via
a prescriptive rotation ("all team members take a triage/build-cop shift")
but that's not really compatible with the nature of Apache projects being
volunteer communities.

So, I think I'll put the question back to you: as a committer you can spend
your time as you like. If you think an Ubuntu 18 job running on a schedule
would be useful and are willing to sign up to triage failures, sounds great
to me :) Personally I don't develop on Ubuntu 18 and in my day job it's not
a particularly important deployment platform, so I don't think I'll spend
much time triaging that build.

Todd




-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Jim Apple <ap...@jbapple.com>.
Maybe now would be a good time to implement Everblue jobs that ping dev@
when they fail. Thoughts?

On Mon, May 20, 2019 at 9:09 AM Todd Lipcon <to...@cloudera.com> wrote:

> Adding a build-only job for 18.04 makes sense to me. A full test run on
> every precommit seems a bit expensive but doing one once a week or
> something like that might be a good idea to prevent runtime regressions.
>
> As for switching the precommit from 16.04 to 18.04, I'd lean towards
> keeping to 16.04 due to it being closer in terms of component versions to
> common enterprise distros like RHEL 7.
>
> -Todd
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Ubuntu 18.04 in pre-merge tests?

Posted by Todd Lipcon <to...@cloudera.com>.
Adding a build-only job for 18.04 makes sense to me. A full test run on
every precommit seems a bit expensive but doing one once a week or
something like that might be a good idea to prevent runtime regressions.

As for switching the precommit from 16.04 to 18.04, I'd lean towards
keeping to 16.04 due to it being closer in terms of component versions to
common enterprise distros like RHEL 7.

-Todd

On Sun, May 19, 2019 at 5:03 PM Jim Apple <jb...@apache.org> wrote:

> HEAD now passes on Ubuntu 18.04:
>
> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/
>
> Thanks to the community members who have made this happen!
>
> Should we add Ubuntu 18.04 to our pre-merge Jenkins job, replace 16.04 with
> 18.04 in our pre-merge Jenkins job, or neither?
>
> I propose adding 18.04 for now (and so running both 16.04 and 18.04 on
> merge) and removing 16.04 when it starts to become inconvenient.
>


-- 
Todd Lipcon
Software Engineer, Cloudera