Posted to dev@flink.apache.org by Bowen Li <bo...@gmail.com> on 2019/06/24 21:48:11 UTC

[DISCUSS] solve unstable build capacity problem on TravisCI

Hi devs,

I've been feeling the pain of unstable build capacity on Travis for
Flink PRs [1]. Specifically, I've often noticed that no build in the queue
makes any progress for hours, and then 5 or 6 builds suddenly kick off
together after the long pause. I'm in the PST (UTC-08) time zone, and I've
seen pauses as long as 6 hours, from 9am to 3pm PST (not counting the time
needed to drain the queue afterwards).

I think this has greatly hurt our productivity. In my experience, PRs
submitted early in the morning PST don't finish building until late at
night the same day.

So my questions are:

- Has anyone else experienced the same problem or made similar observations
on TravisCI? (I suspect it has something to do with time zones.)

- Which TravisCI pricing plan is Flink currently using? Is it the free
plan for open source projects? What build capacity does the current plan
guarantee?

- If the current pricing plan (either free or paid) can't provide stable
build capacity, can we upgrade to a higher priced plan with larger and more
stable build capacity?

BTW, another factor that contributes to the productivity problem is that our
build is slow: we run a full build for every PR, and a successful full build
takes ~5h. We definitely have more options to address that, for instance
modularizing the build graph and reusing artifacts from previous builds.
But I think that would be a big effort that is much harder to accomplish in
a short period of time, and it may deserve its own separate discussion.

[1] https://travis-ci.org/apache/flink/pull_requests
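
To make the effect of a stalled queue concrete, here is a rough back-of-the-envelope sketch. It is not a model of how Travis actually schedules jobs: it assumes every build takes the same time and occupies exactly one executor. The 6h pause and ~5h build time come from the message above; the queue length and executor count are purely illustrative, and the function names are hypothetical.

```python
import math


def drain_time_hours(queued_builds: int, build_hours: float, executors: int) -> float:
    """Hours until the last queued build finishes.

    Simplification: every build takes the same time and holds exactly one
    executor, so builds run in waves of `executors` at a time.
    """
    if executors <= 0:
        raise ValueError("executors must be positive")
    waves = math.ceil(queued_builds / executors)
    return waves * build_hours


def total_wait_hours(pause_hours: float, queued_builds: int,
                     build_hours: float, executors: int) -> float:
    """Idle pause before any build starts, plus the time to drain the queue."""
    return pause_hours + drain_time_hours(queued_builds, build_hours, executors)


if __name__ == "__main__":
    # The 6h pause and ~5h builds are from the message above; the queue
    # length of 12 and the 5 executors are illustrative assumptions.
    print(total_wait_hours(pause_hours=6, queued_builds=12,
                           build_hours=5, executors=5))
```

Under these assumptions, the pause adds directly to the end-to-end latency of every queued PR, no matter how much total build time the project is granted over a month.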

Re: [VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
I have a prototype ready and will now start a real-world test. I will
point it at apache/flink and mirror it into a Ververica-controlled repo to
start Travis runs.

Once the run is finished the bot will comment on the PR with the results.

This runs in addition to our existing CI.
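
As an illustration only (this is not how the bot is actually implemented), the "wait for the run, then report" step could look roughly like the sketch below. The state names, the `fetch_state` callable and the comment format are all assumptions; a real bot would plug in calls to the CI service's and GitHub's APIs.

```python
import time
from typing import Callable

# Assumed set of terminal build states; check the CI service's API docs.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], str],
                   interval_s: float = 60.0,
                   timeout_s: float = 6 * 3600.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll `fetch_state` until the build reaches a terminal state.

    Returns the terminal state, or "timeout" if the deadline passes first.
    `fetch_state` is a hypothetical hook that returns the current state.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)


def format_comment(commit: str, state: str) -> str:
    """Build the text of the PR comment (the format is made up for this sketch)."""
    return f"CI report for commit {commit}: {state}"
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. Polling from a lightweight bot avoids blocking a shared CI worker for the duration of the build.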

On 04/07/2019 14:06, Chesnay Schepler wrote:
> Note that the Flinkbot approach isn't that trivial either; we can't 
> _just_ trigger builds for a branch in the apache repo, but would first 
> have to clone the branch/pr into a separate repository (that is owned 
> by the github account that the travis account would be tied to).
>
> One roadblock after the next showing up...
>
> On 04/07/2019 11:59, Chesnay Schepler wrote:
>> Small update with mostly bad news:
>>
>> INFRA doesn't know whether it is possible, and referred me to Travis 
>> support.
>> They did point out that it could be problematic in regards to 
>> read/write permissions for the repository.
>>
>> From my own findings /so far/ with a test repo/organization, it does 
>> not appear possible to configure the Travis account used for a 
>> specific repository.
>>
>> So yeah, if we go down this route we may have to pimp the Flinkbot to 
>> trigger builds through the Travis REST API.
>>
>> On 04/07/2019 10:46, Chesnay Schepler wrote:
>>> I've raised a JIRA ticket 
>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to 
>>> inquire whether it would be possible to switch to a different Travis 
>>> account, and if so what steps would need to be taken.
>>> We need a proper confirmation from INFRA since we are not in full 
>>> control of the flink repository (for example, we cannot access the 
>>> settings page).
>>>
>>> If this is indeed possible, Ververica is willing to sponsor a Travis 
>>> account for the Flink project.
>>> This would provide us with more than enough resources.
>>>
>>> Since this makes the project more reliant on resources provided by 
>>> external companies I would like to vote on this.
>>>
>>> Please vote on this proposal, as follows:
>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis 
>>> account, provided that INFRA approves
>>> [ ] -1, Do not approve the migration to a Ververica-sponsored 
>>> Travis account
>>>
>>> The vote will be open for at least 24h, and until we have 
>>> confirmation from INFRA. The voting period may be shorter than the 
>>> usual 3 days since our current CI setup is effectively not working.
>>>
>>> On 04/07/2019 06:51, Bowen Li wrote:
>>>> Re: > Are they using their own Travis CI pool, or did they switch to 
>>>> an entirely different CI service?
>>>>
>>>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are 
>>>> currently moving away from ASF's Travis to their own in-house bare-metal 
>>>> machines at [1], with a custom CI application at [2]. They've seen 
>>>> significant improvement: much higher performance and basically no 
>>>> waiting for resources, a "night-and-day" difference, quoting Wes.
>>>>
>>>> Re: > If we can just switch to our own Travis pool, just for our 
>>>> project, then this might be something we can do fairly quickly?
>>>>
>>>> I believe so, according to [3] and [4]
>>>>
>>>>
>>>> [1] https://ci.ursalabs.org/
>>>> [2] https://github.com/ursa-labs/ursabot
>>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>>
>>>>
>>>>
>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler 
>>>> <chesnay@apache.org> wrote:
>>>>
>>>>     Are they using their own Travis CI pool, or did they switch to an
>>>>     entirely different CI service?
>>>>
>>>>     If we can just switch to our own Travis pool, just for our
>>>>     project, then this might be something we can do fairly quickly?
>>>>
>>>>     On 03/07/2019 05:55, Bowen Li wrote:
>>>>     > I responded in the INFRA ticket [1] that I believe they are using
>>>>     > the wrong metric against Flink: total build time is a completely
>>>>     > different thing from guaranteed build capacity.
>>>>     >
>>>>     > My response:
>>>>     >
>>>>     > "As mentioned above, I'm in Seattle, and since I started paying
>>>>     > attention to Flink's build queue a few weeks ago, I have seen no
>>>>     > builds kick off for Flink during PST daytime on weekdays. Our
>>>>     > teammates in China and Europe have reported similar observations.
>>>>     > So we need to work out where the large total build time came from.
>>>>     > If 1) your number and 2) our observations from three locations
>>>>     > that cover pretty much a full day are all true, I **guess** one
>>>>     > reason may be that the extra build time most likely came from
>>>>     > weekends, when other Apache projects may be idle and Flink just
>>>>     > drains its congested queue hard.
>>>>     >
>>>>     > Please be aware that we're not complaining about the lack of
>>>>     > resources in general; I'm complaining about the lack of **stable,
>>>>     > dedicated** resources. An example of the latter: currently, even
>>>>     > if no build is in Flink's queue and I submit a request that goes
>>>>     > to the head of the queue in the PST morning, my build won't even
>>>>     > start for 6-8+h. That is an absurd amount of waiting time.
>>>>     >
>>>>     > That is to say, if ASF INFRA decides to adopt a quota system and
>>>>     > grants Flink five DEDICATED servers that run all the time only
>>>>     > for Flink, that'll be PERFECT and can totally solve our problem
>>>>     > now.
>>>>     >
>>>>     > I feel what's missing in ASF INFRA's Travis resource pool is
>>>>     > some level of build capacity SLA and certainty"
>>>>     >
>>>>     >
>>>>     > Again, I believe these two problems differ in nature: long build
>>>>     > time vs. lack of dedicated build resources. That is to say,
>>>>     > shortening the build time may or may not relieve the situation.
>>>>     > I'm slightly negative on disabling IT cases for PRs: the downside
>>>>     > is that we risk bugs in PRs that UTs don't catch, which may cost
>>>>     > a lot more to fix later and may slow down or even block others.
>>>>     > But I am open to others' opinions on it.
>>>>     >
>>>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
>>>>     > solve our problem, since INFRA's pool is fully shared and they
>>>>     > have no control over, or finer insight into, resource allocation
>>>>     > to a specific Apache project. As mentioned in [1], Apache Arrow is
>>>>     > moving away from the ASF INFRA Travis pool (they are actually
>>>>     > surprised Flink has no plan to do so). I know that Spark is on its
>>>>     > own build infra. If we all agree on funding our own build infra,
>>>>     > I'd be glad to help investigate potential options after releasing
>>>>     > 1.9, since I'm super busy with 1.9 now.
>>>>     >
>>>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>     >
>>>>     >
>>>>     >
>>>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>>     > <chesnay@apache.org> wrote:
>>>>     >
>>>>     >> As a short-term stopgap, since we can assume this issue will
>>>>     >> become much worse in the following days/weeks, we could disable
>>>>     >> IT cases in PRs and only run them on master.
>>>>     >>
>>>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>     >>> People really have to stop thinking that just because something
>>>>     >>> works for us it is also a good solution.
>>>>     >>> Also, please remember that our builds run for 2h from start to
>>>>     >>> finish, and not the 14 _minutes_ it takes for Zeppelin.
>>>>     >>> We are dealing with an entirely different scale here, both in
>>>>     >>> terms of build times and number of builds.
>>>>     >>>
>>>>     >>> In this very thread people have been complaining about long queue
>>>>     >>> times for their builds. Surprise, other Apache projects have been
>>>>     >>> suffering the very same thing due to us not controlling our build
>>>>     >>> times. While switching services (be it Jenkins, CircleCI or
>>>>     >>> whatever) will possibly work for us (and these options are
>>>>     >>> actually attractive, like CircleCI's proper support for build
>>>>     >>> artifacts), it will also likely result in us negatively affecting
>>>>     >>> other projects in significant ways.
>>>>     >>>
>>>>     >>> Sure, the Jenkins setup has a good user experience for us, at the
>>>>     >>> cost of blocking Jenkins workers for a _lot_ of time. Right now we
>>>>     >>> have 25 PRs in our queue; that's possibly 50h we'd consume of
>>>>     >>> Jenkins resources, and the European contributors haven't even
>>>>     >>> really started yet.
>>>>     >>>
>>>>     >>> FYI, the latest INFRA response from INFRA-18533:
>>>>     >>>
>>>>     >>> "Our rough metrics shows that Flink used over 5800 hours of build
>>>>     >>> time last month. That is equal to EIGHT servers running 24/7 for
>>>>     >>> the ENTIRE MONTH. EIGHT. nonstop.
>>>>     >>> When we discovered this last night, we discussed it some and are
>>>>     >>> going to tune down Flink to allow only five executors maximum. We
>>>>     >>> cannot allow Flink to consume so much of a Foundation shared
>>>>     >>> resource."
>>>>     >>>
>>>>     >>> So yes, we either
>>>>     >>> a) have to heavily reduce our CI usage or
>>>>     >>> b) fund our own, either maintaining it ourselves or donating to
>>>>     >>> Apache.
>>>>     >>>
>>>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>     >>>> Looking at the git history of the Jenkins script, its core part
>>>>     >>>> was finished in March 2017 (with only two minor updates in
>>>>     >>>> 2017/2018), so it's been running for over two years now, and it
>>>>     >>>> feels like the Zeppelin community has been quite happy with it.
>>>>     >>>> @Jeff Zhang <zjffdu@gmail.com> can you share your insights and
>>>>     >>>> user experience with the Jenkins+Travis approach?
>>>>     >>>>
>>>>     >>>> Things like:
>>>>     >>>>
>>>>     >>>> - Has the approach completely solved the resource capacity
>>>>     >>>> problem for the Zeppelin community? Is the Zeppelin community
>>>>     >>>> happy with the result?
>>>>     >>>> - Is the whole configuration chain stable enough (e.g. uptime)?
>>>>     >>>> - How often do you need to maintain the Jenkins infra? How many
>>>>     >>>> people are usually involved in maintenance and bug fixes?
>>>>     >>>>
>>>>     >>>> To me, the downside of this approach seems to be mostly
>>>>     >>>> maintenance: maintaining the script and the Jenkins infra.
>>>>     >>>>
>>>>     >>>> ** Having Our Own Travis-CI.com Account **
>>>>     >>>>
>>>>     >>>> Another alternative I've been thinking of is to have our own
>>>>     >>>> travis-ci.com account with paid dedicated resources. Note that
>>>>     >>>> travis-ci.org is the free version and travis-ci.com is the
>>>>     >>>> commercial version. We currently use a shared resource pool
>>>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>>>     >>>> control over it: we can't see how it's configured, how many
>>>>     >>>> resources are available, how resources are allocated among
>>>>     >>>> Apache projects, etc. The nice things about having an account on
>>>>     >>>> travis-ci.com are:
>>>>     >>>>
>>>>     >>>> - relatively low cost with a much better resource guarantee than
>>>>     >>>> what we currently have [1]: $249/month for 5 dedicated concurrent
>>>>     >>>> jobs, $489/month for 10
>>>>     >>>> - low maintenance work compared to using Jenkins
>>>>     >>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>     >>>> (pending verification)
>>>>     >>>> - full control over the build capacity/configuration compared to
>>>>     >>>> using ASF INFRA's pool
>>>>     >>>>
>>>>     >>>> I'd be surprised if we, as such a vibrant community, cannot find
>>>>     >>>> and fund $249*12=$2988 a year in exchange for a much better
>>>>     >>>> developer experience and much higher productivity.
>>>>     >>>>
>>>>     >>>> [1] https://travis-ci.com/plans
>>>>     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>     >>>>
>>>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>>>     >>>> <chesnay@apache.org> wrote:
>>>>     >>>>
>>>>     >>>>      So yes, the Jenkins job keeps polling the state from
>>>>     >>>>      Travis until it finishes.
>>>>     >>>>
>>>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>>>     >>>>      workers just to idle for several hours.
>>>>     >>>>
>>>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>     >>>>      > Here's what the Zeppelin community did: we wrote a Python
>>>>     >>>>      > script to check the build status of a pull request.
>>>>     >>>>      > Here's the script:
>>>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>     >>>>      >
>>>>     >>>>      > And this is the script we use in the Jenkins build job:
>>>>     >>>>      >
>>>>     >>>>      > if [ -f "travis_check.py" ]; then
>>>>     >>>>      >    git log -n 1
>>>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>>>>     >>>>      >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>     >>>>      >    #if [ -z $COMMIT ]; then
>>>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>     >>>>      >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>     >>>>      >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>     >>>>      >    #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>     >>>>      >    #fi
>>>>     >>>>      >
>>>>     >>>>      >    # get commit hash from PR
>>>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>     >>>>      >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>     >>>>      >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>     >>>>      >      grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>>>>     >>>>      >    RET_CODE=0
>>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>>>     >>>>      >      RET_CODE=0
>>>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>     >>>>      >        grep '"full_name":' | grep -v "apache/zeppelin" |
>>>>     >>>>      >        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>     >>>>      >    fi
>>>>     >>>>      >
>>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>>>>     >>>>      >      set +x
>>>>     >>>>      >      echo "-----------------------------------------------------"
>>>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>>>>     >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>     >>>>      >      echo ""
>>>>     >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
>>>>     >>>>      >      echo "git commit --amend"
>>>>     >>>>      >      echo "git push your-remote HEAD --force"
>>>>     >>>>      >      echo ""
>>>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>>>     >>>>      >    fi
>>>>     >>>>      >
>>>>     >>>>      >    exit $RET_CODE
>>>>     >>>>      > else
>>>>     >>>>      >    set +x
>>>>     >>>>      >    echo "travis_check.py does not exists"
>>>>     >>>>      >    exit 1
>>>>     >>>>      > fi
>>>>     >>>>      >
>>>>     >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat,
>>>>     >>>>      > Jun 29, 2019 at 3:17 PM:
>>>>     >>>>      >
>>>>     >>>>      >> Does this imply that a Jenkins job is active as long as
>>>>     >>>>      >> the Travis build runs?
>>>>     >>>>      >>
>>>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>>     >>>>      >>> Hi,
>>>>     >>>>      >>>
>>>>     >>>>      >>> @Dawid, I think the "long test running" issue, as I
>>>>     >>>>      >>> mentioned in the first email and as you said too, belongs
>>>>     >>>>      >>> to "a big effort which is much harder to accomplish in a
>>>>     >>>>      >>> short period of time and may deserve its own separate
>>>>     >>>>      >>> discussion". Thus I didn't include it in what we can do in
>>>>     >>>>      >>> the foreseeable short term.
>>>>     >>>>      >>>
>>>>     >>>>      >>> Besides, I don't think that's the ultimate reason for the
>>>>     >>>>      >>> lack of build resources. Even if the build is shortened to
>>>>     >>>>      >>> something like 2h, the problem I described, no build
>>>>     >>>>      >>> machine working for 6 or more hours in PST daytime, will
>>>>     >>>>      >>> still happen, because no machine from ASF INFRA's pool is
>>>>     >>>>      >>> allocated to Flink. I have paid close attention to the
>>>>     >>>>      >>> build queue over the past few weekdays, and it's a pretty
>>>>     >>>>      >>> clear pattern now.
>>>>     >>>>      >>>
>>>>     >>>>      >>> **The ultimate root cause** for that is that we don't have
>>>>     >>>>      >>> any **dedicated** build resources that we can stably rely
>>>>     >>>>      >>> on. I'm actually OK with waiting a long time if there are
>>>>     >>>>      >>> build requests running; that means at least we are making
>>>>     >>>>      >>> progress. But I'm not OK with having no build resources. A
>>>>     >>>>      >>> better place to aim for in the short term, I think, is to
>>>>     >>>>      >>> always have at least a central pool (maybe 3 or 5) of
>>>>     >>>>      >>> machines dedicated to building Flink at any time, or maybe
>>>>     >>>>      >>> to use users' resources.
>>>>     >>>>      >>>
>>>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
>>>>     >>>>      >>> community is using a Jenkins job to automatically build on
>>>>     >>>>      >>> users' Travis accounts and link the result back to the
>>>>     >>>>      >>> GitHub PR. I guess the Jenkins job fetches the latest
>>>>     >>>>      >>> upstream master and builds the PR against it. Jeff has
>>>>     >>>>      >>> filed tickets to learn about and get access to the Jenkins
>>>>     >>>>      >>> infra. It'd be better to fully understand it first before
>>>>     >>>>      >>> judging this approach.
>>>>     >>>>      >>>
>>>>     >>>>      >>> I also heard good things about CircleCI, and ASF INFRA
>>>>     >>>>      >>> seems to have a pool of build capacity there too. It could
>>>>     >>>>      >>> be an alternative to consider.
>>>>     >>>>      >>>
>>>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
>>>>     >>>>      >>> <dwysakowicz@apache.org> wrote:
>>>>     >>>>      >>>
>>>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the most
>>>>     >>>>      >>>> important point from Chesnay's previous message in the
>>>>     >>>>      >>>> summary. The ultimate reason for all the problems is that
>>>>     >>>>      >>>> the tests already take close to 2 hours to run.
>>>>     >>>>      >>>> I fully support this claim: "Unless people start caring
>>>>     >>>>      >>>> about test times before adding them, this issue cannot be
>>>>     >>>>      >>>> solved"
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> This is also another reason why using users' Travis
>>>>     >>>>      >>>> accounts won't help. Every few weeks we reach the user's
>>>>     >>>>      >>>> time limit for a single profile. This makes the user's
>>>>     >>>>      >>>> builds simply fail, until we either properly decrease the
>>>>     >>>>      >>>> time the tests take (which I am not sure we ever did) or
>>>>     >>>>      >>>> postpone the problem by splitting into more profiles.
>>>>     >>>>      >>>> (Note that the ASF Travis account has higher time limits.)
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> Best,
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> Dawid
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>     >>>>      >>>>> Do we know if using "the best" available hardware would
>>>>     >>>>      >>>>> improve the build times?
>>>>     >>>>      >>>>> Imagine we would run the build on machines with plenty of
>>>>     >>>>      >>>>> main memory to mount everything to ramdisk + the latest
>>>>     >>>>      >>>>> CPU architecture?
>>>>     >>>>      >>>>>
>>>>     >>>>      >>>>> Throwing hardware at the problem could help reduce the
>>>>     >>>>      >>>>> time of an individual build, and using our own
>>>>     >>>>      >>>>> infrastructure would remove our dependency on Apache's
>>>>     >>>>      >>>>> Travis account (with the obvious downside of having to
>>>>     >>>>      >>>>> maintain the infrastructure).
>>>>     >>>>      >>>>> We could use an open source Travis alternative, to have a
>>>>     >>>>      >>>>> similar experience and make the migration easy.
>>>>     >>>>      >>>>>
>>>>     >>>>      >>>>>
>>>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>>     >>>>      >>>>> <chesnay@apache.org> wrote:
>>>>     >>>>      >>>>>> From what I gathered, there's no special sauce that the
>>>>     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
>>>>     >>>>      >>>>>> Travis account into the PR.
>>>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
>>>>     >>>>      >>>>>> amount of resources, but there are downsides:
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>>>>     >>>>      >>>>>> nose-dive. Either we require every contributor to always,
>>>>     >>>>      >>>>>> on every commit, also post a Travis build, or we have the
>>>>     >>>>      >>>>>> reviewer sift through the contributor's account to find
>>>>     >>>>      >>>>>> it.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's also not
>>>>     >>>>      >>>>>> equivalent to having a PR build.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> A normal branch build takes a branch as is and tests it.
>>>>     >>>>      >>>>>> A PR build merges the branch into master, and then runs
>>>>     >>>>      >>>>>> it. (Fun fact: This is why a PR with merge conflicts is
>>>>     >>>>      >>>>>> not being run on Travis.)
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> And ultimately, everyone can already make use of this
>>>>     >>>>      >>>>>> approach anyway.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>     >>>>      >>>>>>> Hi Jeff,
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a
>>>>     >>>>      >>>>>>> good idea to leverage users' Travis accounts.
>>>>     >>>>      >>>>>>> In this way, we can have almost unlimited concurrent
>>>>     >>>>      >>>>>>> build jobs, and developers can restart builds by
>>>>     >>>>      >>>>>>> themselves (currently only committers can restart a PR's
>>>>     >>>>      >>>>>>> build).
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> But I'm still not very clear on how to integrate a
>>>>     >>>>      >>>>>>> user's Travis build into the Flink pull request's build
>>>>     >>>>      >>>>>>> automatically. Can you explain in more detail?
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Another question: does Travis only build branches for
>>>>     >>>>      >>>>>>> user accounts?
>>>>     >>>>      >>>>>>> My concern is that builds for PRs rebase the user's
>>>>     >>>>      >>>>>>> commits against the current master branch.
>>>>     >>>>      >>>>>>> This helps us find problems before merging. Builds for
>>>>     >>>>      >>>>>>> branches will lose the impact of new commits in master.
>>>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Regards,
>>>>     >>>>      >>>>>>> Jark
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>>>     >>>>      >>>>>>> <zjffdu@gmail.com> wrote:
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Hi Folks,
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved it by
>>>>     >>>>      >>>>>>>  delegating each person's PR build to their own Travis
>>>>     >>>>      >>>>>>>  account (everyone can have 5 free slots for Travis
>>>>     >>>>      >>>>>>>  builds).
>>>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered when
>>>>     >>>>      >>>>>>>  a PR is merged.
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  > (Forgot to cc George)
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  > Best,
>>>>     >>>>      >>>>>>>  > Kurt
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  > > Hi Bowen,
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
>>>>     >>>>      >>>>>>>  > > discussed this, and I think Till and George have
>>>>     >>>>      >>>>>>>  > > already spent some time investigating it. I have
>>>>     >>>>      >>>>>>>  > > cc'ed both of them, and maybe they can share
>>>>     >>>>      >>>>>>>  > > their findings.
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > > Best,
>>>>     >>>>      >>>>>>>  > > Kurt
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > >> Hi Bowen,
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> Thanks for bringing this up. We have also
>>>>     >>>>      >>>>>>>  > >> suffered from the long build time.
>>>>     >>>>      >>>>>>>  > >> I agree that we should focus on solving the
>>>>     >>>>      >>>>>>>  > >> build capacity problem in this thread.
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> My observation is that only one build is
>>>>     >>>>      >>>>>>>  > >> running while all the others (other PRs,
>>>>     >>>>      >>>>>>>  > >> master) are pending.
>>>>     >>>>      >>>>>>>  > >> The pricing plan [1] of Travis shows it can
>>>>     >>>>      >>>>>>>  > >> support concurrent build jobs.
>>>>     >>>>      >>>>>>>  > >> But I don't know which plan we are using; it
>>>>     >>>>      >>>>>>>  > >> might be the free plan for open source.
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> I cc'ed Chesnay, who may have some experience
>>>>     >>>>      >>>>>>>  > >> with Travis.
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> Regards,
>>>>     >>>>      >>>>>>>  > >> Jark
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> > Hi Steven,
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >> > I think you may not have read what I wrote.
>>>>     >>>>      >>>>>>>  > >> > The discussion is about "unstable build
>>>>     >>>>      >>>>>>>  > >> > **capacity**", in other words "unstable /
>>>>     >>>>      >>>>>>>  > >> > lack of build resources", not "unstable
>>>>     >>>>      >>>>>>>  > >> > build".
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable builds are
>>>>     >>>>      >>>>>>>  > >> > > definitely a pain point.
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>>>     >>>>      >>>>>>>  > >> > > flink-connector-kafka is not related to my
>>>>     >>>>      >>>>>>>  > >> > > change, but there is no easy way to re-run
>>>>     >>>>      >>>>>>>  > >> > > the build in the Travis UI. A Google search
>>>>     >>>>      >>>>>>>  > >> > > showed a trick: closing and reopening the
>>>>     >>>>      >>>>>>>  > >> > > PR will trigger a rebuild, but that could
>>>>     >>>>      >>>>>>>  > >> > > add noise to the PR activity.
>>>>     >>>>      >>>>>>>  > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo often failed
>>>>     >>>>      >>>>>>>  > >> > > by exceeding the time limit after 4+ hours:
>>>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time limit for
>>>>     >>>>      >>>>>>>  > >> > > jobs, and has been terminated.
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>>>>     >>>>      >>>>>>>  > >> > > > This build request has been sitting at the
>>>>     >>>>      >>>>>>>  > >> > > > **HEAD of the queue** since I first saw it
>>>>     >>>>      >>>>>>>  > >> > > > at PST 10:30am (not sure how long it had
>>>>     >>>>      >>>>>>>  > >> > > > been there before 10:30am). It's PST
>>>>     >>>>      >>>>>>>  > >> > > > 4:12pm now and it hasn't started yet.
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain resulting
>>>>     >>>>      >>>>>>>  > >> > > > > from lack of stable build capacity on
>>>>     >>>>      >>>>>>>  > >> > > > > Travis for Flink PRs [1]. Specifically, I
>>>>     >>>>      >>>>>>>  > >> > > > > noticed often that no build in the queue
>>>>     >>>>      >>>>>>>  > >> > > > > is making any progress for hours, and
>>>>     >>>>      >>>>>>>  > >> > > > > suddenly 5 or 6 builds kick off all
>>>>     >>>>      >>>>>>>  > >> > > > > together after the long pause. I'm at PST
>>>>     >>>>      >>>>>>>  > >> > > > > (UTC-08) time zone, and I've seen pause
>>>>     >>>>      >>>>>>>  > >> > > > > can be as long as 6 hours from PST 9am to
>>>>     >>>>      >>>>>>>  > >> > > > > 3pm (let alone the time needed to drain
>>>>     >>>>      >>>>>>>  > >> > > > > the queue afterwards).
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly impacted our
>>>>     >>>>      >>>>>>>  > >> > > > > productivity. I've experienced that PRs
>>>>     >>>>      >>>>>>>  > >> > > > > submitted in the early morning of PST
>>>>     >>>>      >>>>>>>  > >> > > > > time zone won't finish their build until
>>>>     >>>>      >>>>>>>  > >> > > > > late night of the same day.
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced the same
>>>>     >>>>      >>>>>>>  > >> > > > > problem or have similar observation on
>>>>     >>>>      >>>>>>>  > >> > > > > TravisCI? (I suspect it has things to do
>>>>     >>>>      >>>>>>>  > >> > > > > with time zone)
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of TravisCI is Flink
>>>>     >>>>      >>>>>>>  > >> > > > > currently using? Is it the free plan for
>>>>     >>>>      >>>>>>>  > >> > > > > open source projects? What are the
>>>>     >>>>      >>>>>>>  > >> > > > > guaranteed build capacity of the current
>>>>     >>>>      >>>>>>>  > >> > > > > plan?
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan (either free
>>>>     >>>>      >>>>>>>  > >> > > > > or paid) can't provide stable build
>>>>     >>>>      >>>>>>>  > >> > > > > capacity, can we upgrade to a higher
>>>>     >>>>      >>>>>>>  > >> > > > > priced plan with larger and more stable
>>>>     >>>>      >>>>>>>  > >> > > > > build capacity?
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that contribute to the
>>>>     >>>>      >>>>>>>  > >> > > > > productivity problem is that our build is
>>>>     >>>>      >>>>>>>  > >> > > > > slow - we run full build for every PR and
>>>>     >>>>      >>>>>>>  > >> > > > > a successful full build takes ~5h. We
>>>>     >>>>      >>>>>>>  > >> > > > > definitely have more options to solve it,
>>>>     >>>>      >>>>>>>  > >> > > > > for instance, modularize the build graphs
>>>>     >>>>      >>>>>>>  > >> > > > > and reuse artifacts from the previous
>>>>     >>>>      >>>>>>>  > >> > > > > build. But I think that can be a big
>>>>     >>>>      >>>>>>>  > >> > > > > effort which is much harder to accomplish
>>>>     >>>>      >>>>>>>  > >> > > > > in a short period of time and may deserve
>>>>     >>>>      >>>>>>>  > >> > > > > its own separate discussion.
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  --
>>>>     >>>>      >>>>>>>  Best Regards
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Jeff Zhang
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>
>>>>     >>>>
>>>>     >>>
>>>>     >>
>>>>
>>>
>>>
>>
>
>


Re: [VOTE] Migrate to sponsored Travis account

Posted by JingsongLee <lz...@aliyun.com.INVALID>.
+1 for the migration

Best, JingsongLee


------------------------------------------------------------------
From:Jark Wu <im...@gmail.com>
Send Time: 2019-07-05 (Friday) 10:35
To:dev <de...@flink.apache.org>
Cc:private <pr...@flink.apache.org>; Bowen Li <bo...@gmail.com>
Subject:Re: [VOTE] Migrate to sponsored Travis account

+1 for the migration and great thanks to Chesnay and Bowen for pushing this!

Cheers,
Jark

On Fri, 5 Jul 2019 at 09:34, Congxian Qiu <qc...@gmail.com> wrote:

> +1 for the migration.
>
> Best,
> Congxian
>
>
> Hequn Cheng <ch...@gmail.com> wrote on Thu, Jul 4, 2019 at 9:42 PM:
>
> > +1.
> >
> > And thanks a lot to Chesnay for pushing this.
> >
> > Best, Hequn
> >
> > On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ch...@apache.org>
> > wrote:
> >
> > > Note that the Flinkbot approach isn't that trivial either; we can't
> > > _just_ trigger builds for a branch in the apache repo, but would first
> > > have to clone the branch/pr into a separate repository (that is owned
> > > by the github account that the travis account would be tied to).
> > >
> > > One roadblock after the next showing up...
> > >
> > > On 04/07/2019 11:59, Chesnay Schepler wrote:
> > > > Small update with mostly bad news:
> > > >
> > > > > INFRA doesn't know whether it is possible, and referred me to Travis
> > > > support.
> > > > They did point out that it could be problematic in regards to
> > > > read/write permissions for the repository.
> > > >
> > > > From my own findings /so far/ with a test repo/organization, it does
> > > > not appear possible to configure the Travis account used for a
> > > > specific repository.
> > > >
> > > > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > > > trigger builds through the Travis REST API.
> > > >
> > > > On 04/07/2019 10:46, Chesnay Schepler wrote:
> > > >> I've raised a JIRA
> > > > >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> > > >> inquire whether it would be possible to switch to a different Travis
> > > >> account, and if so what steps would need to be taken.
> > > >> We need a proper confirmation from INFRA since we are not in full
> > > >> control of the flink repository (for example, we cannot access the
> > > >> settings page).
> > > >>
> > > > >> If this is indeed possible, Ververica is willing to sponsor a Travis
> > > > >> account for the Flink project.
> > > > >> This would provide us with more than enough resources.
> > > >>
> > > >> Since this makes the project more reliant on resources provided by
> > > >> external companies I would like to vote on this.
> > > >>
> > > >> Please vote on this proposal, as follows:
> > > >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> > > >> account, provided that INFRA approves
> > > > >> [ ] -1, Do not approve the migration to a Ververica-sponsored
> > > > >> Travis account
> > > >>
> > > >> The vote will be open for at least 24h, and until we have
> > > >> confirmation from INFRA. The voting period may be shorter than the
> > > > >> usual 3 days since our current setup is effectively not working.
> > > >>
> > > >> On 04/07/2019 06:51, Bowen Li wrote:
> > > > >>> Re: > Are they using their own Travis CI pool, or did they switch to
> > > > >>> an entirely different CI service?
> > > >>>
> > > >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> > > >>> currently moving away from ASF's Travis to their own in-house metal
> > > >>> machines at [1] with custom CI application at [2]. They've seen
> > > >>> significant improvement w.r.t both much higher performance and
> > > >>> basically no resource waiting time, "night-and-day" difference
> > > >>> quoting Wes.
> > > >>>
> > > >>> Re: > If we can just switch to our own Travis pool, just for our
> > > >>> project, then this might be something we can do fairly quickly?
> > > >>>
> > > >>> I believe so, according to [3] and [4]
> > > >>>
> > > >>>
> > > >>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> > > >>> [2] https://github.com/ursa-labs/ursabot
> > > >>> [3]
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > >>> [4]
> > > >>>
> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > >>>
> > > > >>>     Are they using their own Travis CI pool, or did they switch to an
> > > > >>>     entirely different CI service?
> > > > >>>
> > > > >>>     If we can just switch to our own Travis pool, just for our project,
> > > > >>>     then this might be something we can do fairly quickly?
> > > >>>
> > > >>>     On 03/07/2019 05:55, Bowen Li wrote:
> > > > >>>     > I responded in the INFRA ticket [1] that I believe they are using a
> > > > >>>     > wrong metric against Flink and the total build time is a completely
> > > > >>>     > different thing than guaranteed build capacity.
> > > >>>     >
> > > >>>     > My response:
> > > >>>     >
> > > >>>     > "As mentioned above, since I started to pay attention to
> > Flink's
> > > >>>     build
> > > >>>     > queue a few tens of days ago, I'm in Seattle and I saw no
> build
> > > >>>     was kicking
> > > >>>     > off in PST daytime in weekdays for Flink. Our teammates in
> > China
> > > >>>     and Europe
> > > >>>     > have also reported similar observations. So we need to
> evaluate
> > > >>>     how the
> > > >>>     > large total build time came from - if 1) your number and 2)
> our
> > > >>>     > observations from three locations that cover pretty much a
> full
> > > >>>     day, are
> > > >>>     > all true, I **guess** one reason can be that - highly likely
> > the
> > > >>>     extra
> > > >>>     > build time came from weekends when other Apache projects may
> be
> > > >>>     idle and
> > > >>>     > Flink just drains hard its congested queue.
> > > >>>     >
> > > > >>>     > Please be aware that we're not complaining about the lack of
> > > > >>>     > resources in general; I'm complaining about the lack of **stable,
> > > > >>>     > dedicated** resources. An example of the latter: currently, even if
> > > > >>>     > no build is in Flink's queue and I submit a request to be the queue
> > > > >>>     > head in the PST morning, my build won't even start for 6-8+h. That
> > > > >>>     > is an absurd amount of waiting time.
> > > > >>>     >
> > > > >>>     > That is to say, if ASF INFRA decides to adopt a quota system and
> > > > >>>     > grants Flink five DEDICATED servers that run all the time only for
> > > > >>>     > Flink, that'll be PERFECT and can totally solve our problem now.
> > > >>>     >
> > > > >>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
> > > > >>>     > some level of build capacity SLAs and certainty"
> > > >>>     >
> > > >>>     >
> > > > >>>     > Again, I believe these two problems differ in nature: long build
> > > > >>>     > time vs. lack of dedicated build resources. That is to say,
> > > > >>>     > shortening build time may relieve the situation, and may not.
> > > > >>>     > I'm slightly negative on disabling IT cases for PRs: the downside
> > > > >>>     > is that we risk bugs in PRs that UTs don't catch, which may cost a
> > > > >>>     > lot more to fix and may slow down or even block others. But I am
> > > > >>>     > open to others' opinions on it.
> > > >>>     >
> > > > >>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be
> > > > >>>     > feasible to solve our problem since INFRA's pool is fully shared
> > > > >>>     > and they have no control and finer insights over resource
> > > > >>>     > allocation to a specific Apache project. As mentioned in [1],
> > > > >>>     > Apache Arrow is moving away from the ASF INFRA Travis pool (they
> > > > >>>     > are actually surprised Flink hasn't planned to do so). I know that
> > > > >>>     > Spark is on its own build infra. If we all agree on funding our own
> > > > >>>     > build infra, I'd be glad to help investigate potential options
> > > > >>>     > after releasing 1.9, since I'm super busy with 1.9 now.
> > > >>>     >
> > > >>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> > > >>>     >
> > > >>>     >
> > > >>>     >
> > > > >>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > >>>     >
> > > > >>>     >> As a short-term stopgap, since we can assume this issue to become
> > > > >>>     >> much worse in the following days/weeks, we could disable IT cases
> > > > >>>     >> in PRs and only run them on master.
> > > >>>     >>
> > > >>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > > > >>>     >>> People really have to stop thinking that just because something
> > > > >>>     >>> works for us it is also a good solution.
> > > > >>>     >>> Also, please remember that our builds run for 2h from start to
> > > > >>>     >>> finish, and not the 14 _minutes_ it takes for zeppelin.
> > > > >>>     >>> We are dealing with an entirely different scale here, both in
> > > > >>>     >>> terms of build times and number of builds.
> > > > >>>     >>>
> > > > >>>     >>> In this very thread people have been complaining about long queue
> > > > >>>     >>> times for their builds. Surprise, other Apache projects have been
> > > > >>>     >>> suffering the very same thing due to us not controlling our build
> > > > >>>     >>> times. While switching services (be it Jenkins, CircleCI or
> > > > >>>     >>> whatever) will possibly work for us (and these options are
> > > > >>>     >>> actually attractive, like CircleCI's proper support for build
> > > > >>>     >>> artifacts), it will also result in us likely negatively affecting
> > > > >>>     >>> other projects in significant ways.
> > > > >>>     >>>
> > > > >>>     >>> Sure, the Jenkins setup has a good user experience for us, at the
> > > > >>>     >>> cost of blocking Jenkins workers for a _lot_ of time. Right now we
> > > > >>>     >>> have 25 PR's in our queue; that's possibly 50h we'd consume of
> > > > >>>     >>> Jenkins resources, and the European contributors haven't even
> > > > >>>     >>> really started yet.
> > > >>>     >>>
> > > >>>     >>> FYI, the latest INFRA response from INFRA-18533:
> > > >>>     >>>
> > > >>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> > > >>>     build time
> > > >>>     >>> last month. That is equal to EIGHT servers running 24/7 for
> > > >>>     the ENTIRE
> > > >>>     >>> MONTH. EIGHT. nonstop.
> > > >>>     >>> When we discovered this last night, we discussed it some
> and
> > > >>>     are going
> > > >>>     >>> to tune down Flink to allow only five executors maximum. We
> > > >>> cannot
> > > >>>     >>> allow Flink to consume so much of a Foundation shared
> > > >>> resource."
> > > >>>     >>>
> > > > >>>     >>> So yes, we either
> > > > >>>     >>> a) have to heavily reduce our CI usage or
> > > > >>>     >>> b) fund our own, either maintaining it ourselves or donating to
> > > > >>>     >>> Apache.
> > > >>>     >>>
> > > >>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> > > > >>>     >>>> By looking at the git history of the Jenkins script, its core
> > > > >>>     >>>> part was finished in March 2017 (with only two minor updates in
> > > > >>>     >>>> 2017/2018), so it's been running for over two years now and it
> > > > >>>     >>>> feels like the Zeppelin community has been quite happy with it.
> > > > >>>     >>>> @Jeff Zhang <zjffdu@gmail.com> can you share your insights and
> > > > >>>     >>>> user experience with the Jenkins+Travis approach?
> > > >>>     >>>>
> > > >>>     >>>> Things like:
> > > >>>     >>>>
> > > > >>>     >>>> - has the approach completely solved the resource capacity
> > > > >>>     >>>> problem for the Zeppelin community? is the Zeppelin community
> > > > >>>     >>>> happy with the result?
> > > > >>>     >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> > > > >>>     >>>> - how often do you need to maintain the Jenkins infra? how many
> > > > >>>     >>>> people are usually involved in maintenance and bug-fixes?
> > > >>>     >>>>
> > > >>>     >>>> The downside of this approach seems mostly to be on the
> > > >>>     maintenance
> > > >>>     >>>> to me - maintain the script and Jenkins infra.
> > > >>>     >>>>
> > > >>>     >>>> ** Having Our Own Travis-CI.com Account **
> > > >>>     >>>>
> > > > >>>     >>>> Another alternative I've been thinking of is to have our own
> > > > >>>     >>>> travis-ci.com account with paid dedicated resources. Note
> > > > >>>     >>>> travis-ci.org is the free version and travis-ci.com is the
> > > > >>>     >>>> commercial version. We currently use a shared resource pool
> > > > >>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
> > > > >>>     >>>> control over it - we can't see how it's configured, how many
> > > > >>>     >>>> resources are available, how resources are allocated among Apache
> > > > >>>     >>>> projects, etc.
> > > > >>>     >>>> The nice things about having an account on travis-ci.com are:
> > > >>>     >>>>
> > > > >>>     >>>> - relatively low cost with much better resource guarantee than
> > > > >>>     >>>> what we currently have [1]: $249/month with 5 dedicated
> > > > >>>     >>>> concurrency, $489/month with 10 concurrency
> > > > >>>     >>>> - low maintenance work compared to using Jenkins
> > > > >>>     >>>> - (potentially) no migration cost according to Travis's doc [2]
> > > > >>>     >>>> (pending verification)
> > > > >>>     >>>> - full control over the build capacity/configuration compared to
> > > > >>>     >>>> using ASF INFRA's pool
> > > >>>     >>>>
> > > > >>>     >>>> I'd be surprised if we as such a vibrant community cannot find
> > > > >>>     >>>> and fund $249*12=$2988 a year in exchange for a much better
> > > > >>>     >>>> developer experience and much higher productivity.
> > > >>>     >>>>
> > > >>>     >>>> [1] https://travis-ci.com/plans
> > > >>>     >>>> [2]
> > > >>>     >>>>
> > > >>>     >>
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > > >>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > >>>     >>>>
> > > > >>>     >>>>      So yes, the Jenkins job keeps pulling the state from Travis
> > > > >>>     >>>>      until it finishes.
> > > > >>>     >>>>
> > > > >>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
> > > > >>>     >>>>      workers just to idle for several hours.
> > > >>>     >>>>
> > > >>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> > > > >>>     >>>>      > Here's what the Zeppelin community did: we wrote a Python
> > > > >>>     >>>>      > script to check the build status of a pull request.
> > > > >>>     >>>>      > Here's the script:
> > > > >>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
> > > > >>>     >>>>      >
> > > > >>>     >>>>      > And this is the script we used in the Jenkins build job.
> > > >>>     >>>>      >
> > > >>>     >>>>      > if [ -f "travis_check.py" ]; then
> > > >>>     >>>>      >    git log -n 1
> > > >>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> > > >>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> > > >>>     >>>>      >    #if [ -z $COMMIT ]; then
> > > >>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >    #fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    # get commit hash from PR
> > > >>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >    sleep 30 # sleep a few moments to wait for travis to start the build
> > > >>>     >>>>      >    RET_CODE=0
> > > >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> > > >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> > > >>>     >>>>      >      RET_CODE=0
> > > >>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> > > >>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> > > >>>     >>>>      >    fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
> > > >>>     >>>>      >      set +x
> > > >>>     >>>>      >      echo "-----------------------------------------------------"
> > > >>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> > > >>>     >>>>      >      echo "Please set up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> > > >>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> > > >>>     >>>>      >      echo ""
> > > >>>     >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
> > > >>>     >>>>      >      echo "git commit --amend"
> > > >>>     >>>>      >      echo "git push your-remote HEAD --force"
> > > >>>     >>>>      >      echo ""
> > > >>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> > > >>>     >>>>      >    fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    exit $RET_CODE
> > > >>>     >>>>      > else
> > > >>>     >>>>      >    set +x
> > > >>>     >>>>      >    echo "travis_check.py does not exist"
> > > >>>     >>>>      >    exit 1
> > > >>>     >>>>      > fi
> > > >>>     >>>>      >
> > > >>>     >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> > > >>>     >>>>      >
> > > >>>     >>>>      >> Does this imply that a Jenkins job is active as
> long
> > > >>>     as the
> > > >>>     >>>>      Travis build
> > > >>>     >>>>      >> runs?
> > > >>>     >>>>      >>
> > > >>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> > > >>>     >>>>      >>> Hi,
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> @Dawid, I think the "long test running" as I
> > > >>>     mentioned in the
> > > >>>     >>>>      first
> > > >>>     >>>>      >> email,
> > > >>>     >>>>      >>> also as you guys said, belongs to "a big effort
> > > >>>     which is much
> > > >>>     >>>>      harder to
> > > >>>     >>>>      >>> accomplish in a short period of time and may
> > deserve
> > > >>>     its own
> > > >>>     >>>>      separate
> > > >>>     >>>>      >>> discussion". Thus I didn't include it in what we
> > can
> > > >>>     do in a
> > > >>>     >>>>      foreseeable
> > > >>>     >>>>      >>> short term.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> Besides, I don't think that's the ultimate reason
> > > >>>     for lack of
> > > >>>     >>>>      build
> > > >>>     >>>>      >>> resources. Even if the build is shortened to
> > > >>>     something like
> > > >>>     >>>>      2h, the
> > > >>>     >>>>      >>> problems of no build machine works about 6 or
> more
> > > >>>     hours in
> > > >>>     >>>>      PST daytime
> > > >>>     >>>>      >>> that I described will still happen, because no
> > > >>>     machine from
> > > >>>     >>>>      ASF INFRA's
> > > >>>     >>>>      >>> pool is allocated to Flink. As I have paid close
> > > >>>     attention to
> > > >>>     >>>>      the build
> > > >>>     >>>>      >>> queue in the past few weekdays, it's a pretty
> clear
> > > >>>     pattern now.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> **The ultimate root cause** for that is - we
> don't
> > > >>>     have any
> > > >>>     >>>>      **dedicated**
> > > >>>     >>>>      >>> build resources that we can stably rely on. I'm
> > > >>>     actually ok to
> > > >>>     >>>>      wait for a
> > > >>>     >>>>      >>> long time if there are build requests running, it
> > > >>>     means at
> > > >>>     >>>>      least we are
> > > >>>     >>>>      >>> making progress. But I'm not ok with no build
> > > >>>     resource. A
> > > >>>     >>>>      better place I
> > > >>>     >>>>      >>> think we should aim at in short term is to always
> > > >>>     have at
> > > >>>     >>>>      least a central
> > > >>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to
> build
> > > >>>     Flink at
> > > >>>     >>>>      any time, or
> > > >>>     >>>>      >>> maybe use users resources.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> > > >>>     Zeppelin
> > > >>>     >>>>      community is
> > > >>>     >>>>      >>> using a Jenkins job to automatically build on
> > users'
> > > >>>     travis
> > > >>>     >>>>      account and
> > > >>>     >>>>      >>> link the result back to github PR. I guess the
> > > >>>     Jenkins job
> > > >>>     >>>>      would fetch
> > > >>>     >>>>      >>> latest upstream master and build the PR against
> it.
> > > >>>     Jeff has
> > > >>>     >>>> filed
> > > >>>     >>>>      >> tickets
> > > >>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll be better
> > > >>>     >>>>      >>> to fully understand it first before judging this approach.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> I also heard good things about CircleCI, and ASF
> > > >>>     INFRA seems
> > > >>>     >>>>      to have a
> > > >>>     >>>>      >> pool
> > > >>>     >>>>      >>> of build capacity there too. Can be an
> alternative
> > > >>>     to consider.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid
> Wysakowicz <
> > > >>>     >>>>      >> dwysakowicz@apache.org
> > > >>> <ma...@apache.org> <mailto:dwysakowicz@apache.org
> > > >>> <ma...@apache.org>>>
> > > >>>     >>>>      >>> wrote:
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed
> > the
> > > >>>     most
> > > >>>     >>>>      important point
> > > >>>     >>>>      >>>> from Chesnay's previous message in the summary.
> > The
> > > >>>     ultimate
> > > >>>     >>>>      reason for
> > > >>>     >>>>      >>>> all the problems is that the tests take close
> to 2
> > > >>>     hours to
> > > >>>     >>>>      run already.
> > > >>>     >>>>      >>>> I fully support this claim: "Unless people start
> > > >>>     caring about
> > > >>>     >>>>      test times
> > > >>>     >>>>      >>>> before adding them, this issue cannot be solved"
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> This is also another reason why using user's
> > Travis
> > > >>>     account
> > > >>>     >>>>      won't help.
> > > >>>     >>>>      >>>> Every few weeks we reach the user's time limit
> for
> > > >>>     a single
> > > >>>     >>>>      profile.
> > > >>>     >>>>      >>>> This makes the user's builds simply fail, until
> we
> > > >>>     either
> > > >>>     >>>>      properly
> > > >>>     >>>>      >>>> decrease the time the tests take (which I am not
> > > >>>     sure we ever
> > > >>>     >>>>      did) or
> > > >>>     >>>>      >>>> postpone the problem by splitting into more
> > > >>>     profiles. (Note
> > > >>>     >>>>      that the ASF
> > > >>>     >>>>      >>>> Travis account has higher time limits)
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> Best,
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> Dawid
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> > > >>>     >>>>      >>>>> Do we know if using "the best" available
> hardware
> > > >>>     would
> > > >>>     >>>>      improve the
> > > >>>     >>>>      >> build
> > > >>>     >>>>      >>>>> times?
> > > >>>     >>>>      >>>>> Imagine we would run the build on machines with
> > > >>>     plenty of
> > > >>>     >>>>      main memory
> > > >>>     >>>>      >> to
> > > >>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> > > >>>     architecture?
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>> Throwing hardware at the problem could help
> > reduce
> > > >>>     the time
> > > >>>     >>>>      of an
> > > >>>     >>>>      >>>>> individual build, and using our own
> > infrastructure
> > > >>>     would
> > > >>>     >>>>      remove our
> > > >>>     >>>>      >>>>> dependency on Apache's Travis account (with the
> > > >>>     obvious
> > > >>>     >>>>      downside of
> > > >>>     >>>>      >>>> having
> > > >>>     >>>>      >>>>> to maintain the infrastructure)
> > > >>>     >>>>      >>>>> We could use an open source travis alternative,
> > to
> > > >>>     have a
> > > >>>     >>>>      similar
> > > >>>     >>>>      >>>>> experience and make the migration easy.
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay
> Schepler
> > > >>>     >>>>      <chesnay@apache.org <ma...@apache.org>
> > > >>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
> > > >>>     >>>>      >>>> wrote:
> > > >>>     >>>>      >>>>>> From what I gathered, there's no special sauce that the
> > > >>>     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's Travis
> > > >>>     >>>>      >>>>>> account into the PR. They just disabled Travis for PRs. And that's
> > > >>>     >>>>      >>>>>> kind of it.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
> > > >>>     >>>>      >>>>>> resources, but there are downsides:
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> The discoverability of the Travis check takes a nose-dive. Either we
> > > >>>     >>>>      >>>>>> require every contributor to always, on every commit, also post a
> > > >>>     >>>>      >>>>>> Travis build, or we have the reviewer sift through the contributor's
> > > >>>     >>>>      >>>>>> account to find it.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> > > >>>     also not
> > > >>>     >>>>      equivalent to
> > > >>>     >>>>      >>>>>> having a PR build.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> A normal branch build takes a branch as is and tests it. A PR build
> > > >>>     >>>>      >>>>>> merges the branch into master, and then runs it. (Fun fact: This is
> > > >>>     >>>>      >>>>>> why a PR with merge conflicts is not being run on Travis.)
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> And ultimately, everyone can already make use
> > > >>> of this
> > > >>>     >>>>      approach anyway.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> > > >>>     >>>>      >>>>>>> Hi Jeff,
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> > > >>>     think it's a
> > > >>>     >>>>      good idea to
> > > >>>     >>>>      >>>>>>> leverage user's travis account.
> > > >>>     >>>>      >>>>>>> In this way, we can have almost unlimited
> > > >>>     concurrent build
> > > >>>     >>>>      jobs and
> > > >>>     >>>>      >>>>>>> developers can restart build by themselves
> > > >>>     (currently only
> > > >>>     >>>>      committers
> > > >>>     >>>>      >>>>>>> can restart PR's build).
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> > > >>> user's
> > > >>>     >>>>      travis build
> > > >>>     >>>>      >> into
> > > >>>     >>>>      >>>>>>> the Flink pull request's build automatically.
> > > >>>     Can you
> > > >>>     >>>>      explain more in
> > > >>>     >>>>      >>>>>>> detail?
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Another question: does travis only build
> > > >>>     branches for user
> > > >>>     >>>>      account?
> > > >>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> > > >>> user's
> > > >>>     >>>>      commits against
> > > >>>     >>>>      >>>>>>> current master branch.
> > > >>>     >>>>      >>>>>>> This will help us to find problems before
> > > >>>     merge.  Builds
> > > >>>     >>>>      for branches
> > > >>>     >>>>      >>>>>>> will lose the impact of new commits in
> master.
> > > >>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Thanks again for sharing the idea.
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Regards,
> > > >>>     >>>>      >>>>>>> Jark
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> > > >>>     <zjffdu@gmail.com <ma...@gmail.com>
> > > >>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:zjffdu@gmail.com
> > > >>> <ma...@gmail.com>>>> wrote:
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Hi Folks,
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before. We solved it by delegating
> > > >>>     >>>>      >>>>>>>  each contributor's PR build to their own Travis account (everyone
> > > >>>     >>>>      >>>>>>>  can have 5 free slots for Travis builds).
> > > >>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered when a PR is merged.
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  > (Forgot to cc George)
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > Best,
> > > >>>     >>>>      >>>>>>>  > Kurt
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt
> Young
> > > >>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
> > > >>> <ma...@gmail.com> <mailto:ykt836@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      wrote:
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > > Hi Bowen,
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
> > > >>>     actually have
> > > >>>     >>>>      discussed
> > > >>>     >>>>      >> about
> > > >>>     >>>>      >>>>>>>  this, and I
> > > >>>     >>>>      >>>>>>>  > > think Till and George have
> > > >>>     >>>>      >>>>>>>  > > already spend sometime investigating
> > > >>>     it. I have
> > > >>>     >>>>      cced both of
> > > >>>     >>>>      >>>>>>>  them, and
> > > >>>     >>>>      >>>>>>>  > > maybe they can share
> > > >>>     >>>>      >>>>>>>  > > their findings.
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > Best,
> > > >>>     >>>>      >>>>>>>  > > Kurt
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> > > >>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
> > > >>> <ma...@gmail.com> <mailto:imjark@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      wrote:
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > >> Hi Bowen,
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> > > >>>     suffered from
> > > >>>     >>>>      the long
> > > >>>     >>>>      >>>>>>>  build time.
> > > >>>     >>>>      >>>>>>>  > >> I agree that we should focus on
> > > >>>     solving build
> > > >>>     >>>>      capacity
> > > >>>     >>>>      >>>>>>>  problem in the
> > > >>>     >>>>      >>>>>>>  > >> thread.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> My observation is there is only one
> > > >>>     build is
> > > >>>     >>>>      running, all
> > > >>>     >>>>      >> the
> > > >>>     >>>>      >>>>>>>  others
> > > >>>     >>>>      >>>>>>>  > >> (other
> > > >>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
> > > >>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
> > > >>>     it can
> > > >>>     >>>> support
> > > >>>     >>>>      >> concurrent
> > > >>>     >>>>      >>>>>>>  build
> > > >>>     >>>>      >>>>>>>  > jobs.
> > > >>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
> > > >>>     using, might
> > > >>>     >>>>      be the free
> > > >>>     >>>>      >>>>>>>  plan for
> > > >>>     >>>>      >>>>>>>  > open
> > > >>>     >>>>      >>>>>>>  > >> source.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> > > >>>     experience on
> > > >>>     >>>>      Travis.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> Regards,
> > > >>>     >>>>      >>>>>>>  > >> Jark
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
> <
> > > >>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>>> wrote:
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> > Hi Steven,
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > I think you may not read what I
> > > >>>     wrote. The
> > > >>>     >>>>      discussion is
> > > >>>     >>>>      >>>> about
> > > >>>     >>>>      >>>>>>>  > "unstable
> > > >>>     >>>>      >>>>>>>  > >> > build **capacity**", in another word
> > > >>>     >>>>      "unstable / lack of
> > > >>>     >>>>      >>>> build
> > > >>>     >>>>      >>>>>>>  > >> resources",
> > > >>>     >>>>      >>>>>>>  > >> > not "unstable build".
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
> > > >>>     Steven Wu
> > > >>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > wrote:
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build
> is
> > > >>>     >>>>      definitely a pain
> > > >>>     >>>>      >>>>>> point.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> > > >>>     >>>>      >> flink-connector-kafka
> > > >>>     >>>>      >>>>>>>  is not
> > > >>>     >>>>      >>>>>>>  > >> related
> > > >>>     >>>>      >>>>>>>  > >> > to
> > > >>>     >>>>      >>>>>>>  > >> > > my change. but there is no easy
> > > >>>     re-run the
> > > >>>     >>>>      build on
> > > >>>     >>>>      >>>>>>>  travis UI.
> > > >>>     >>>>      >>>>>>>  > Google
> > > >>>     >>>>      >>>>>>>  > >> > > search showed a trick of
> > > >>>     close-and-open the
> > > >>>     >>>>      PR will
> > > >>>     >>>>      >>>>>>>  trigger rebuild.
> > > >>>     >>>>      >>>>>>>  > >> but
> > > >>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
> > > >>>     activities.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> > > >>>     often failed
> > > >>>     >>>>      with
> > > >>>     >>>>      >>>>>>>  exceeding time
> > > >>>     >>>>      >>>>>>>  > limit
> > > >>>     >>>>      >>>>>>>  > >> > after
> > > >>>     >>>>      >>>>>>>  > >> > > 4+ hours.
> > > >>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
> > > >>>     limit for
> > > >>>     >>>>      jobs, and
> > > >>>     >>>>      >> has
> > > >>>     >>>>      >>>>>>>  been
> > > >>>     >>>>      >>>>>>>  > >> > terminated.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
> > > >>>     Bowen Li
> > > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> > bowenli86@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > wrote:
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
> > > >>>     >>>>      >>>>>>>  This build
> > > >>>     >>>>      >>>>>>>  > >> > request
> > > >>>     >>>>      >>>>>>>  > >> > > > has
> > > >>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> > > >>>     queue**
> > > >>>     >>>>      since I first
> > > >>>     >>>>      >> saw
> > > >>>     >>>>      >>>>>>>  it at PST
> > > >>>     >>>>      >>>>>>>  > >> > 10:30am
> > > >>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> > > >>>     there before
> > > >>>     >>>>      10:30am).
> > > >>>     >>>>      >>>>>>>  It's PST
> > > >>>     >>>>      >>>>>>>  > 4:12pm
> > > >>>     >>>>      >>>>>>>  > >> now
> > > >>>     >>>>      >>>>>>>  > >> > > and
> > > >>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
> > > >>>     Bowen Li
> > > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> > bowenli86@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > >> wrote:
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
> > > >>>     >>>>      resulting from lack
> > > >>>     >>>>      >>>>>>>  of stable
> > > >>>     >>>>      >>>>>>>  > >> build
> > > >>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
> > > >>>     PRs [1].
> > > >>>     >>>>      >> Specifically, I
> > > >>>     >>>>      >>>>>>>  noticed
> > > >>>     >>>>      >>>>>>>  > >> often
> > > >>>     >>>>      >>>>>>>  > >> > > that
> > > >>>     >>>>      >>>>>>>  > >> > > > no
> > > >>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making
> any
> > > >>>     >>>>      progress for
> > > >>>     >>>>      >> hours,
> > > >>>     >>>>      >>>> and
> > > >>>     >>>>      >>>>>>>  > suddenly
> > > >>>     >>>>      >>>>>>>  > >> 5
> > > >>>     >>>>      >>>>>>>  > >> > or
> > > >>>     >>>>      >>>>>>>  > >> > > 6
> > > >>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
> > > >>>     after the
> > > >>>     >>>>      long pause.
> > > >>>     >>>>      >>>>>>>  I'm at PST
> > > >>>     >>>>      >>>>>>>  > >> > (UTC-08)
> > > >>>     >>>>      >>>>>>>  > >> > > > time
> > > >>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
> > > >>>     be as
> > > >>>     >>>>      long as 6 hours
> > > >>>     >>>>      >>>>>>>  from PST 9am
> > > >>>     >>>>      >>>>>>>  > >> to
> > > >>>     >>>>      >>>>>>>  > >> > 3pm
> > > >>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
> > > >>>     drain the
> > > >>>     >>>>      queue
> > > >>>     >>>>      >>>>>>>  afterwards).
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
> > > >>>     impacted our
> > > >>>     >>>>      productivity.
> > > >>>     >>>>      >>>> I've
> > > >>>     >>>>      >>>>>>>  > >> experienced
> > > >>>     >>>>      >>>>>>>  > >> > > that
> > > >>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
> > > >>>     morning of
> > > >>>     >>>>      PST time zone
> > > >>>     >>>>      >>>>>>>  won't finish
> > > >>>     >>>>      >>>>>>>  > >> > their
> > > >>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
> > > >>>     same day.
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
> > > >>>     the same
> > > >>>     >>>>      problem or
> > > >>>     >>>>      >>>>>>>  have similar
> > > >>>     >>>>      >>>>>>>  > >> > > > observation
> > > >>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
> > > >>>     has things
> > > >>>     >>>>      to do with
> > > >>>     >>>>      >> time
> > > >>>     >>>>      >>>>>>>  zone)
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
> > > >>>     TravisCI is
> > > >>>     >>>>      Flink currently
> > > >>>     >>>>      >>>>>>>  using? Is it
> > > >>>     >>>>      >>>>>>>  > >> the
> > > >>>     >>>>      >>>>>>>  > >> > > free
> > > >>>     >>>>      >>>>>>>  > >> > > > > plan for open source
> > > >>>     projects? What
> > > >>>     >>>> are the
> > > >>>     >>>>      >>>>>>>  guaranteed build
> > > >>>     >>>>      >>>>>>>  > >> capacity
> > > >>>     >>>>      >>>>>>>  > >> > > of
> > > >>>     >>>>      >>>>>>>  > >> > > > > the current plan?
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
> > > >>>     (either
> > > >>>     >>>>      free or paid)
> > > >>>     >>>>      >>>>>> can't
> > > >>>     >>>>      >>>>>>>  > provide
> > > >>>     >>>>      >>>>>>>  > >> > > stable
> > > >>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
> > > >>>     upgrade to a
> > > >>>     >>>>      higher priced
> > > >>>     >>>>      >>>>>>>  plan with
> > > >>>     >>>>      >>>>>>>  > larger
> > > >>>     >>>>      >>>>>>>  > >> > and
> > > >>>     >>>>      >>>>>>>  > >> > > > more
> > > >>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that contribute to the productivity problem is that our
> > > >>>     >>>>      >>>>>>>  > >> > > > > build is slow - we run full build for every PR and a successful full build
> > > >>>     >>>>      >>>>>>>  > >> > > > > takes ~5h. We definitely have more options to solve it, for instance,
> > > >>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs and reuse artifacts from the previous build.
> > > >>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big effort which is much harder to accomplish in
> > > >>>     >>>>      >>>>>>>  > >> > > > > a short period of time and may deserve its own separate discussion.
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  --
> > > >>>     >>>>      >>>>>>>  Best Regards
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Jeff Zhang
> > > >>>     >>>>      >>>>>>>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Yun Tang <my...@live.com>.
According to https://issues.apache.org/jira/browse/INFRA-18703, switching to a separate Travis account to run CI is actually not possible. I hope the other option Chesnay is working on to speed up CI works out soon.


Best
Yun Tang
________________________________
From: Jark Wu <im...@gmail.com>
Sent: Friday, July 5, 2019 10:34
To: dev
Cc: private@flink.apache.org; Bowen Li
Subject: Re: [VOTE] Migrate to sponsored Travis account

+1 for the migration and great thanks to Chesnay and Bowen for pushing this!

Cheers,
Jark

On Fri, 5 Jul 2019 at 09:34, Congxian Qiu <qc...@gmail.com> wrote:

> +1 for the migration.
>
> Best,
> Congxian
>
>
> On Thu, Jul 4, 2019 at 9:42 PM Hequn Cheng <ch...@gmail.com> wrote:
>
> > +1.
> >
> > And thanks a lot to Chesnay for pushing this.
> >
> > Best, Hequn
> >
> > On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ch...@apache.org>
> > wrote:
> >
> > > > Note that the Flinkbot approach isn't that trivial either; we can't
> > > > _just_ trigger builds for a branch in the apache repo, but would first
> > > > have to clone the branch/PR into a separate repository (that is owned by
> > > > the GitHub account that the Travis account would be tied to).
> > >
> > > One roadblock after the next showing up...
> > >
> > > On 04/07/2019 11:59, Chesnay Schepler wrote:
> > > > Small update with mostly bad news:
> > > >
> > > > > INFRA doesn't know whether it is possible, and referred me to Travis
> > > > > support.
> > > > They did point out that it could be problematic in regards to
> > > > read/write permissions for the repository.
> > > >
> > > > From my own findings /so far/ with a test repo/organization, it does
> > > > not appear possible to configure the Travis account used for a
> > > > specific repository.
> > > >
> > > > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > > > trigger builds through the Travis REST API.
> > > >
> > > > On 04/07/2019 10:46, Chesnay Schepler wrote:
> > > >> I've raised a JIRA
> > > > >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> > > >> inquire whether it would be possible to switch to a different Travis
> > > >> account, and if so what steps would need to be taken.
> > > >> We need a proper confirmation from INFRA since we are not in full
> > > >> control of the flink repository (for example, we cannot access the
> > > >> settings page).
> > > >>
> > > > >> If this is indeed possible, Ververica is willing to sponsor a Travis
> > > > >> account for the Flink project.
> > > > >> This would provide us with more than enough resources.
> > > >>
> > > >> Since this makes the project more reliant on resources provided by
> > > >> external companies I would like to vote on this.
> > > >>
> > > >> Please vote on this proposal, as follows:
> > > >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> > > >> account, provided that INFRA approves
> > > >> [ ] -1, Do not approach the migration to a Ververica-sponsored
> Travis
> > > >> account
> > > >>
> > > >> The vote will be open for at least 24h, and until we have
> > > >> confirmation from INFRA. The voting period may be shorter than the
> > > > >> usual 3 days since our current setup is effectively not working.
> > > >>
> > > >> On 04/07/2019 06:51, Bowen Li wrote:
> > > > >>> Re: > Are they using their own Travis CI pool, or did they switch to
> > > > >>> an entirely different CI service?
> > > >>>
> > > >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> > > >>> currently moving away from ASF's Travis to their own in-house metal
> > > >>> machines at [1] with custom CI application at [2]. They've seen
> > > >>> significant improvement w.r.t both much higher performance and
> > > >>> basically no resource waiting time, "night-and-day" difference
> > > >>> quoting Wes.
> > > >>>
> > > >>> Re: > If we can just switch to our own Travis pool, just for our
> > > >>> project, then this might be something we can do fairly quickly?
> > > >>>
> > > >>> I believe so, according to [3] and [4]
> > > >>>
> > > >>>
> > > >>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> > > >>> [2] https://github.com/ursa-labs/ursabot
> > > >>> [3]
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > >>> [4]
> > > >>>
> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > >>>
> > > >>>     Are they using their own Travis CI pool, or did the switch to
> an
> > > >>>     entirely different CI service?
> > > >>>
> > > >>>     If we can just switch to our own Travis pool, just for our
> > > >>>     project, then
> > > >>>     this might be something we can do fairly quickly?
> > > >>>
> > > >>>     On 03/07/2019 05:55, Bowen Li wrote:
> > > >>>     > I responded in the INFRA ticket [1] that I believe they are
> > > >>>     using a wrong
> > > >>>     > metric against Flink and the total build time is a completely
> > > >>>     different
> > > >>>     > thing than guaranteed build capacity.
> > > >>>     >
> > > >>>     > My response:
> > > >>>     >
> > > >>>     > "As mentioned above, since I started to pay attention to
> > Flink's
> > > >>>     build
> > > >>>     > queue a few tens of days ago, I'm in Seattle and I saw no
> build
> > > >>>     was kicking
> > > >>>     > off in PST daytime in weekdays for Flink. Our teammates in
> > China
> > > >>>     and Europe
> > > >>>     > have also reported similar observations. So we need to
> evaluate
> > > >>>     how the
> > > >>>     > large total build time came from - if 1) your number and 2)
> our
> > > >>>     > observations from three locations that cover pretty much a
> full
> > > >>>     day, are
> > > >>>     > all true, I **guess** one reason can be that - highly likely
> > the
> > > >>>     extra
> > > >>>     > build time came from weekends when other Apache projects may
> be
> > > >>>     idle and
> > > >>>     > Flink just drains hard its congested queue.
> > > >>>     >
> > > >>>     > Please be aware of that we're not complaining about the lack
> of
> > > >>>     resources
> > > >>>     > in general, I'm complaining about the lack of **stable,
> > > >>> dedicated**
> > > >>>     > resources. An example for the latter one is, currently even
> if
> > > >>>     no build is
> > > >>>     > in Flink's queue and I submit a request to be the queue head
> > > >>> in PST
> > > >>>     > morning, my build won't even start in 6-8+h. That is an
> absurd
> > > >>>     amount of
> > > >>>     > waiting time.
> > > >>>     >
> > > >>>     > That's saying, if ASF INFRA decides to adopt a quota system
> and
> > > >>>     grants
> > > >>>     > Flink five DEDICATED servers that runs all the time only for
> > > >>>     Flink, that'll
> > > >>>     > be PERFECT and can totally solve our problem now.
> > > >>>     >
> > > >>>     > Please be aware of that we're not complaining about the lack
> of
> > > >>>     resources
> > > >>>     > in general, I'm complaining about the lack of **stable,
> > > >>> dedicated**
> > > >>>     > resources. An example for the latter one is, currently even
> if
> > > >>>     no build is
> > > >>>     > in Flink's queue and I submit a request to be the queue head
> > > >>> in PST
> > > >>>     > morning, my build won't even start in 6-8+h. That is an
> absurd
> > > >>>     amount of
> > > >>>     > waiting time.
> > > >>>     >
> > > >>>     >
> > > >>>     > That's saying, if ASF INFRA decides to adopt a quota system
> and
> > > >>>     grants
> > > >>>     > Flink five DEDICATED servers that runs all the time only for
> > > >>>     Flink, that'll
> > > >>>     > be PERFECT and can totally solve our problem now.
> > > >>>     >
> > > >>>     > I feel what's missing in the ASF INFRA's Travis resource pool
> > is
> > > >>>     some level
> > > >>>     > of build capacity SLAs and certainty"
> > > >>>     >
> > > >>>     >
> > > >>>     > Again, I believe there are differences in nature of these two
> > > >>>     problems,
> > > >>>     > long build time v.s. lack of dedicated build resource. That's
> > > >>>     saying,
> > > >>>     > shortening build time may relieve the situation, and may not.
> > > >>>     I'm sightly
> > > >>>     > negative on disabling IT cases for PRs, due to the downside
> is
> > > >>>     that we are
> > > >>>     > at risk of any potential bugs in PR that UTs doesn't catch,
> and
> > > >>>     may cost a
> > > >>>     > lot more to fix and if it slows others down or even block
> > > >>>     others, but am
> > > >>>     > open to others opinions on it.
> > > >>>     >
> > > >>>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
> > > >>>     feasible to
> > > >>>     > solve our problem since INFRA's pool is fully shared and they
> > > >>>     have no
> > > >>>     > control and finer insights over resource allocation to a
> > > >>>     specific Apache
> > > >>>     > project. As mentioned in [1], Apache Arrow is moving away
> from
> > > >>>     ASF INFRA
> > > >>>     > Travis pool (they are actually surprised Flink hasn't plan to
> > do
> > > >>>     so). I
> > > >>>     > know that Spark is on its own build infra. If we all agree
> that
> > > >>>     funding our
> > > >>>     > own build infra, I'd be glad to help investigate any
> potential
> > > >>>     options
> > > >>>     > after releasing 1.9 since I'm super busy with 1.9 now.
> > > >>>     >
> > > >>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> > > >>>     >
> > > >>>     >
> > > >>>     >
> > > >>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> > > >>>     <chesnay@apache.org <ma...@apache.org>> wrote:
> > > >>>     >
> > > >>>     >> As a short-term stopgap, since we can assume this issue to
> > > >>>     become much
> > > >>>     >> worse in the following days/weeks, we could disable IT cases
> > in
> > > >>>     PRs and
> > > >>>     >> only run them on master.
> > > >>>     >>
> > > >>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > > >>>     >>> People really have to stop thinking that just because
> > > >>>     something works
> > > >>>     >>> for us it is also a good solution.
> > > >>>     >>> Also, please remember that our builds run for 2h from start
> > to
> > > >>>     finish,
> > > >>>     >>> and not the 14 _minutes_ it takes for zeppelin.
> > > >>>     >>> We are dealing with an entirely different scale here, both
> in
> > > >>>     terms of
> > > >>>     >>> build times and number of builds.
> > > >>>     >>>
> > > >>>     >>> In this very thread people have been complaining about long
> > > >>> queue
> > > >>>     >>> times for their builds. Surprise, other Apache projects
> have
> > > >>> been
> > > >>>     >>> suffering the very same thing due to us not controlling our
> > > >>> build
> > > >>>     >>> times. While switching services (be it Jenkins, CircleCI or
> > > >>>     whatever)
> > > >>>     >>> will possibly work for us (and these options are actually
> > > >>>     attractive,
> > > >>>     >>> like CircleCI's proper support for build artifacts), it
> will
> > > >>> also
> > > >>>     >>> result in us likely negatively affecting other projects in
> > > >>>     significant
> > > >>>     >>> ways.
> > > >>>     >>>
> > > >>>     >>> Sure, the Jenkins setup has a good user experience for us,
> at
> > > >>>     the cost
> > > >>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now
> we
> > > >>>     have 25
> > > >>>     >>> PR's in our queue; that's possibly 50h we'd consume of
> > Jenkins
> > > >>>     >>> resources, and the European contributors haven't even
> really
> > > >>>     started yet.
> > > >>>     >>>
> > > >>>     >>> FYI, the latest INFRA response from INFRA-18533:
> > > >>>     >>>
> > > >>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> > > >>>     build time
> > > >>>     >>> last month. That is equal to EIGHT servers running 24/7 for
> > > >>>     the ENTIRE
> > > >>>     >>> MONTH. EIGHT. nonstop.
> > > >>>     >>> When we discovered this last night, we discussed it some
> and
> > > >>>     are going
> > > >>>     >>> to tune down Flink to allow only five executors maximum. We
> > > >>> cannot
> > > >>>     >>> allow Flink to consume so much of a Foundation shared
> > > >>> resource."
> > > >>>     >>>
> > > >>>     >>> So yes, we either
> > > >>>     >>> a) have to heavily reduce our CI usage or
> > > >>>     >>> b) fund our own, either maintaining it ourselves or
> donating
> > > >>>     to Apache.
> > > >>>     >>>
> > > >>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> > > >>>     >>>> By looking at the git history of the Jenkins script, its
> > core
> > > >>>     part
> > > >>>     >>>> was finished in March 2017 (and only two minor update in
> > > >>>     2017/2018),
> > > >>>     >>>> so it's been running for over two years now and feels like
> > > >>>     Zepplin
> > > >>>     >>>> community has been quite happy with it. @Jeff Zhang
> > > >>>     >>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can
> you
> > > >>>     share your insights and user
> > > >>>     >>>> experience with the Jenkins+Travis approach?
> > > >>>     >>>>
> > > >>>     >>>> Things like:
> > > >>>     >>>>
> > > >>>     >>>> - has the approach completely solved the resource capacity
> > > >>>     problem
> > > >>>     >>>> for Zepplin community? is Zepplin community happy with the
> > > >>>     result?
> > > >>>     >>>> - is the whole configuration chain stable (e.g. uptime)
> > > >>> enough?
> > > >>>     >>>> - how often do you need to maintain the Jenkins infra? how
> > > >>> many
> > > >>>     >>>> people are usually involved in maintenance and bug-fixes?
> > > >>>     >>>>
> > > >>>     >>>> The downside of this approach seems mostly to be on the
> > > >>>     maintenance
> > > >>>     >>>> to me - maintain the script and Jenkins infra.
> > > >>>     >>>>
> > > >>>     >>>> ** Having Our Own Travis-CI.com Account **
> > > >>>     >>>>
> > > >>>     >>>> Another alternative I've been thinking of is to have our
> own
> > > >>>     >>>> travis-ci.com <http://travis-ci.com> <
> http://travis-ci.com>
> > > >>>     account with paid dedicated
> > > >>>     >>>> resources. Note travis-ci.org <http://travis-ci.org>
> > > >>> <http://travis-ci.org> is the free
> > > >>>     >>>> version and travis-ci.com <http://travis-ci.com>
> > > >>> <http://travis-ci.com> is the commercial
> > > >>>     >>>> version. We currently use a shared resource pool managed
> by
> > > >>>     ASK INFRA
> > > >>>     >>>> team on travis-ci.org <http://travis-ci.org>
> > > >>> <http://travis-ci.org>, but we have no control
> > > >>>     >>>> over it - we can't see how it's configured, how much
> > > >>>     resources are
> > > >>>     >>>> available, how resources are allocated among Apache
> > projects,
> > > >>>     etc.
> > > >>>     >>>> The nice thing about having an account on travis-ci.com
> > > >>> <http://travis-ci.com>
> > > >>>     >>>> <http://travis-ci.com> are:
> > > >>>     >>>>
> > > >>>     >>>> - relatively low cost with much better resource guarantee
> > > >>>     than what
> > > >>>     >>>> we currently have [1]: $249/month with 5 dedicated
> > > >>> concurrency,
> > > >>>     >>>> $489/month with 10 concurrency
> > > >>>     >>>> - low maintenance work compared to using Jenkins
> > > >>>     >>>> - (potentially) no migration cost according to Travis's
> doc
> > > >>> [2]
> > > >>>     >>>> (pending verification)
> > > >>>     >>>> - full control over the build capacity/configuration
> > > >>> compared to
> > > >>>     >>>> using ASF INFRA's pool
> > > >>>     >>>>
> > > >>>     >>>> I'd be surprised if we as such a vibrant community cannot
> > > >>>     find and
> > > >>>     >>>> fund $249*12=$2988 a year in exchange for a much better
> > > >>> developer
> > > >>>     >>>> experience and much higher productivity.
> > > >>>     >>>>
> > > >>>     >>>> [1] https://travis-ci.com/plans
> > > >>>     >>>> [2]
> > > >>>     >>>>
> > > >>>     >>
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > >>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> > > >>>     <chesnay@apache.org <ma...@apache.org>
> > > >>>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>>
> > > >>> wrote:
> > > >>>     >>>>
> > > >>>     >>>>      So yes, the Jenkins job keeps pulling the state from
> > > >>>     Travis until it
> > > >>>     >>>>      finishes.
> > > >>>     >>>>
> > > >>>     >>>>      Note sure I'm comfortable with the idea of using
> > Jenkins
> > > >>>     workers
> > > >>>     >>>>      just to
> > > >>>     >>>>      idle for a several hours.
> > > >>>     >>>>
> > > >>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> > > >>>     >>>>      > Here's what zeppelin community did, we make a
> python
> > > >>>     script to
> > > >>>     >>>>      check the
> > > >>>     >>>>      > build status of pull request.
> > > >>>     >>>>      > Here's script:
> > > >>>     >>>>      >
> > > >>> https://github.com/apache/zeppelin/blob/master/travis_check.py
> > > >>>     >>>>      >
> > > >>>     >>>>      > And this is the script we used in Jenkins build
> job.
> > > >>>     >>>>      >
> > > >>>     >>>>      > if [ -f "travis_check.py" ]; then
> > > >>>     >>>>      >    git log -n 1
> > > >>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub
> pull
> > > >>>     >>>>      request.*from.*" | sed
> > > >>>     >>>>      > 's/.*GitHub pull request <a
> > > >>>     >>>>      >
> href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
> > > >>>     \2/g')
> > > >>>     >>>>      >    AUTHOR=$(echo $STATUS | sed
> 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
> > > >>>     >>>> 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
> > > >>>     '{print $3}')
> > > >>>     >>>>      >    #if [ -z $COMMIT ]; then
> > > >>>     >>>>      >    #  COMMIT=$(curl -s
> > > >>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> > > >>>     >>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":"
> |
> > > >>>     tr '\n' ' '
> > > >>>     >>>>      | sed
> > > >>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr =
> '\n'
> > |
> > > >>>     grep -v
> > > >>>     >>>>      "apache:" |
> > > >>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >    #fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    # get commit hash from PR
> > > >>>     >>>>      >    COMMIT=$(curl -s
> > > >>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > > >>>     >>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
> tr
> > > >>>     '\n' ' '
> > > >>>     >>>> | sed
> > > >>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr =
> '\n'
> > |
> > > >>>     grep -v
> > > >>>     >>>>      "apache:" |
> > > >>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >    sleep 30 # sleep few moment to wait travis
> starts
> > > >>>     the build
> > > >>>     >>>>      >    RET_CODE=0
> > > >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> > > >>>     RET_CODE=$?
> > > >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with
> repository
> > > >>>     name when
> > > >>>     >>>>      travis-ci is
> > > >>>     >>>>      > not available in the account
> > > >>>     >>>>      >      RET_CODE=0
> > > >>>     >>>>      >      AUTHOR=$(curl -s
> > > >>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> > > >>>     >>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" |
> > sed
> > > >>>     >>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> > > >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> > > >>>     RET_CODE=$?
> > > >>>     >>>>      >    fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't
> > find
> > > >>>     build
> > > >>>     >>>>      information in
> > > >>>     >>>>      > the travis
> > > >>>     >>>>      >      set +x
> > > >>>     >>>>      >      echo
> > > >>>     "-----------------------------------------------------"
> > > >>>     >>>>      >      echo "Looks like travis-ci is not configured
> for
> > > >>>     your fork."
> > > >>>     >>>>      >      echo "Please setup by swich on 'zeppelin'
> > > >>>     repository at
> > > >>>     >>>>      > https://travis-ci.org/profile and travis-ci."
> > > >>>     >>>>      >      echo "And then make sure 'Build branch
> updates'
> > > >>>     option is
> > > >>>     >>>>      enabled in
> > > >>>     >>>>      > the settings
> > > >>> https://travis-ci.org/${AUTHOR}/zeppelin/settings
> > > >>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
> > > >>>     >>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
> > > >>>     >>>>      >      echo ""
> > > >>>     >>>>      >      echo "To trigger CI after setup, you will need
> > > >>>     ammend your
> > > >>>     >>>>      last commit
> > > >>>     >>>>      > with"
> > > >>>     >>>>      >      echo "git commit --amend"
> > > >>>     >>>>      >      echo "git push your-remote HEAD --force"
> > > >>>     >>>>      >      echo ""
> > > >>>     >>>>      >      echo "See
> > > >>>     >>>>      >
> > > >>>     >>>>
> > > >>>     >>
> > > >>>
> > >
> >
> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
> > > >>>     >>>>      > ."
> > > >>>     >>>>      >    fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    exit $RET_CODE
> > > >>>     >>>>      > else
> > > >>>     >>>>      >    set +x
> > > >>>     >>>>      >    echo "travis_check.py does not exists"
> > > >>>     >>>>      >    exit 1
> > > >>>     >>>>      > fi
> > > >>>     >>>>      >
> > > >>>     >>>>      > Chesnay Schepler <chesnay@apache.org
> > > >>> <ma...@apache.org>
> > > >>>     >>>>      <mailto:chesnay@apache.org <mailto:
> chesnay@apache.org
> > >>>
> > > >>>     于2019年6月29日周六 下午3:17写道:
> > > >>>     >>>>      >
> > > >>>     >>>>      >> Does this imply that a Jenkins job is active as
> long
> > > >>>     as the
> > > >>>     >>>>      Travis build
> > > >>>     >>>>      >> runs?
> > > >>>     >>>>      >>
> > > >>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> > > >>>     >>>>      >>> Hi,
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> @Dawid, I think the "long test running" as I
> > > >>>     mentioned in the
> > > >>>     >>>>      first
> > > >>>     >>>>      >> email,
> > > >>>     >>>>      >>> also as you guys said, belongs to "a big effort
> > > >>>     which is much
> > > >>>     >>>>      harder to
> > > >>>     >>>>      >>> accomplish in a short period of time and may
> > deserve
> > > >>>     its own
> > > >>>     >>>>      separate
> > > >>>     >>>>      >>> discussion". Thus I didn't include it in what we
> > can
> > > >>>     do in a
> > > >>>     >>>>      foreseeable
> > > >>>     >>>>      >>> short term.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> Besides, I don't think that's the ultimate reason
> > > >>>     for lack of
> > > >>>     >>>>      build
> > > >>>     >>>>      >>> resources. Even if the build is shortened to
> > > >>>     something like
> > > >>>     >>>>      2h, the
> > > >>>     >>>>      >>> problems of no build machine works about 6 or
> more
> > > >>>     hours in
> > > >>>     >>>>      PST daytime
> > > >>>     >>>>      >>> that I described will still happen, because no
> > > >>>     machine from
> > > >>>     >>>>      ASF INFRA's
> > > >>>     >>>>      >>> pool is allocated to Flink. As I have paid close
> > > >>>     attention to
> > > >>>     >>>>      the build
> > > >>>     >>>>      >>> queue in the past few weekdays, it's a pretty
> clear
> > > >>>     pattern now.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> **The ultimate root cause** for that is - we
> don't
> > > >>>     have any
> > > >>>     >>>>      **dedicated**
> > > >>>     >>>>      >>> build resources that we can stably rely on. I'm
> > > >>>     actually ok to
> > > >>>     >>>>      wait for a
> > > >>>     >>>>      >>> long time if there are build requests running, it
> > > >>>     means at
> > > >>>     >>>>      least we are
> > > >>>     >>>>      >>> making progress. But I'm not ok with no build
> > > >>>     resource. A
> > > >>>     >>>>      better place I
> > > >>>     >>>>      >>> think we should aim at in short term is to always
> > > >>>     have at
> > > >>>     >>>>      least a central
> > > >>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to
> build
> > > >>>     Flink at
> > > >>>     >>>>      any time, or
> > > >>>     >>>>      >>> maybe use users resources.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> > > >>>     Zeppelin
> > > >>>     >>>>      community is
> > > >>>     >>>>      >>> using a Jenkins job to automatically build on
> > users'
> > > >>>     travis
> > > >>>     >>>>      account and
> > > >>>     >>>>      >>> link the result back to github PR. I guess the
> > > >>>     Jenkins job
> > > >>>     >>>>      would fetch
> > > >>>     >>>>      >>> latest upstream master and build the PR against
> it.
> > > >>>     Jeff has
> > > >>>     >>>> filed
> > > >>>     >>>>      >> tickets
> > > >>>     >>>>      >>> to learn and get access to the Jenkins infra.
> It'll
> > > >>>     better to
> > > >>>     >>>>      fully
> > > >>>     >>>>      >>> understand it first before judging this approach.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> I also heard good things about CircleCI, and ASF
> > > >>>     INFRA seems
> > > >>>     >>>>      to have a
> > > >>>     >>>>      >> pool
> > > >>>     >>>>      >>> of build capacity there too. Can be an
> alternative
> > > >>>     to consider.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid
> Wysakowicz <
> > > >>>     >>>>      >> dwysakowicz@apache.org
> > > >>> <ma...@apache.org> <mailto:dwysakowicz@apache.org
> > > >>> <ma...@apache.org>>>
> > > >>>     >>>>      >>> wrote:
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed
> > the
> > > >>>     most
> > > >>>     >>>>      important point
> > > >>>     >>>>      >>>> from Chesnay's previous message in the summary.
> > The
> > > >>>     ultimate
> > > >>>     >>>>      reason for
> > > >>>     >>>>      >>>> all the problems is that the tests take close
> to 2
> > > >>>     hours to
> > > >>>     >>>>      run already.
> > > >>>     >>>>      >>>> I fully support this claim: "Unless people start
> > > >>>     caring about
> > > >>>     >>>>      test times
> > > >>>     >>>>      >>>> before adding them, this issue cannot be solved"
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> This is also another reason why using user's
> > Travis
> > > >>>     account
> > > >>>     >>>>      won't help.
> > > >>>     >>>>      >>>> Every few weeks we reach the user's time limit
> for
> > > >>>     a single
> > > >>>     >>>>      profile.
> > > >>>     >>>>      >>>> This makes the user's builds simply fail, until
> we
> > > >>>     either
> > > >>>     >>>>      properly
> > > >>>     >>>>      >>>> decrease the time the tests take (which I am not
> > > >>>     sure we ever
> > > >>>     >>>>      did) or
> > > >>>     >>>>      >>>> postpone the problem by splitting into more
> > > >>>     profiles. (Note
> > > >>>     >>>>      that the ASF
> > > >>>     >>>>      >>>> Travis account has higher time limits)
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> Best,
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> Dawid
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> > > >>>     >>>>      >>>>> Do we know if using "the best" available
> hardware
> > > >>>     would
> > > >>>     >>>>      improve the
> > > >>>     >>>>      >> build
> > > >>>     >>>>      >>>>> times?
> > > >>>     >>>>      >>>>> Imagine we would run the build on machines with
> > > >>>     plenty of
> > > >>>     >>>>      main memory
> > > >>>     >>>>      >> to
> > > >>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> > > >>>     architecture?
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>> Throwing hardware at the problem could help
> > reduce
> > > >>>     the time
> > > >>>     >>>>      of an
> > > >>>     >>>>      >>>>> individual build, and using our own
> > infrastructure
> > > >>>     would
> > > >>>     >>>>      remove our
> > > >>>     >>>>      >>>>> dependency on Apache's Travis account (with the
> > > >>>     obvious
> > > >>>     >>>>      downside of
> > > >>>     >>>>      >>>> having
> > > >>>     >>>>      >>>>> to maintain the infrastructure)
> > > >>>     >>>>      >>>>> We could use an open source travis alternative,
> > to
> > > >>>     have a
> > > >>>     >>>>      similar
> > > >>>     >>>>      >>>>> experience and make the migration easy.
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay
> Schepler
> > > >>>     >>>>      <chesnay@apache.org <ma...@apache.org>
> > > >>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
> > > >>>     >>>>      >>>> wrote:
> > > >>>     >>>>      >>>>>>    >From what I gathered, there's no special
> > > >>>     sauce that the
> > > >>>     >>>>      Zeppelin
> > > >>>     >>>>      >>>>>> project uses which actually integrates a users
> > > >>> Travis
> > > >>>     >>>>      account into the
> > > >>>     >>>>      >>>> PR.
> > > >>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
> > > >>>     kind of it.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> Naturally we can do this (duh) and safe the
> ASF
> > a
> > > >>>     fair
> > > >>>     >>>>      amount of
> > > >>>     >>>>      >>>>>> resources, but there are downsides:
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> The discoverability of the Travis check takes
> a
> > > >>>     nose-dive.
> > > >>>     >>>>      Either we
> > > >>>     >>>>      >>>>>> require every contributor to always, an every
> > > >>>     commit, also
> > > >>>     >>>>      post a
> > > >>>     >>>>      >> Travis
> > > >>>     >>>>      >>>>>> build, or we have the reviewer sift through
> the
> > > > >>>     >>>>      contributor's account
> > > >>>     >>>>      >> to
> > > >>>     >>>>      >>>>>> find it.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> > > >>>     also not
> > > >>>     >>>>      equivalent to
> > > >>>     >>>>      >>>>>> having a PR build.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> A normal branch build takes a branch as is and
> > > >>>     tests it. A
> > > >>>     >>>>      PR build
> > > >>>     >>>>      >>>>>> merges the branch into master, and then runs
> it.
> > > >>>     (Fun fact:
> > > >>>     >>>>      This is
> > > >>>     >>>>      >> why
> > > > >>>     >>>>      >>>>>> a PR with merge conflicts is not being run
> on
> > > >>>     Travis.)
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> And ultimately, everyone can already make use
> > > >>> of this
> > > >>>     >>>>      approach anyway.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> > > >>>     >>>>      >>>>>>> Hi Jeff,
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> > > >>>     think it's a
> > > >>>     >>>>      good idea to
> > > >>>     >>>>      >>>>>>> leverage user's travis account.
> > > >>>     >>>>      >>>>>>> In this way, we can have almost unlimited
> > > >>>     concurrent build
> > > >>>     >>>>      jobs and
> > > >>>     >>>>      >>>>>>> developers can restart build by themselves
> > > >>>     (currently only
> > > >>>     >>>>      committers
> > > >>>     >>>>      >>>>>>> can restart PR's build).
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> > > >>> user's
> > > >>>     >>>>      travis build
> > > >>>     >>>>      >> into
> > > >>>     >>>>      >>>>>>> the Flink pull request's build automatically.
> > > >>>     Can you
> > > >>>     >>>>      explain more in
> > > >>>     >>>>      >>>>>>> detail?
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Another question: does travis only build
> > > >>>     branches for user
> > > >>>     >>>>      account?
> > > >>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> > > >>> user's
> > > >>>     >>>>      commits against
> > > >>>     >>>>      >>>>>>> current master branch.
> > > >>>     >>>>      >>>>>>> This will help us to find problems before
> > > >>>     merge.  Builds
> > > >>>     >>>>      for branches
> > > >>>     >>>>      >>>>>>> will lose the impact of new commits in
> master.
> > > >>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Thanks again for sharing the idea.
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Regards,
> > > >>>     >>>>      >>>>>>> Jark
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> > > >>>     <zjffdu@gmail.com <ma...@gmail.com>
> > > >>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:zjffdu@gmail.com
> > > >>> <ma...@gmail.com>>>> wrote:
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Hi Folks,
> > > >>>     >>>>      >>>>>>>
> > > > >>>     >>>>      >>>>>>>  Zeppelin ran into this kind of issue before. We solved it by
> > > > >>>     >>>>      >>>>>>>  delegating each contributor's PR build to their own Travis
> > > > >>>     >>>>      >>>>>>>  account (everyone can have 5 free slots for Travis builds).
> > > > >>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered when a PR is merged.
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > > >>>     >>>>      >>>>>>>  On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  > (Forgot to cc George)
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > Best,
> > > >>>     >>>>      >>>>>>>  > Kurt
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt
> Young
> > > >>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
> > > >>> <ma...@gmail.com> <mailto:ykt836@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      wrote:
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > > Hi Bowen,
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
> > > >>>     actually have
> > > >>>     >>>>      discussed
> > > >>>     >>>>      >> about
> > > >>>     >>>>      >>>>>>>  this, and I
> > > >>>     >>>>      >>>>>>>  > > think Till and George have
> > > > >>>     >>>>      >>>>>>>  > > already spent some time investigating
> > > >>>     it. I have
> > > >>>     >>>>      cced both of
> > > >>>     >>>>      >>>>>>>  them, and
> > > >>>     >>>>      >>>>>>>  > > maybe they can share
> > > >>>     >>>>      >>>>>>>  > > their findings.
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > Best,
> > > >>>     >>>>      >>>>>>>  > > Kurt
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> > > >>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
> > > >>> <ma...@gmail.com> <mailto:imjark@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      wrote:
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > >> Hi Bowen,
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> > > >>>     suffered from
> > > >>>     >>>>      the long
> > > >>>     >>>>      >>>>>>>  build time.
> > > >>>     >>>>      >>>>>>>  > >> I agree that we should focus on
> > > >>>     solving build
> > > >>>     >>>>      capacity
> > > >>>     >>>>      >>>>>>>  problem in the
> > > >>>     >>>>      >>>>>>>  > >> thread.
> > > >>>     >>>>      >>>>>>>  > >>
> > > > >>>     >>>>      >>>>>>>  > >> My observation is that only one
> > > >>>     build is
> > > >>>     >>>>      running, all
> > > >>>     >>>>      >> the
> > > >>>     >>>>      >>>>>>>  others
> > > >>>     >>>>      >>>>>>>  > >> (other
> > > >>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
> > > >>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
> > > >>>     it can
> > > >>>     >>>> support
> > > >>>     >>>>      >> concurrent
> > > >>>     >>>>      >>>>>>>  build
> > > >>>     >>>>      >>>>>>>  > jobs.
> > > >>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
> > > >>>     using, might
> > > >>>     >>>>      be the free
> > > >>>     >>>>      >>>>>>>  plan for
> > > >>>     >>>>      >>>>>>>  > open
> > > >>>     >>>>      >>>>>>>  > >> source.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> > > >>>     experience on
> > > >>>     >>>>      Travis.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> Regards,
> > > >>>     >>>>      >>>>>>>  > >> Jark
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
> <
> > > >>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>>> wrote:
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> > Hi Steven,
> > > >>>     >>>>      >>>>>>>  > >> >
> > > > >>>     >>>>      >>>>>>>  > >> > I think you may not have read what I
> > > >>>     wrote. The
> > > >>>     >>>>      discussion is
> > > >>>     >>>>      >>>> about
> > > >>>     >>>>      >>>>>>>  > "unstable
> > > > >>>     >>>>      >>>>>>>  > >> > build **capacity**", in other words
> > > >>>     >>>>      "unstable / lack of
> > > >>>     >>>>      >>>> build
> > > >>>     >>>>      >>>>>>>  > >> resources",
> > > >>>     >>>>      >>>>>>>  > >> > not "unstable build".
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
> > > >>>     Steven Wu
> > > >>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > wrote:
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build
> is
> > > >>>     >>>>      definitely a pain
> > > >>>     >>>>      >>>>>> point.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> > > >>>     >>>>      >> flink-connector-kafka
> > > >>>     >>>>      >>>>>>>  is not
> > > >>>     >>>>      >>>>>>>  > >> related
> > > >>>     >>>>      >>>>>>>  > >> > to
> > > > >>>     >>>>      >>>>>>>  > >> > > my change, but there is no easy way to
> > > >>>     re-run the
> > > >>>     >>>>      build on
> > > >>>     >>>>      >>>>>>>  travis UI.
> > > >>>     >>>>      >>>>>>>  > Google
> > > >>>     >>>>      >>>>>>>  > >> > > search showed a trick of
> > > >>>     close-and-open the
> > > >>>     >>>>      PR will
> > > >>>     >>>>      >>>>>>>  trigger rebuild.
> > > >>>     >>>>      >>>>>>>  > >> but
> > > >>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
> > > >>>     activities.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> > > >>>     often failed
> > > >>>     >>>>      with
> > > >>>     >>>>      >>>>>>>  exceeding time
> > > >>>     >>>>      >>>>>>>  > limit
> > > >>>     >>>>      >>>>>>>  > >> > after
> > > >>>     >>>>      >>>>>>>  > >> > > 4+ hours.
> > > >>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
> > > >>>     limit for
> > > >>>     >>>>      jobs, and
> > > >>>     >>>>      >> has
> > > >>>     >>>>      >>>>>>>  been
> > > >>>     >>>>      >>>>>>>  > >> > terminated.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
> > > >>>     Bowen Li
> > > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> > bowenli86@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > wrote:
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
> > > >>>     >>>>      >>>>>>>  This build
> > > >>>     >>>>      >>>>>>>  > >> > request
> > > >>>     >>>>      >>>>>>>  > >> > > > has
> > > >>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> > > >>>     queue**
> > > >>>     >>>>      since I first
> > > >>>     >>>>      >> saw
> > > >>>     >>>>      >>>>>>>  it at PST
> > > >>>     >>>>      >>>>>>>  > >> > 10:30am
> > > >>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> > > >>>     there before
> > > >>>     >>>>      10:30am).
> > > >>>     >>>>      >>>>>>>  It's PST
> > > >>>     >>>>      >>>>>>>  > 4:12pm
> > > >>>     >>>>      >>>>>>>  > >> now
> > > >>>     >>>>      >>>>>>>  > >> > > and
> > > >>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
> > > >>>     Bowen Li
> > > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> > bowenli86@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > >> wrote:
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
> > > >>>     >>>>      resulting from lack
> > > >>>     >>>>      >>>>>>>  of stable
> > > >>>     >>>>      >>>>>>>  > >> build
> > > >>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
> > > >>>     PRs [1].
> > > >>>     >>>>      >> Specifically, I
> > > >>>     >>>>      >>>>>>>  noticed
> > > >>>     >>>>      >>>>>>>  > >> often
> > > >>>     >>>>      >>>>>>>  > >> > > that
> > > >>>     >>>>      >>>>>>>  > >> > > > no
> > > >>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making
> any
> > > >>>     >>>>      progress for
> > > >>>     >>>>      >> hours,
> > > >>>     >>>>      >>>> and
> > > >>>     >>>>      >>>>>>>  > suddenly
> > > >>>     >>>>      >>>>>>>  > >> 5
> > > >>>     >>>>      >>>>>>>  > >> > or
> > > >>>     >>>>      >>>>>>>  > >> > > 6
> > > >>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
> > > >>>     after the
> > > >>>     >>>>      long pause.
> > > >>>     >>>>      >>>>>>>  I'm at PST
> > > >>>     >>>>      >>>>>>>  > >> > (UTC-08)
> > > >>>     >>>>      >>>>>>>  > >> > > > time
> > > >>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
> > > >>>     be as
> > > >>>     >>>>      long as 6 hours
> > > >>>     >>>>      >>>>>>>  from PST 9am
> > > >>>     >>>>      >>>>>>>  > >> to
> > > >>>     >>>>      >>>>>>>  > >> > 3pm
> > > >>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
> > > >>>     drain the
> > > >>>     >>>>      queue
> > > >>>     >>>>      >>>>>>>  afterwards).
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
> > > >>>     impacted our
> > > >>>     >>>>      productivity.
> > > >>>     >>>>      >>>> I've
> > > >>>     >>>>      >>>>>>>  > >> experienced
> > > >>>     >>>>      >>>>>>>  > >> > > that
> > > >>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
> > > >>>     morning of
> > > >>>     >>>>      PST time zone
> > > >>>     >>>>      >>>>>>>  won't finish
> > > >>>     >>>>      >>>>>>>  > >> > their
> > > >>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
> > > >>>     same day.
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
> > > >>>     the same
> > > >>>     >>>>      problem or
> > > >>>     >>>>      >>>>>>>  have similar
> > > >>>     >>>>      >>>>>>>  > >> > > > observation
> > > >>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
> > > >>>     has things
> > > >>>     >>>>      to do with
> > > >>>     >>>>      >> time
> > > >>>     >>>>      >>>>>>>  zone)
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
> > > >>>     TravisCI is
> > > >>>     >>>>      Flink currently
> > > >>>     >>>>      >>>>>>>  using? Is it
> > > >>>     >>>>      >>>>>>>  > >> the
> > > >>>     >>>>      >>>>>>>  > >> > > free
> > > >>>     >>>>      >>>>>>>  > >> > > > > plan for open source
> > > >>>     projects? What
> > > >>>     >>>> are the
> > > >>>     >>>>      >>>>>>>  guaranteed build
> > > >>>     >>>>      >>>>>>>  > >> capacity
> > > >>>     >>>>      >>>>>>>  > >> > > of
> > > >>>     >>>>      >>>>>>>  > >> > > > > the current plan?
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
> > > >>>     (either
> > > >>>     >>>>      free or paid)
> > > >>>     >>>>      >>>>>> can't
> > > >>>     >>>>      >>>>>>>  > provide
> > > >>>     >>>>      >>>>>>>  > >> > > stable
> > > >>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
> > > >>>     upgrade to a
> > > >>>     >>>>      higher priced
> > > >>>     >>>>      >>>>>>>  plan with
> > > >>>     >>>>      >>>>>>>  > larger
> > > >>>     >>>>      >>>>>>>  > >> > and
> > > >>>     >>>>      >>>>>>>  > >> > > > more
> > > >>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
> > > >>>     contribute to
> > > >>>     >>>> the
> > > >>>     >>>>      >>>>>>>  productivity problem
> > > >>>     >>>>      >>>>>>>  > is
> > > >>>     >>>>      >>>>>>>  > >> > that
> > > >>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
> > > >>>     full build
> > > >>>     >>>>      for every PR
> > > >>>     >>>>      >>>> and a
> > > >>>     >>>>      >>>>>>>  > >> successful
> > > >>>     >>>>      >>>>>>>  > >> > > full
> > > >>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
> > > >>>     definitely have
> > > >>>     >>>>      more options to
> > > >>>     >>>>      >>>>>>>  solve it,
> > > >>>     >>>>      >>>>>>>  > for
> > > >>>     >>>>      >>>>>>>  > >> > > > instance,
> > > >>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
> > > >>>     and reuse
> > > >>>     >>>>      artifacts
> > > >>>     >>>>      >> from
> > > >>>     >>>>      >>>> the
> > > >>>     >>>>      >>>>>>>  > previous
> > > >>>     >>>>      >>>>>>>  > >> > > build.
> > > >>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
> > > >>>     effort
> > > >>>     >>>>      which is much
> > > >>>     >>>>      >>>>>>>  harder to
> > > >>>     >>>>      >>>>>>>  > >> > accomplish
> > > >>>     >>>>      >>>>>>>  > >> > > > in
> > > >>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
> > > >>>     may deserve
> > > >>>     >>>>      its own
> > > >>>     >>>>      >>>> separate
> > > >>>     >>>>      >>>>>>>  > >> discussion.
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > [1]
> > > >>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  --
> > > >>>     >>>>      >>>>>>>  Best Regards
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Jeff Zhang
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>
> > > >>>     >>>>
> > > >>>     >>>
> > > >>>     >>
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Jark Wu <im...@gmail.com>.
+1 for the migration and great thanks to Chesnay and Bowen for pushing this!

Cheers,
Jark

On Fri, 5 Jul 2019 at 09:34, Congxian Qiu <qc...@gmail.com> wrote:

> +1 for the migration.
>
> Best,
> Congxian
>
>
> On Thu, Jul 4, 2019 at 9:42 PM Hequn Cheng <ch...@gmail.com> wrote:
>
> > +1.
> >
> > And thanks a lot to Chesnay for pushing this.
> >
> > Best, Hequn
> >
> > On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ch...@apache.org>
> > wrote:
> >
> > > Note that the Flinkbot approach isn't that trivial either; we can't
> > > _just_ trigger builds for a branch in the apache repo, but would first
> > > have to clone the branch/pr into a separate repository (that is owned
> by
> > > the github account that the travis account would be tied to).
> > >
> > > One roadblock after the next showing up...
> > >
> > > On 04/07/2019 11:59, Chesnay Schepler wrote:
> > > > Small update with mostly bad news:
> > > >
> > > > > INFRA doesn't know whether it is possible, and referred me to Travis
> > > > support.
> > > > They did point out that it could be problematic in regards to
> > > > read/write permissions for the repository.
> > > >
> > > > From my own findings /so far/ with a test repo/organization, it does
> > > > not appear possible to configure the Travis account used for a
> > > > specific repository.
> > > >
> > > > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > > > trigger builds through the Travis REST API.
> > > >
> > > > On 04/07/2019 10:46, Chesnay Schepler wrote:
> > > >> I've raised a JIRA
> > > > >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> > > >> inquire whether it would be possible to switch to a different Travis
> > > >> account, and if so what steps would need to be taken.
> > > >> We need a proper confirmation from INFRA since we are not in full
> > > >> control of the flink repository (for example, we cannot access the
> > > >> settings page).
> > > >>
> > > > >> If this is indeed possible, Ververica is willing to sponsor a Travis
> > > > >> account for the Flink project.
> > > > >> This would provide us with more resources than we need.
> > > >>
> > > >> Since this makes the project more reliant on resources provided by
> > > >> external companies I would like to vote on this.
> > > >>
> > > >> Please vote on this proposal, as follows:
> > > >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> > > >> account, provided that INFRA approves
> > > > >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> > > > >> account
> > > >>
> > > >> The vote will be open for at least 24h, and until we have
> > > >> confirmation from INFRA. The voting period may be shorter than the
> > > > >> usual 3 days since our current setup is effectively not working.
> > > >>
> > > >> On 04/07/2019 06:51, Bowen Li wrote:
> > > > >>> Re: > Are they using their own Travis CI pool, or did they switch to
> > > >>> an entirely different CI service?
> > > >>>
> > > >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> > > >>> currently moving away from ASF's Travis to their own in-house metal
> > > >>> machines at [1] with custom CI application at [2]. They've seen
> > > >>> significant improvement w.r.t both much higher performance and
> > > >>> basically no resource waiting time, "night-and-day" difference
> > > >>> quoting Wes.
> > > >>>
> > > >>> Re: > If we can just switch to our own Travis pool, just for our
> > > >>> project, then this might be something we can do fairly quickly?
> > > >>>
> > > >>> I believe so, according to [3] and [4]
> > > >>>
> > > >>>
> > > >>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> > > >>> [2] https://github.com/ursa-labs/ursabot
> > > >>> [3]
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > >>> [4]
> > > >>>
> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > >>>
> > > > >>>     Are they using their own Travis CI pool, or did they switch to an
> > > >>>     entirely different CI service?
> > > >>>
> > > >>>     If we can just switch to our own Travis pool, just for our
> > > >>>     project, then
> > > >>>     this might be something we can do fairly quickly?
> > > >>>
> > > >>>     On 03/07/2019 05:55, Bowen Li wrote:
> > > >>>     > I responded in the INFRA ticket [1] that I believe they are
> > > >>>     using a wrong
> > > >>>     > metric against Flink and the total build time is a completely
> > > >>>     different
> > > >>>     > thing than guaranteed build capacity.
> > > >>>     >
> > > >>>     > My response:
> > > >>>     >
> > > >>>     > "As mentioned above, since I started to pay attention to
> > Flink's
> > > >>>     build
> > > >>>     > queue a few tens of days ago, I'm in Seattle and I saw no
> build
> > > >>>     was kicking
> > > >>>     > off in PST daytime in weekdays for Flink. Our teammates in
> > China
> > > >>>     and Europe
> > > >>>     > have also reported similar observations. So we need to
> evaluate
> > > >>>     how the
> > > >>>     > large total build time came from - if 1) your number and 2)
> our
> > > >>>     > observations from three locations that cover pretty much a
> full
> > > >>>     day, are
> > > >>>     > all true, I **guess** one reason can be that - highly likely
> > the
> > > >>>     extra
> > > >>>     > build time came from weekends when other Apache projects may
> be
> > > >>>     idle and
> > > >>>     > Flink just drains hard its congested queue.
> > > >>>     >
> > > > >>>     > Please be aware that we're not complaining about the lack of resources
> > > > >>>     > in general; I'm complaining about the lack of **stable, dedicated**
> > > > >>>     > resources. An example of the latter: currently, even if no build is
> > > > >>>     > in Flink's queue and I submit a request to be the queue head in PST
> > > > >>>     > morning, my build won't even start for 6-8+h. That is an absurd amount
> > > > >>>     > of waiting time.
> > > > >>>     >
> > > > >>>     > That is to say, if ASF INFRA decides to adopt a quota system and grants
> > > > >>>     > Flink five DEDICATED servers that run all the time only for Flink,
> > > > >>>     > that'll be PERFECT and can totally solve our problem now.
> > > >>>     >
> > > >>>     > I feel what's missing in the ASF INFRA's Travis resource pool
> > is
> > > >>>     some level
> > > >>>     > of build capacity SLAs and certainty"
> > > >>>     >
> > > >>>     >
> > > > >>>     > Again, I believe these two problems differ in nature: long build
> > > > >>>     > time vs. lack of dedicated build resources. That is to say,
> > > > >>>     > shortening build time may relieve the situation, or may not. I'm
> > > > >>>     > slightly negative on disabling IT cases for PRs: the downside is that
> > > > >>>     > we are at risk of bugs in PRs that UTs don't catch, which may cost a
> > > > >>>     > lot more to fix later, especially if they slow down or even block
> > > > >>>     > others. But I am open to others' opinions on it.
> > > >>>     >
> > > > >>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a
> > > > >>>     > feasible way to solve our problem, since INFRA's pool is fully shared
> > > > >>>     > and they have no control over, or finer insight into, resource
> > > > >>>     > allocation to a specific Apache project. As mentioned in [1], Apache
> > > > >>>     > Arrow is moving away from the ASF INFRA Travis pool (they are actually
> > > > >>>     > surprised Flink hasn't planned to do so). I know that Spark is on its
> > > > >>>     > own build infra. If we all agree on funding our own build infra, I'd
> > > > >>>     > be glad to help investigate potential options after releasing 1.9,
> > > > >>>     > since I'm super busy with 1.9 now.
> > > >>>     >
> > > >>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> > > >>>     >
> > > >>>     >
> > > >>>     >
> > > >>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> > > >>>     <chesnay@apache.org <ma...@apache.org>> wrote:
> > > >>>     >
> > > >>>     >> As a short-term stopgap, since we can assume this issue to
> > > >>>     become much
> > > >>>     >> worse in the following days/weeks, we could disable IT cases
> > in
> > > >>>     PRs and
> > > >>>     >> only run them on master.
> > > >>>     >>
> > > >>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > > >>>     >>> People really have to stop thinking that just because
> > > >>>     something works
> > > >>>     >>> for us it is also a good solution.
> > > >>>     >>> Also, please remember that our builds run for 2h from start
> > to
> > > >>>     finish,
> > > >>>     >>> and not the 14 _minutes_ it takes for zeppelin.
> > > >>>     >>> We are dealing with an entirely different scale here, both
> in
> > > >>>     terms of
> > > >>>     >>> build times and number of builds.
> > > >>>     >>>
> > > >>>     >>> In this very thread people have been complaining about long
> > > >>> queue
> > > >>>     >>> times for their builds. Surprise, other Apache projects
> have
> > > >>> been
> > > >>>     >>> suffering the very same thing due to us not controlling our
> > > >>> build
> > > >>>     >>> times. While switching services (be it Jenkins, CircleCI or
> > > >>>     whatever)
> > > >>>     >>> will possibly work for us (and these options are actually
> > > >>>     attractive,
> > > >>>     >>> like CircleCI's proper support for build artifacts), it
> will
> > > >>> also
> > > >>>     >>> result in us likely negatively affecting other projects in
> > > >>>     significant
> > > >>>     >>> ways.
> > > >>>     >>>
> > > >>>     >>> Sure, the Jenkins setup has a good user experience for us,
> at
> > > >>>     the cost
> > > >>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now
> we
> > > >>>     have 25
> > > >>>     >>> PR's in our queue; that's possibly 50h we'd consume of
> > Jenkins
> > > >>>     >>> resources, and the European contributors haven't even
> really
> > > >>>     started yet.
> > > >>>     >>>
> > > >>>     >>> FYI, the latest INFRA response from INFRA-18533:
> > > >>>     >>>
> > > >>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> > > >>>     build time
> > > >>>     >>> last month. That is equal to EIGHT servers running 24/7 for
> > > >>>     the ENTIRE
> > > >>>     >>> MONTH. EIGHT. nonstop.
> > > >>>     >>> When we discovered this last night, we discussed it some
> and
> > > >>>     are going
> > > >>>     >>> to tune down Flink to allow only five executors maximum. We
> > > >>> cannot
> > > >>>     >>> allow Flink to consume so much of a Foundation shared
> > > >>> resource."
> > > >>>     >>>
> > > >>>     >>> So yes, we either
> > > >>>     >>> a) have to heavily reduce our CI usage or
> > > >>>     >>> b) fund our own, either maintaining it ourselves or
> donating
> > > >>>     to Apache.
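The figures quoted in the message above can be checked with a few lines of arithmetic. This is only a rough sketch: the 30-day month, the function names, and the simplification that each build occupies exactly one executor for its full 2h are assumptions, not something stated by INFRA or in the thread.

```python
import math

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def servers_equivalent(build_hours_per_month: float) -> float:
    """Number of servers running 24/7 needed to supply the given build hours."""
    return build_hours_per_month / (HOURS_PER_DAY * DAYS_PER_MONTH)


def queue_hours(pending_builds: int, hours_per_build: float) -> float:
    """Total executor time consumed by everything currently in the queue."""
    return pending_builds * hours_per_build


def drain_hours(pending_builds: int, hours_per_build: float, executors: int) -> float:
    """Wall-clock time to drain the queue, assuming one executor per build."""
    rounds = math.ceil(pending_builds / executors)
    return rounds * hours_per_build


print(servers_equivalent(5800))   # roughly eight servers for 5800 build hours
print(queue_hours(25, 2))         # executor hours for 25 queued PRs at 2h each
print(drain_hours(25, 2, 5))      # wall-clock hours with a cap of five executors
```

Under these assumptions, a five-executor cap means a 25-PR queue takes about ten hours of wall-clock time to drain, before any new PRs arrive.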
> > > >>>     >>>
> > > >>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> > > >>>     >>>> By looking at the git history of the Jenkins script, its
> > core
> > > >>>     part
> > > >>>     >>>> was finished in March 2017 (and only two minor updates in
> > > >>>     2017/2018),
> > > >>>     >>>> so it's been running for over two years now and feels like
> > > >>>     Zeppelin
> > > >>>     >>>> community has been quite happy with it. @Jeff Zhang
> > > >>>     >>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can
> you
> > > >>>     share your insights and user
> > > >>>     >>>> experience with the Jenkins+Travis approach?
> > > >>>     >>>>
> > > >>>     >>>> Things like:
> > > >>>     >>>>
> > > >>>     >>>> - has the approach completely solved the resource capacity
> > > >>>     problem
> > > >>>     >>>> for Zeppelin community? is Zeppelin community happy with the
> > > >>>     result?
> > > >>>     >>>> - is the whole configuration chain stable (e.g. uptime)
> > > >>> enough?
> > > >>>     >>>> - how often do you need to maintain the Jenkins infra? how
> > > >>> many
> > > >>>     >>>> people are usually involved in maintenance and bug-fixes?
> > > >>>     >>>>
> > > >>>     >>>> The downside of this approach seems mostly to be on the
> > > >>>     maintenance
> > > >>>     >>>> to me - maintain the script and Jenkins infra.
> > > >>>     >>>>
> > > >>>     >>>> ** Having Our Own Travis-CI.com Account **
> > > >>>     >>>>
> > > >>>     >>>> Another alternative I've been thinking of is to have our
> own
> > > >>>     >>>> travis-ci.com <http://travis-ci.com> <
> http://travis-ci.com>
> > > >>>     account with paid dedicated
> > > >>>     >>>> resources. Note travis-ci.org <http://travis-ci.org>
> > > >>> <http://travis-ci.org> is the free
> > > >>>     >>>> version and travis-ci.com <http://travis-ci.com>
> > > >>> <http://travis-ci.com> is the commercial
> > > >>>     >>>> version. We currently use a shared resource pool managed
> by
> > > >>>     ASF INFRA
> > > >>>     >>>> team on travis-ci.org <http://travis-ci.org>
> > > >>> <http://travis-ci.org>, but we have no control
> > > >>>     >>>> over it - we can't see how it's configured, how much
> > > >>>     resources are
> > > >>>     >>>> available, how resources are allocated among Apache
> > projects,
> > > >>>     etc.
> > > >>>     >>>> The nice things about having an account on travis-ci.com
> > > >>> <http://travis-ci.com>
> > > >>>     >>>> <http://travis-ci.com> are:
> > > >>>     >>>>
> > > >>>     >>>> - relatively low cost with much better resource guarantee
> > > >>>     than what
> > > >>>     >>>> we currently have [1]: $249/month with 5 dedicated
> > > >>> concurrency,
> > > >>>     >>>> $489/month with 10 concurrency
> > > >>>     >>>> - low maintenance work compared to using Jenkins
> > > >>>     >>>> - (potentially) no migration cost according to Travis's
> doc
> > > >>> [2]
> > > >>>     >>>> (pending verification)
> > > >>>     >>>> - full control over the build capacity/configuration
> > > >>> compared to
> > > >>>     >>>> using ASF INFRA's pool
> > > >>>     >>>>
> > > >>>     >>>> I'd be surprised if we as such a vibrant community cannot
> > > >>>     find and
> > > >>>     >>>> fund $249*12=$2988 a year in exchange for a much better
> > > >>> developer
> > > >>>     >>>> experience and much higher productivity.
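The plan prices quoted above can be compared on a yearly and per-slot basis. A minimal sketch, using only the two price points from the message; the dictionary layout and function names are illustrative assumptions.

```python
# Concurrency -> USD per month, as quoted in the message above.
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_usd: float) -> float:
    """Yearly cost of a plan billed monthly."""
    return monthly_usd * 12


def monthly_cost_per_slot(monthly_usd: float, concurrency: int) -> float:
    """Monthly price of one concurrent build slot."""
    return monthly_usd / concurrency


for concurrency, monthly in PLANS.items():
    print(concurrency, annual_cost(monthly), monthly_cost_per_slot(monthly, concurrency))
```

The per-slot price is nearly the same for both plans, so the choice is mostly about how much concurrency is needed rather than about a volume discount.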
> > > >>>     >>>>
> > > >>>     >>>> [1] https://travis-ci.com/plans
> > > >>>     >>>> [2]
> > > >>>     >>>>
> > > >>>     >>
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > >>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> > > >>>     <chesnay@apache.org <ma...@apache.org>
> > > >>>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>>
> > > >>> wrote:
> > > >>>     >>>>
> > > >>>     >>>>      So yes, the Jenkins job keeps pulling the state from
> > > >>>     Travis until it
> > > >>>     >>>>      finishes.
> > > >>>     >>>>
> > > >>>     >>>>      Not sure I'm comfortable with the idea of using
> > Jenkins
> > > >>>     workers
> > > >>>     >>>>      just to
> > > >>>     >>>>      idle for several hours.
> > > >>>     >>>>
> > > >>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> > > >>>     >>>>      > Here's what zeppelin community did, we make a
> python
> > > >>>     script to
> > > >>>     >>>>      check the
> > > >>>     >>>>      > build status of pull request.
> > > >>>     >>>>      > Here's script:
> > > >>>     >>>>      >
> > > >>> https://github.com/apache/zeppelin/blob/master/travis_check.py
> > > >>>     >>>>      >
> > > >>>     >>>>      > And this is the script we used in Jenkins build
> job.
> > > >>>     >>>>      >
> > > >>>     >>>>      > if [ -f "travis_check.py" ]; then
> > > >>>     >>>>      >    git log -n 1
> > > >>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub
> pull
> > > >>>     >>>>      request.*from.*" | sed
> > > >>>     >>>>      > 's/.*GitHub pull request <a
> > > >>>     >>>>      >
> href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
> > > >>>     \2/g')
> > > >>>     >>>>      >    AUTHOR=$(echo $STATUS | sed
> 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
> > > >>>     >>>> 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
> > > >>>     '{print $3}')
> > > >>>     >>>>      >    #if [ -z $COMMIT ]; then
> > > >>>     >>>>      >    #  COMMIT=$(curl -s
> > > >>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> > > >>>     >>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":"
> |
> > > >>>     tr '\n' ' '
> > > >>>     >>>>      | sed
> > > >>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr =
> '\n'
> > |
> > > >>>     grep -v
> > > >>>     >>>>      "apache:" |
> > > >>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >    #fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    # get commit hash from PR
> > > >>>     >>>>      >    COMMIT=$(curl -s
> > > >>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > > >>>     >>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
> tr
> > > >>>     '\n' ' '
> > > >>>     >>>> | sed
> > > >>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr =
> '\n'
> > |
> > > >>>     grep -v
> > > >>>     >>>>      "apache:" |
> > > >>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >    sleep 30 # sleep few moment to wait travis
> starts
> > > >>>     the build
> > > >>>     >>>>      >    RET_CODE=0
> > > >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> > > >>>     RET_CODE=$?
> > > >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with
> repository
> > > >>>     name when
> > > >>>     >>>>      travis-ci is
> > > >>>     >>>>      > not available in the account
> > > >>>     >>>>      >      RET_CODE=0
> > > >>>     >>>>      >      AUTHOR=$(curl -s
> > > >>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> > > >>>     >>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" |
> > sed
> > > >>>     >>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> > > >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> > > >>>     RET_CODE=$?
> > > >>>     >>>>      >    fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't
> > find
> > > >>>     build
> > > >>>     >>>>      information in
> > > >>>     >>>>      > the travis
> > > >>>     >>>>      >      set +x
> > > >>>     >>>>      >      echo
> > > >>>     "-----------------------------------------------------"
> > > >>>     >>>>      >      echo "Looks like travis-ci is not configured
> for
> > > >>>     your fork."
> > > >>>     >>>>      >      echo "Please setup by switching on 'zeppelin'
> > > >>>     repository at
> > > >>>     >>>>      > https://travis-ci.org/profile and travis-ci."
> > > >>>     >>>>      >      echo "And then make sure 'Build branch
> updates'
> > > >>>     option is
> > > >>>     >>>>      enabled in
> > > >>>     >>>>      > the settings
> > > >>> https://travis-ci.org/${AUTHOR}/zeppelin/settings
> > > >>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
> > > >>>     >>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
> > > >>>     >>>>      >      echo ""
> > > >>>     >>>>      >      echo "To trigger CI after setup, you will need
> > > >>>     amend your
> > > >>>     >>>>      last commit
> > > >>>     >>>>      > with"
> > > >>>     >>>>      >      echo "git commit --amend"
> > > >>>     >>>>      >      echo "git push your-remote HEAD --force"
> > > >>>     >>>>      >      echo ""
> > > >>>     >>>>      >      echo "See
> > > >>>     >>>>      >
> > > >>>     >>>>
> > > >>>     >>
> > > >>>
> > >
> >
> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
> > > >>>     >>>>      > ."
> > > >>>     >>>>      >    fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >    exit $RET_CODE
> > > >>>     >>>>      > else
> > > >>>     >>>>      >    set +x
> > > >>>     >>>>      >    echo "travis_check.py does not exist"
> > > >>>     >>>>      >    exit 1
> > > >>>     >>>>      > fi
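The shell wrapper above delegates the actual waiting to travis_check.py, whose contents are not shown in this thread. The polling idea it relies on (keep asking for the build state until it is final) might look roughly like the sketch below. Everything here is hypothetical: `fetch_state`, the state strings, and the mapping to exit codes 0/1 are assumptions; only the meaning of exit code 2 (no build found) is taken from the wrapper script.

```python
import time
from typing import Callable, Optional

FINAL_FAILURES = ("failed", "errored", "canceled")  # assumed state names


def poll_build(
    fetch_state: Callable[[], Optional[str]],
    interval_s: float = 30.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the build reaches a final state.

    Returns 0 if the build passed, 1 if it failed or timed out while
    running, and 2 if no build was ever found (as in the wrapper above).
    fetch_state is a caller-supplied function returning the current state,
    or None when no build exists yet for the commit.
    """
    seen_build = False
    for attempt in range(max_polls):
        state = fetch_state()
        if state is not None:
            seen_build = True
        if state == "passed":
            return 0
        if state in FINAL_FAILURES:
            return 1
        if attempt < max_polls - 1:
            sleep(interval_s)  # this wait is what keeps the Jenkins worker busy
    return 1 if seen_build else 2
```

The `sleep` call is the reason a Jenkins executor stays occupied for as long as the Travis build runs, which is the concern raised in the replies to this message.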
> > > >>>     >>>>      >
> > > >>>     >>>>      > Chesnay Schepler <chesnay@apache.org
> > > >>> <ma...@apache.org>
> > > >>>     >>>>      <mailto:chesnay@apache.org <mailto:
> chesnay@apache.org
> > >>>
> > > >>>     wrote on Sat, Jun 29, 2019 at 3:17 PM:
> > > >>>     >>>>      >
> > > >>>     >>>>      >> Does this imply that a Jenkins job is active as
> long
> > > >>>     as the
> > > >>>     >>>>      Travis build
> > > >>>     >>>>      >> runs?
> > > >>>     >>>>      >>
> > > >>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> > > >>>     >>>>      >>> Hi,
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> @Dawid, I think the "long test running" as I
> > > >>>     mentioned in the
> > > >>>     >>>>      first
> > > >>>     >>>>      >> email,
> > > >>>     >>>>      >>> also as you guys said, belongs to "a big effort
> > > >>>     which is much
> > > >>>     >>>>      harder to
> > > >>>     >>>>      >>> accomplish in a short period of time and may
> > deserve
> > > >>>     its own
> > > >>>     >>>>      separate
> > > >>>     >>>>      >>> discussion". Thus I didn't include it in what we
> > can
> > > >>>     do in a
> > > >>>     >>>>      foreseeable
> > > >>>     >>>>      >>> short term.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> Besides, I don't think that's the ultimate reason
> > > >>>     for lack of
> > > >>>     >>>>      build
> > > >>>     >>>>      >>> resources. Even if the build is shortened to
> > > >>>     something like
> > > >>>     >>>>      2h, the
> > > >>>     >>>>      >>> problems of no build machine works about 6 or
> more
> > > >>>     hours in
> > > >>>     >>>>      PST daytime
> > > >>>     >>>>      >>> that I described will still happen, because no
> > > >>>     machine from
> > > >>>     >>>>      ASF INFRA's
> > > >>>     >>>>      >>> pool is allocated to Flink. As I have paid close
> > > >>>     attention to
> > > >>>     >>>>      the build
> > > >>>     >>>>      >>> queue in the past few weekdays, it's a pretty
> clear
> > > >>>     pattern now.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> **The ultimate root cause** for that is - we
> don't
> > > >>>     have any
> > > >>>     >>>>      **dedicated**
> > > >>>     >>>>      >>> build resources that we can stably rely on. I'm
> > > >>>     actually ok to
> > > >>>     >>>>      wait for a
> > > >>>     >>>>      >>> long time if there are build requests running, it
> > > >>>     means at
> > > >>>     >>>>      least we are
> > > >>>     >>>>      >>> making progress. But I'm not ok with no build
> > > >>>     resource. A
> > > >>>     >>>>      better place I
> > > >>>     >>>>      >>> think we should aim at in short term is to always
> > > >>>     have at
> > > >>>     >>>>      least a central
> > > >>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to
> build
> > > >>>     Flink at
> > > >>>     >>>>      any time, or
> > > >>>     >>>>      >>> maybe use users resources.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> > > >>>     Zeppelin
> > > >>>     >>>>      community is
> > > >>>     >>>>      >>> using a Jenkins job to automatically build on
> > users'
> > > >>>     travis
> > > >>>     >>>>      account and
> > > >>>     >>>>      >>> link the result back to github PR. I guess the
> > > >>>     Jenkins job
> > > >>>     >>>>      would fetch
> > > >>>     >>>>      >>> latest upstream master and build the PR against
> it.
> > > >>>     Jeff has
> > > >>>     >>>> filed
> > > >>>     >>>>      >> tickets
> > > >>>     >>>>      >>> to learn and get access to the Jenkins infra.
> It'll be
> > > >>>     better to
> > > >>>     >>>>      fully
> > > >>>     >>>>      >>> understand it first before judging this approach.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> I also heard good things about CircleCI, and ASF
> > > >>>     INFRA seems
> > > >>>     >>>>      to have a
> > > >>>     >>>>      >> pool
> > > >>>     >>>>      >>> of build capacity there too. Can be an
> alternative
> > > >>>     to consider.
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid
> Wysakowicz <
> > > >>>     >>>>      >> dwysakowicz@apache.org
> > > >>> <ma...@apache.org> <mailto:dwysakowicz@apache.org
> > > >>> <ma...@apache.org>>>
> > > >>>     >>>>      >>> wrote:
> > > >>>     >>>>      >>>
> > > >>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed
> > the
> > > >>>     most
> > > >>>     >>>>      important point
> > > >>>     >>>>      >>>> from Chesnay's previous message in the summary.
> > The
> > > >>>     ultimate
> > > >>>     >>>>      reason for
> > > >>>     >>>>      >>>> all the problems is that the tests take close
> to 2
> > > >>>     hours to
> > > >>>     >>>>      run already.
> > > >>>     >>>>      >>>> I fully support this claim: "Unless people start
> > > >>>     caring about
> > > >>>     >>>>      test times
> > > >>>     >>>>      >>>> before adding them, this issue cannot be solved"
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> This is also another reason why using user's
> > Travis
> > > >>>     account
> > > >>>     >>>>      won't help.
> > > >>>     >>>>      >>>> Every few weeks we reach the user's time limit
> for
> > > >>>     a single
> > > >>>     >>>>      profile.
> > > >>>     >>>>      >>>> This makes the user's builds simply fail, until
> we
> > > >>>     either
> > > >>>     >>>>      properly
> > > >>>     >>>>      >>>> decrease the time the tests take (which I am not
> > > >>>     sure we ever
> > > >>>     >>>>      did) or
> > > >>>     >>>>      >>>> postpone the problem by splitting into more
> > > >>>     profiles. (Note
> > > >>>     >>>>      that the ASF
> > > >>>     >>>>      >>>> Travis account has higher time limits)
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> Best,
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> Dawid
> > > >>>     >>>>      >>>>
> > > >>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> > > >>>     >>>>      >>>>> Do we know if using "the best" available
> hardware
> > > >>>     would
> > > >>>     >>>>      improve the
> > > >>>     >>>>      >> build
> > > >>>     >>>>      >>>>> times?
> > > >>>     >>>>      >>>>> Imagine we would run the build on machines with
> > > >>>     plenty of
> > > >>>     >>>>      main memory
> > > >>>     >>>>      >> to
> > > >>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> > > >>>     architecture?
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>> Throwing hardware at the problem could help
> > reduce
> > > >>>     the time
> > > >>>     >>>>      of an
> > > >>>     >>>>      >>>>> individual build, and using our own
> > infrastructure
> > > >>>     would
> > > >>>     >>>>      remove our
> > > >>>     >>>>      >>>>> dependency on Apache's Travis account (with the
> > > >>>     obvious
> > > >>>     >>>>      downside of
> > > >>>     >>>>      >>>> having
> > > >>>     >>>>      >>>>> to maintain the infrastructure)
> > > >>>     >>>>      >>>>> We could use an open source travis alternative,
> > to
> > > >>>     have a
> > > >>>     >>>>      similar
> > > >>>     >>>>      >>>>> experience and make the migration easy.
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>>
> > > >>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay
> Schepler
> > > >>>     >>>>      <chesnay@apache.org <ma...@apache.org>
> > > >>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
> > > >>>     >>>>      >>>> wrote:
> > > >>>     >>>>      >>>>>>    >From what I gathered, there's no special
> > > >>>     sauce that the
> > > >>>     >>>>      Zeppelin
> > > >>>     >>>>      >>>>>> project uses which actually integrates a users
> > > >>> Travis
> > > >>>     >>>>      account into the
> > > >>>     >>>>      >>>> PR.
> > > >>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
> > > >>>     kind of it.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> Naturally we can do this (duh) and save the
> ASF
> > a
> > > >>>     fair
> > > >>>     >>>>      amount of
> > > >>>     >>>>      >>>>>> resources, but there are downsides:
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> The discoverability of the Travis check takes
> a
> > > >>>     nose-dive.
> > > >>>     >>>>      Either we
> > > >>>     >>>>      >>>>>> require every contributor to always, on every
> > > >>>     commit, also
> > > >>>     >>>>      post a
> > > >>>     >>>>      >> Travis
> > > >>>     >>>>      >>>>>> build, or we have the reviewer sift through
> the
> > > >>>     >>>>      contributors account
> > > >>>     >>>>      >> to
> > > >>>     >>>>      >>>>>> find it.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> > > >>>     also not
> > > >>>     >>>>      equivalent to
> > > >>>     >>>>      >>>>>> having a PR build.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> A normal branch build takes a branch as is and
> > > >>>     tests it. A
> > > >>>     >>>>      PR build
> > > >>>     >>>>      >>>>>> merges the branch into master, and then runs
> it.
> > > >>>     (Fun fact:
> > > >>>     >>>>      This is
> > > >>>     >>>>      >> why
> > > >>>     >>>>      >>>>>> a PR with merge conflicts is not being run
> on
> > > >>>     Travis.)
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> And ultimately, everyone can already make use
> > > >>> of this
> > > >>>     >>>>      approach anyway.
> > > >>>     >>>>      >>>>>>
> > > >>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> > > >>>     >>>>      >>>>>>> Hi Jeff,
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> > > >>>     think it's a
> > > >>>     >>>>      good idea to
> > > >>>     >>>>      >>>>>>> leverage user's travis account.
> > > >>>     >>>>      >>>>>>> In this way, we can have almost unlimited
> > > >>>     concurrent build
> > > >>>     >>>>      jobs and
> > > >>>     >>>>      >>>>>>> developers can restart build by themselves
> > > >>>     (currently only
> > > >>>     >>>>      committers
> > > >>>     >>>>      >>>>>>> can restart PR's build).
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> > > >>> user's
> > > >>>     >>>>      travis build
> > > >>>     >>>>      >> into
> > > >>>     >>>>      >>>>>>> the Flink pull request's build automatically.
> > > >>>     Can you
> > > >>>     >>>>      explain more in
> > > >>>     >>>>      >>>>>>> detail?
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Another question: does travis only build
> > > >>>     branches for user
> > > >>>     >>>>      account?
> > > >>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> > > >>> user's
> > > >>>     >>>>      commits against
> > > >>>     >>>>      >>>>>>> current master branch.
> > > >>>     >>>>      >>>>>>> This will help us to find problems before
> > > >>>     merge.  Builds
> > > >>>     >>>>      for branches
> > > >>>     >>>>      >>>>>>> will lose the impact of new commits in
> master.
> > > >>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Thanks again for sharing the idea.
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> Regards,
> > > >>>     >>>>      >>>>>>> Jark
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> > > >>>     <zjffdu@gmail.com <ma...@gmail.com>
> > > >>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:zjffdu@gmail.com
> > > >>> <ma...@gmail.com>>>> wrote:
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Hi Folks,
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we
> > > >>> solved
> > > >>>     >>>> it by
> > > >>>     >>>>      >> delegating
> > > >>>     >>>>      >>>>>>>  each
> > > >>>     >>>>      >>>>>>>  one's PR build to his travis account
> > > >>>     (Everyone can
> > > >>>     >>>>      have 5 free
> > > >>>     >>>>      >>>>>>>  slot for
> > > >>>     >>>>      >>>>>>>  travis build).
> > > >>>     >>>>      >>>>>>>  Apache account travis build is only
> triggered
> > > >>> when
> > > >>>     >>>>      PR is merged.
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
> > > >>> <ma...@gmail.com>
> > > >>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
> > > >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>
> > > >>>     >>>>      <mailto:ykt836@gmail.com <mailto:ykt836@gmail.com
> >>>>
> > > >>>     >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  > (Forgot to cc George)
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > Best,
> > > >>>     >>>>      >>>>>>>  > Kurt
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt
> Young
> > > >>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
> > > >>> <ma...@gmail.com> <mailto:ykt836@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      wrote:
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>  > > Hi Bowen,
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
> > > >>>     actually have
> > > >>>     >>>>      discussed
> > > >>>     >>>>      >> about
> > > >>>     >>>>      >>>>>>>  this, and I
> > > >>>     >>>>      >>>>>>>  > > think Till and George have
> > > >>>     >>>>      >>>>>>>  > > already spent some time investigating
> > > >>>     it. I have
> > > >>>     >>>>      cced both of
> > > >>>     >>>>      >>>>>>>  them, and
> > > >>>     >>>>      >>>>>>>  > > maybe they can share
> > > >>>     >>>>      >>>>>>>  > > their findings.
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > Best,
> > > >>>     >>>>      >>>>>>>  > > Kurt
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> > > >>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
> > > >>> <ma...@gmail.com> <mailto:imjark@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      wrote:
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  > >> Hi Bowen,
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> > > >>>     suffered from
> > > >>>     >>>>      the long
> > > >>>     >>>>      >>>>>>>  build time.
> > > >>>     >>>>      >>>>>>>  > >> I agree that we should focus on
> > > >>>     solving build
> > > >>>     >>>>      capacity
> > > >>>     >>>>      >>>>>>>  problem in the
> > > >>>     >>>>      >>>>>>>  > >> thread.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> My observation is that only one
> > > >>>     build is
> > > >>>     >>>>      running, all
> > > >>>     >>>>      >> the
> > > >>>     >>>>      >>>>>>>  others
> > > >>>     >>>>      >>>>>>>  > >> (other
> > > >>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
> > > >>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
> > > >>>     it can
> > > >>>     >>>> support
> > > >>>     >>>>      >> concurrent
> > > >>>     >>>>      >>>>>>>  build
> > > >>>     >>>>      >>>>>>>  > jobs.
> > > >>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
> > > >>>     using, might
> > > >>>     >>>>      be the free
> > > >>>     >>>>      >>>>>>>  plan for
> > > >>>     >>>>      >>>>>>>  > open
> > > >>>     >>>>      >>>>>>>  > >> source.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> > > >>>     experience on
> > > >>>     >>>>      Travis.
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> Regards,
> > > >>>     >>>>      >>>>>>>  > >> Jark
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
> <
> > > >>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> > > >>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>>> wrote:
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >> > Hi Steven,
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > I think you may not have read what I
> > > >>>     wrote. The
> > > >>>     >>>>      discussion is
> > > >>>     >>>>      >>>> about
> > > >>>     >>>>      >>>>>>>  > "unstable
> > > >>>     >>>>      >>>>>>>  > >> > build **capacity**", in other words
> > > >>>     >>>>      "unstable / lack of
> > > >>>     >>>>      >>>> build
> > > >>>     >>>>      >>>>>>>  > >> resources",
> > > >>>     >>>>      >>>>>>>  > >> > not "unstable build".
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
> > > >>>     Steven Wu
> > > >>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > > >>> <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > wrote:
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build
> is
> > > >>>     >>>>      definitely a pain
> > > >>>     >>>>      >>>>>> point.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> > > >>>     >>>>      >> flink-connector-kafka
> > > >>>     >>>>      >>>>>>>  is not
> > > >>>     >>>>      >>>>>>>  > >> related
> > > >>>     >>>>      >>>>>>>  > >> > to
> > > >>>     >>>>      >>>>>>>  > >> > > my change. But there is no easy way to
> > > >>>     re-run the
> > > >>>     >>>>      build on
> > > >>>     >>>>      >>>>>>>  travis UI.
> > > >>>     >>>>      >>>>>>>  > Google
> > > >>>     >>>>      >>>>>>>  > >> > > search showed a trick of
> > > >>>     close-and-open the
> > > >>>     >>>>      PR will
> > > >>>     >>>>      >>>>>>>  trigger rebuild.
> > > >>>     >>>>      >>>>>>>  > >> but
> > > >>>     >>>>      >>>>>>>  > >> > > that could add noise to the PR
> > > >>>     activities.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> > > >>>     often failed
> > > >>>     >>>>      with
> > > >>>     >>>>      >>>>>>>  exceeding time
> > > >>>     >>>>      >>>>>>>  > limit
> > > >>>     >>>>      >>>>>>>  > >> > after
> > > >>>     >>>>      >>>>>>>  > >> > > 4+ hours.
> > > >>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
> > > >>>     limit for
> > > >>>     >>>>      jobs, and
> > > >>>     >>>>      >> has
> > > >>>     >>>>      >>>>>>>  been
> > > >>>     >>>>      >>>>>>>  > >> > terminated.
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
> > > >>>     Bowen Li
> > > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> > bowenli86@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > wrote:
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
> > > >>>     >>>>      >>>>>>>  This build
> > > >>>     >>>>      >>>>>>>  > >> > request
> > > >>>     >>>>      >>>>>>>  > >> > > > has
> > > >>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> > > >>>     queue**
> > > >>>     >>>>      since I first
> > > >>>     >>>>      >> saw
> > > >>>     >>>>      >>>>>>>  it at PST
> > > >>>     >>>>      >>>>>>>  > >> > 10:30am
> > > >>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> > > >>>     there before
> > > >>>     >>>>      10:30am).
> > > >>>     >>>>      >>>>>>>  It's PST
> > > >>>     >>>>      >>>>>>>  > 4:12pm
> > > >>>     >>>>      >>>>>>>  > >> now
> > > >>>     >>>>      >>>>>>>  > >> > > and
> > > >>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
> > > >>>     Bowen Li
> > > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > > >>> <ma...@gmail.com>>
> > > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> > bowenli86@gmail.com>
> > > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > > >>>     >>>>      >>>>>>>  > >> wrote:
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain resulting from lack of stable build
> > > >>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
> > > >>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any progress for hours, and suddenly 5 or 6
> > > >>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together after the long pause. I'm at PST (UTC-08) time
> > > >>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
> > > >>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to drain the queue afterwards).
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly impacted our productivity. I've experienced that
> > > >>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early morning of PST time zone won't finish their
> > > >>>     >>>>      >>>>>>>  > >> > > > > build until late night of the same day.
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced the same problem or have similar observation
> > > >>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it has things to do with time zone)
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it the free
> > > >>>     >>>>      >>>>>>>  > >> > > > > plan for open source projects? What are the guaranteed build capacity of
> > > >>>     >>>>      >>>>>>>  > >> > > > > the current plan?
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan (either free or paid) can't provide stable
> > > >>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we upgrade to a higher priced plan with larger and more
> > > >>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that contribute to the productivity problem is that our
> > > >>>     >>>>      >>>>>>>  > >> > > > > build is slow - we run full build for every PR and a successful full build
> > > >>>     >>>>      >>>>>>>  > >> > > > > takes ~5h. We definitely have more options to solve it, for instance,
> > > >>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs and reuse artifacts from the previous build.
> > > >>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big effort which is much harder to accomplish in
> > > >>>     >>>>      >>>>>>>  > >> > > > > a short period of time and may deserve its own separate discussion.
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > > >
> > > >>>     >>>>      >>>>>>>  > >> > > >
> > > >>>     >>>>      >>>>>>>  > >> > >
> > > >>>     >>>>      >>>>>>>  > >> >
> > > >>>     >>>>      >>>>>>>  > >>
> > > >>>     >>>>      >>>>>>>  > >
> > > >>>     >>>>      >>>>>>>  >
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  --
> > > >>>     >>>>      >>>>>>>  Best Regards
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>>>>>>  Jeff Zhang
> > > >>>     >>>>      >>>>>>>
> > > >>>     >>>>      >>
> > > >>>     >>>>
> > > >>>     >>>
> > > >>>     >>
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Congxian Qiu <qc...@gmail.com>.
+1 for the migration.

Best,
Congxian


Hequn Cheng <ch...@gmail.com> wrote on Thu, Jul 4, 2019 at 9:42 PM:

> +1.
>
> And thanks a lot to Chesnay for pushing this.
>
> Best, Hequn
>
> On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ch...@apache.org>
> wrote:
>
> > Note that the Flinkbot approach isn't that trivial either; we can't
> > _just_ trigger builds for a branch in the apache repo, but would first
> > have to clone the branch/pr into a separate repository (that is owned by
> > the github account that the travis account would be tied to).
> >
> > One roadblock after the next showing up...
> >
> > On 04/07/2019 11:59, Chesnay Schepler wrote:
> > > Small update with mostly bad news:
> > >
> > > > INFRA doesn't know whether it is possible, and referred me to Travis
> > > support.
> > > They did point out that it could be problematic in regards to
> > > read/write permissions for the repository.
> > >
> > > From my own findings /so far/ with a test repo/organization, it does
> > > not appear possible to configure the Travis account used for a
> > > specific repository.
> > >
> > > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > > trigger builds through the Travis REST API.
> > >
> > > On 04/07/2019 10:46, Chesnay Schepler wrote:
> > >> I've raised a JIRA
> > > >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> > >> inquire whether it would be possible to switch to a different Travis
> > >> account, and if so what steps would need to be taken.
> > >> We need a proper confirmation from INFRA since we are not in full
> > >> control of the flink repository (for example, we cannot access the
> > >> settings page).
> > >>
> > > >> If this is indeed possible, Ververica is willing to sponsor a Travis
> > > >> account for the Flink project.
> > > >> This would provide us with more than enough resources.
> > >>
> > >> Since this makes the project more reliant on resources provided by
> > >> external companies I would like to vote on this.
> > >>
> > >> Please vote on this proposal, as follows:
> > >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> > >> account, provided that INFRA approves
> > > >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> > >> account
> > >>
> > >> The vote will be open for at least 24h, and until we have
> > >> confirmation from INFRA. The voting period may be shorter than the
> > > >> usual 3 days since our current setup is effectively not working.
> > >>
> > >> On 04/07/2019 06:51, Bowen Li wrote:
> > > >>> Re: > Are they using their own Travis CI pool, or did they switch to
> > >>> an entirely different CI service?
> > >>>
> > >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> > >>> currently moving away from ASF's Travis to their own in-house metal
> > >>> machines at [1] with custom CI application at [2]. They've seen
> > >>> significant improvement w.r.t both much higher performance and
> > >>> basically no resource waiting time, "night-and-day" difference
> > >>> quoting Wes.
> > >>>
> > >>> Re: > If we can just switch to our own Travis pool, just for our
> > >>> project, then this might be something we can do fairly quickly?
> > >>>
> > >>> I believe so, according to [3] and [4]
> > >>>
> > >>>
> > > >>> [1] https://ci.ursalabs.org/
> > > >>> [2] https://github.com/ursa-labs/ursabot
> > > >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> > >>>
> > >>>
> > >>>
> > > >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>
> > > >>>     Are they using their own Travis CI pool, or did they switch to an
> > >>>     entirely different CI service?
> > >>>
> > >>>     If we can just switch to our own Travis pool, just for our
> > >>>     project, then
> > >>>     this might be something we can do fairly quickly?
> > >>>
> > >>>     On 03/07/2019 05:55, Bowen Li wrote:
> > >>>     > I responded in the INFRA ticket [1] that I believe they are
> > >>>     using a wrong
> > >>>     > metric against Flink and the total build time is a completely
> > >>>     different
> > >>>     > thing than guaranteed build capacity.
> > >>>     >
> > >>>     > My response:
> > >>>     >
> > >>>     > "As mentioned above, since I started to pay attention to
> Flink's
> > >>>     build
> > >>>     > queue a few tens of days ago, I'm in Seattle and I saw no build
> > >>>     was kicking
> > >>>     > off in PST daytime in weekdays for Flink. Our teammates in
> China
> > >>>     and Europe
> > >>>     > have also reported similar observations. So we need to evaluate
> > >>>     how the
> > >>>     > large total build time came from - if 1) your number and 2) our
> > >>>     > observations from three locations that cover pretty much a full
> > >>>     day, are
> > >>>     > all true, I **guess** one reason can be that - highly likely
> the
> > >>>     extra
> > >>>     > build time came from weekends when other Apache projects may be
> > >>>     idle and
> > >>>     > Flink just drains hard its congested queue.
> > >>>     >
> > >>>     > Please be aware of that we're not complaining about the lack of
> > >>>     resources
> > >>>     > in general, I'm complaining about the lack of **stable,
> > >>> dedicated**
> > >>>     > resources. An example for the latter one is, currently even if
> > >>>     no build is
> > >>>     > in Flink's queue and I submit a request to be the queue head
> > >>> in PST
> > >>>     > morning, my build won't even start in 6-8+h. That is an absurd
> > >>>     amount of
> > >>>     > waiting time.
> > >>>     >
> > >>>     > That's saying, if ASF INFRA decides to adopt a quota system and
> > >>>     grants
> > >>>     > Flink five DEDICATED servers that runs all the time only for
> > >>>     Flink, that'll
> > >>>     > be PERFECT and can totally solve our problem now.
> > >>>     >
> > >>>     > I feel what's missing in the ASF INFRA's Travis resource pool
> is
> > >>>     some level
> > >>>     > of build capacity SLAs and certainty"
> > >>>     >
> > >>>     >
> > >>>     > Again, I believe there are differences in nature of these two
> > >>>     problems,
> > >>>     > long build time v.s. lack of dedicated build resource. That's
> > >>>     saying,
> > >>>     > shortening build time may relieve the situation, and may not.
> > > >>>     I'm slightly
> > > >>>     > negative on disabling IT cases for PRs, because the downside is
> > > >>>     that we are
> > > >>>     > at risk of potential bugs in PRs that UTs don't catch, which
> > > >>>     may cost a
> > > >>>     > lot more to fix if they slow others down or even block
> > > >>>     others, but I am
> > > >>>     > open to others' opinions on it.
> > >>>     >
> > >>>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
> > >>>     feasible to
> > >>>     > solve our problem since INFRA's pool is fully shared and they
> > >>>     have no
> > >>>     > control and finer insights over resource allocation to a
> > >>>     specific Apache
> > >>>     > project. As mentioned in [1], Apache Arrow is moving away from
> > >>>     ASF INFRA
> > > >>>     > Travis pool (they are actually surprised Flink hasn't planned to
> do
> > > >>>     so). I
> > > >>>     > know that Spark is on its own build infra. If we all agree on
> > >>>     funding our
> > >>>     > own build infra, I'd be glad to help investigate any potential
> > >>>     options
> > >>>     > after releasing 1.9 since I'm super busy with 1.9 now.
> > >>>     >
> > >>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> > >>>     >
> > >>>     >
> > >>>     >
> > > >>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>     >
> > >>>     >> As a short-term stopgap, since we can assume this issue to
> > >>>     become much
> > >>>     >> worse in the following days/weeks, we could disable IT cases
> in
> > >>>     PRs and
> > >>>     >> only run them on master.
> > >>>     >>
> > >>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > >>>     >>> People really have to stop thinking that just because
> > >>>     something works
> > >>>     >>> for us it is also a good solution.
> > >>>     >>> Also, please remember that our builds run for 2h from start
> to
> > >>>     finish,
> > >>>     >>> and not the 14 _minutes_ it takes for zeppelin.
> > >>>     >>> We are dealing with an entirely different scale here, both in
> > >>>     terms of
> > >>>     >>> build times and number of builds.
> > >>>     >>>
> > >>>     >>> In this very thread people have been complaining about long
> > >>> queue
> > >>>     >>> times for their builds. Surprise, other Apache projects have
> > >>> been
> > >>>     >>> suffering the very same thing due to us not controlling our
> > >>> build
> > >>>     >>> times. While switching services (be it Jenkins, CircleCI or
> > >>>     whatever)
> > >>>     >>> will possibly work for us (and these options are actually
> > >>>     attractive,
> > >>>     >>> like CircleCI's proper support for build artifacts), it will
> > >>> also
> > >>>     >>> result in us likely negatively affecting other projects in
> > >>>     significant
> > >>>     >>> ways.
> > >>>     >>>
> > >>>     >>> Sure, the Jenkins setup has a good user experience for us, at
> > >>>     the cost
> > >>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
> > >>>     have 25
> > >>>     >>> PR's in our queue; that's possibly 50h we'd consume of
> Jenkins
> > >>>     >>> resources, and the European contributors haven't even really
> > >>>     started yet.
> > >>>     >>>
> > >>>     >>> FYI, the latest INFRA response from INFRA-18533:
> > >>>     >>>
> > >>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> > >>>     build time
> > >>>     >>> last month. That is equal to EIGHT servers running 24/7 for
> > >>>     the ENTIRE
> > >>>     >>> MONTH. EIGHT. nonstop.
> > >>>     >>> When we discovered this last night, we discussed it some and
> > >>>     are going
> > >>>     >>> to tune down Flink to allow only five executors maximum. We
> > >>> cannot
> > >>>     >>> allow Flink to consume so much of a Foundation shared
> > >>> resource."
> > >>>     >>>
> > >>>     >>> So yes, we either
> > >>>     >>> a) have to heavily reduce our CI usage or
> > >>>     >>> b) fund our own, either maintaining it ourselves or donating
> > >>>     to Apache.
> > >>>     >>>
> > >>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> > > >>>     >>>> By looking at the git history of the Jenkins script, its core part
> > > >>>     >>>> was finished in March 2017 (with only two minor updates in
> > > >>>     >>>> 2017/2018), so it's been running for over two years now, and it
> > > >>>     >>>> feels like the Zeppelin community has been quite happy with it.
> > > >>>     >>>> @Jeff Zhang, can you share your insights and user experience
> > > >>>     >>>> with the Jenkins+Travis approach?
> > >>>     >>>>
> > >>>     >>>> Things like:
> > >>>     >>>>
> > > >>>     >>>> - has the approach completely solved the resource capacity problem
> > > >>>     >>>>   for the Zeppelin community? Is the Zeppelin community happy with
> > > >>>     >>>>   the result?
> > > >>>     >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> > > >>>     >>>> - how often do you need to maintain the Jenkins infra? How many
> > > >>>     >>>>   people are usually involved in maintenance and bug-fixes?
> > >>>     >>>>
> > >>>     >>>> The downside of this approach seems mostly to be on the
> > >>>     maintenance
> > >>>     >>>> to me - maintain the script and Jenkins infra.
> > >>>     >>>>
> > >>>     >>>> ** Having Our Own Travis-CI.com Account **
> > >>>     >>>>
> > > >>>     >>>> Another alternative I've been thinking of is to have our own
> > > >>>     >>>> travis-ci.com account with paid dedicated resources. Note that
> > > >>>     >>>> travis-ci.org is the free version and travis-ci.com is the
> > > >>>     >>>> commercial version. We currently use a shared resource pool
> > > >>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
> > > >>>     >>>> control over it - we can't see how it's configured, how many
> > > >>>     >>>> resources are available, how resources are allocated among
> > > >>>     >>>> Apache projects, etc. The nice things about having an account on
> > > >>>     >>>> travis-ci.com are:
> > >>>     >>>>
> > >>>     >>>> - relatively low cost with much better resource guarantee
> > >>>     than what
> > >>>     >>>> we currently have [1]: $249/month with 5 dedicated
> > >>> concurrency,
> > >>>     >>>> $489/month with 10 concurrency
> > >>>     >>>> - low maintenance work compared to using Jenkins
> > >>>     >>>> - (potentially) no migration cost according to Travis's doc
> > >>> [2]
> > >>>     >>>> (pending verification)
> > >>>     >>>> - full control over the build capacity/configuration
> > >>> compared to
> > >>>     >>>> using ASF INFRA's pool
> > >>>     >>>>
> > >>>     >>>> I'd be surprised if we as such a vibrant community cannot
> > >>>     find and
> > >>>     >>>> fund $249*12=$2988 a year in exchange for a much better
> > >>> developer
> > >>>     >>>> experience and much higher productivity.
> > >>>     >>>>
> > >>>     >>>> [1] https://travis-ci.com/plans
> > >>>     >>>> [2]
> > >>>     >>>>
> > >>>     >>
> > >>>
> > https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > >>>
> > > >>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>     >>>>
> > >>>     >>>>      So yes, the Jenkins job keeps pulling the state from
> > >>>     Travis until it
> > >>>     >>>>      finishes.
> > >>>     >>>>
> > > >>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins workers
> > > >>>     >>>>      just to idle for several hours.
> > >>>     >>>>
> > >>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> > >>>     >>>>      > Here's what zeppelin community did, we make a python
> > >>>     script to
> > >>>     >>>>      check the
> > >>>     >>>>      > build status of pull request.
> > >>>     >>>>      > Here's script:
> > >>>     >>>>      >
> > >>> https://github.com/apache/zeppelin/blob/master/travis_check.py
> > >>>     >>>>      >
> > >>>     >>>>      > And this is the script we used in Jenkins build job.
> > >>>     >>>>      >
> > > >>>     >>>>      > if [ -f "travis_check.py" ]; then
> > > >>>     >>>>      >   git log -n 1
> > > >>>     >>>>      >   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
> > > >>>     >>>>      >     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> > > >>>     >>>>      >   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> > > >>>     >>>>      >   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> > > >>>     >>>>      >   #if [ -z $COMMIT ]; then
> > > >>>     >>>>      >   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > > >>>     >>>>      >   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> > > >>>     >>>>      >   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> > > >>>     >>>>      >   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >   #fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >   # get commit hash from PR
> > > >>>     >>>>      >   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > > >>>     >>>>      >     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> > > >>>     >>>>      >     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> > > >>>     >>>>      >     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > > >>>     >>>>      >   sleep 30 # sleep few moment to wait travis starts the build
> > > >>>     >>>>      >   RET_CODE=0
> > > >>>     >>>>      >   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> > > >>>     >>>>      >   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> > > >>>     >>>>      >     RET_CODE=0
> > > >>>     >>>>      >     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > > >>>     >>>>      >       grep '"full_name":' | grep -v "apache/zeppelin" |
> > > >>>     >>>>      >       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> > > >>>     >>>>      >     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> > > >>>     >>>>      >   fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >   if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
> > > >>>     >>>>      >     set +x
> > > >>>     >>>>      >     echo "-----------------------------------------------------"
> > > >>>     >>>>      >     echo "Looks like travis-ci is not configured for your fork."
> > > >>>     >>>>      >     echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> > > >>>     >>>>      >     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> > > >>>     >>>>      >     echo ""
> > > >>>     >>>>      >     echo "To trigger CI after setup, you will need ammend your last commit with"
> > > >>>     >>>>      >     echo "git commit --amend"
> > > >>>     >>>>      >     echo "git push your-remote HEAD --force"
> > > >>>     >>>>      >     echo ""
> > > >>>     >>>>      >     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> > > >>>     >>>>      >   fi
> > > >>>     >>>>      >
> > > >>>     >>>>      >   exit $RET_CODE
> > > >>>     >>>>      > else
> > > >>>     >>>>      >   set +x
> > > >>>     >>>>      >   echo "travis_check.py does not exists"
> > > >>>     >>>>      >   exit 1
> > > >>>     >>>>      > fi
> > >>>     >>>>      >
> > > >>>     >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> > >>>     >>>>      >
> > >>>     >>>>      >> Does this imply that a Jenkins job is active as long
> > >>>     as the
> > >>>     >>>>      Travis build
> > >>>     >>>>      >> runs?
> > >>>     >>>>      >>
> > >>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> > >>>     >>>>      >>> Hi,
> > >>>     >>>>      >>>
> > >>>     >>>>      >>> @Dawid, I think the "long test running" as I
> > >>>     mentioned in the
> > >>>     >>>>      first
> > >>>     >>>>      >> email,
> > >>>     >>>>      >>> also as you guys said, belongs to "a big effort
> > >>>     which is much
> > >>>     >>>>      harder to
> > >>>     >>>>      >>> accomplish in a short period of time and may
> deserve
> > >>>     its own
> > >>>     >>>>      separate
> > >>>     >>>>      >>> discussion". Thus I didn't include it in what we
> can
> > >>>     do in a
> > >>>     >>>>      foreseeable
> > >>>     >>>>      >>> short term.
> > >>>     >>>>      >>>
> > >>>     >>>>      >>> Besides, I don't think that's the ultimate reason
> > >>>     for lack of
> > >>>     >>>>      build
> > >>>     >>>>      >>> resources. Even if the build is shortened to
> > >>>     something like
> > >>>     >>>>      2h, the
> > >>>     >>>>      >>> problems of no build machine works about 6 or more
> > >>>     hours in
> > >>>     >>>>      PST daytime
> > >>>     >>>>      >>> that I described will still happen, because no
> > >>>     machine from
> > >>>     >>>>      ASF INFRA's
> > >>>     >>>>      >>> pool is allocated to Flink. As I have paid close
> > >>>     attention to
> > >>>     >>>>      the build
> > >>>     >>>>      >>> queue in the past few weekdays, it's a pretty clear
> > >>>     pattern now.
> > >>>     >>>>      >>>
> > >>>     >>>>      >>> **The ultimate root cause** for that is - we don't
> > >>>     have any
> > >>>     >>>>      **dedicated**
> > >>>     >>>>      >>> build resources that we can stably rely on. I'm
> > >>>     actually ok to
> > >>>     >>>>      wait for a
> > >>>     >>>>      >>> long time if there are build requests running, it
> > >>>     means at
> > >>>     >>>>      least we are
> > >>>     >>>>      >>> making progress. But I'm not ok with no build
> > >>>     resource. A
> > >>>     >>>>      better place I
> > >>>     >>>>      >>> think we should aim at in short term is to always
> > >>>     have at
> > >>>     >>>>      least a central
> > >>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
> > >>>     Flink at
> > >>>     >>>>      any time, or
> > >>>     >>>>      >>> maybe use users resources.
> > >>>     >>>>      >>>
> > >>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> > >>>     Zeppelin
> > >>>     >>>>      community is
> > >>>     >>>>      >>> using a Jenkins job to automatically build on
> users'
> > >>>     travis
> > >>>     >>>>      account and
> > >>>     >>>>      >>> link the result back to github PR. I guess the
> > >>>     Jenkins job
> > >>>     >>>>      would fetch
> > >>>     >>>>      >>> latest upstream master and build the PR against it.
> > >>>     Jeff has
> > >>>     >>>> filed
> > >>>     >>>>      >> tickets
> > > >>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll be
> > >>>     better to
> > >>>     >>>>      fully
> > >>>     >>>>      >>> understand it first before judging this approach.
> > >>>     >>>>      >>>
> > >>>     >>>>      >>> I also heard good things about CircleCI, and ASF
> > >>>     INFRA seems
> > >>>     >>>>      to have a
> > >>>     >>>>      >> pool
> > >>>     >>>>      >>> of build capacity there too. Can be an alternative
> > >>>     to consider.
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>
> > > >>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > >>>     >>>>      >>>
> > >>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed
> the
> > >>>     most
> > >>>     >>>>      important point
> > >>>     >>>>      >>>> from Chesnay's previous message in the summary.
> The
> > >>>     ultimate
> > >>>     >>>>      reason for
> > >>>     >>>>      >>>> all the problems is that the tests take close to 2
> > >>>     hours to
> > >>>     >>>>      run already.
> > >>>     >>>>      >>>> I fully support this claim: "Unless people start
> > >>>     caring about
> > >>>     >>>>      test times
> > >>>     >>>>      >>>> before adding them, this issue cannot be solved"
> > >>>     >>>>      >>>>
> > >>>     >>>>      >>>> This is also another reason why using user's
> Travis
> > >>>     account
> > >>>     >>>>      won't help.
> > >>>     >>>>      >>>> Every few weeks we reach the user's time limit for
> > >>>     a single
> > >>>     >>>>      profile.
> > >>>     >>>>      >>>> This makes the user's builds simply fail, until we
> > >>>     either
> > >>>     >>>>      properly
> > >>>     >>>>      >>>> decrease the time the tests take (which I am not
> > >>>     sure we ever
> > >>>     >>>>      did) or
> > >>>     >>>>      >>>> postpone the problem by splitting into more
> > >>>     profiles. (Note
> > >>>     >>>>      that the ASF
> > >>>     >>>>      >>>> Travis account has higher time limits)
> > >>>     >>>>      >>>>
> > >>>     >>>>      >>>> Best,
> > >>>     >>>>      >>>>
> > >>>     >>>>      >>>> Dawid
> > >>>     >>>>      >>>>
> > >>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> > >>>     >>>>      >>>>> Do we know if using "the best" available hardware
> > >>>     would
> > >>>     >>>>      improve the
> > >>>     >>>>      >> build
> > >>>     >>>>      >>>>> times?
> > >>>     >>>>      >>>>> Imagine we would run the build on machines with
> > >>>     plenty of
> > >>>     >>>>      main memory
> > >>>     >>>>      >> to
> > >>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> > >>>     architecture?
> > >>>     >>>>      >>>>>
> > >>>     >>>>      >>>>> Throwing hardware at the problem could help
> reduce
> > >>>     the time
> > >>>     >>>>      of an
> > >>>     >>>>      >>>>> individual build, and using our own
> infrastructure
> > >>>     would
> > >>>     >>>>      remove our
> > >>>     >>>>      >>>>> dependency on Apache's Travis account (with the
> > >>>     obvious
> > >>>     >>>>      downside of
> > >>>     >>>>      >>>> having
> > >>>     >>>>      >>>>> to maintain the infrastructure)
> > >>>     >>>>      >>>>> We could use an open source travis alternative,
> to
> > >>>     have a
> > >>>     >>>>      similar
> > >>>     >>>>      >>>>> experience and make the migration easy.
> > >>>     >>>>      >>>>>
> > >>>     >>>>      >>>>>
> > >>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> > >>>     >>>>      <chesnay@apache.org <ma...@apache.org>
> > >>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
> > >>>     >>>>      >>>> wrote:
> > >>>     >>>>      >>>>>>    >From what I gathered, there's no special
> > >>>     sauce that the
> > >>>     >>>>      Zeppelin
> > >>>     >>>>      >>>>>> project uses which actually integrates a users
> > >>> Travis
> > >>>     >>>>      account into the
> > >>>     >>>>      >>>> PR.
> > >>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
> > >>>     kind of it.
> > >>>     >>>>      >>>>>>
> > >>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF
> a
> > >>>     fair
> > >>>     >>>>      amount of
> > >>>     >>>>      >>>>>> resources, but there are downsides:
> > >>>     >>>>      >>>>>>
> > >>>     >>>>      >>>>>> The discoverability of the Travis check takes a
> > >>>     nose-dive.
> > >>>     >>>>      Either we
> > >>>     >>>>      >>>>>> require every contributor to always, on every
> > >>>     commit, also
> > >>>     >>>>      post a
> > >>>     >>>>      >> Travis
> > >>>     >>>>      >>>>>> build, or we have the reviewer sift through the
> > >>>     >>>>      contributors account
> > >>>     >>>>      >> to
> > >>>     >>>>      >>>>>> find it.
> > >>>     >>>>      >>>>>>
> > >>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> > >>>     also not
> > >>>     >>>>      equivalent to
> > >>>     >>>>      >>>>>> having a PR build.
> > >>>     >>>>      >>>>>>
> > >>>     >>>>      >>>>>> A normal branch build takes a branch as is and
> > >>>     tests it. A
> > >>>     >>>>      PR build
> > >>>     >>>>      >>>>>> merges the branch into master, and then runs it.
> > >>>     (Fun fact:
> > >>>     >>>>      This is
> > >>>     >>>>      >> why
> > >>>     >>>>      >>>>>> a PR with merge conflicts is not being run on
> > >>>     Travis.)
> > >>>     >>>>      >>>>>>
> > >>>     >>>>      >>>>>> And ultimately, everyone can already make use
> > >>> of this
> > >>>     >>>>      approach anyway.
> > >>>     >>>>      >>>>>>
> > >>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> > >>>     >>>>      >>>>>>> Hi Jeff,
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> > >>>     think it's a
> > >>>     >>>>      good idea to
> > >>>     >>>>      >>>>>>> leverage user's travis account.
> > >>>     >>>>      >>>>>>> In this way, we can have almost unlimited
> > >>>     concurrent build
> > >>>     >>>>      jobs and
> > >>>     >>>>      >>>>>>> developers can restart build by themselves
> > >>>     (currently only
> > >>>     >>>>      committers
> > >>>     >>>>      >>>>>>> can restart PR's build).
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> > >>> user's
> > >>>     >>>>      travis build
> > >>>     >>>>      >> into
> > >>>     >>>>      >>>>>>> the Flink pull request's build automatically.
> > >>>     Can you
> > >>>     >>>>      explain more in
> > >>>     >>>>      >>>>>>> detail?
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>> Another question: does travis only build
> > >>>     branches for user
> > >>>     >>>>      account?
> > >>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> > >>> user's
> > >>>     >>>>      commits against
> > >>>     >>>>      >>>>>>> current master branch.
> > >>>     >>>>      >>>>>>> This will help us to find problems before
> > >>>     merge.  Builds
> > >>>     >>>>      for branches
> > >>>     >>>>      >>>>>>> will lose the impact of new commits in master.
> > >>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>> Thanks again for sharing the idea.
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>> Regards,
> > >>>     >>>>      >>>>>>> Jark
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> > >>>     <zjffdu@gmail.com <ma...@gmail.com>
> > >>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> > >>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
> > >>> <ma...@gmail.com> <mailto:zjffdu@gmail.com
> > >>> <ma...@gmail.com>>>> wrote:
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>  Hi Folks,
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we
> > >>> solved
> > >>>     >>>> it by
> > >>>     >>>>      >> delegating
> > >>>     >>>>      >>>>>>>  each
> > >>>     >>>>      >>>>>>>  one's PR build to his travis account
> > >>>     (Everyone can
> > >>>     >>>>      have 5 free
> > >>>     >>>>      >>>>>>>  slot for
> > >>>     >>>>      >>>>>>>  travis build).
> > >>>     >>>>      >>>>>>>  Apache account travis build is only triggered
> > >>> when
> > >>>     >>>>      PR is merged.
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
> > >>> <ma...@gmail.com>
> > >>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
> > >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>
> > >>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
> > >>>     >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>  > (Forgot to cc George)
> > >>>     >>>>      >>>>>>>  >
> > >>>     >>>>      >>>>>>>  > Best,
> > >>>     >>>>      >>>>>>>  > Kurt
> > >>>     >>>>      >>>>>>>  >
> > >>>     >>>>      >>>>>>>  >
> > >>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> > >>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
> > >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
> > >>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
> > >>> <ma...@gmail.com> <mailto:ykt836@gmail.com
> > >>> <ma...@gmail.com>>>>
> > >>>     >>>>      wrote:
> > >>>     >>>>      >>>>>>>  >
> > >>>     >>>>      >>>>>>>  > > Hi Bowen,
> > >>>     >>>>      >>>>>>>  > >
> > >>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
> > >>>     actually have
> > >>>     >>>>      discussed
> > >>>     >>>>      >> about
> > >>>     >>>>      >>>>>>>  this, and I
> > >>>     >>>>      >>>>>>>  > > think Till and George have
> > >>>     >>>>      >>>>>>>  > > already spent some time investigating
> > >>>     it. I have
> > >>>     >>>>      cced both of
> > >>>     >>>>      >>>>>>>  them, and
> > >>>     >>>>      >>>>>>>  > > maybe they can share
> > >>>     >>>>      >>>>>>>  > > their findings.
> > >>>     >>>>      >>>>>>>  > >
> > >>>     >>>>      >>>>>>>  > > Best,
> > >>>     >>>>      >>>>>>>  > > Kurt
> > >>>     >>>>      >>>>>>>  > >
> > >>>     >>>>      >>>>>>>  > >
> > >>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> > >>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
> > >>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
> > >>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
> > >>> <ma...@gmail.com> <mailto:imjark@gmail.com
> > >>> <ma...@gmail.com>>>>
> > >>>     >>>>      wrote:
> > >>>     >>>>      >>>>>>>  > >
> > >>>     >>>>      >>>>>>>  > >> Hi Bowen,
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> > >>>     suffered from
> > >>>     >>>>      the long
> > >>>     >>>>      >>>>>>>  build time.
> > >>>     >>>>      >>>>>>>  > >> I agree that we should focus on
> > >>>     solving build
> > >>>     >>>>      capacity
> > >>>     >>>>      >>>>>>>  problem in the
> > >>>     >>>>      >>>>>>>  > >> thread.
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> My observation is there is only one
> > >>>     build is
> > >>>     >>>>      running, all
> > >>>     >>>>      >> the
> > >>>     >>>>      >>>>>>>  others
> > >>>     >>>>      >>>>>>>  > >> (other
> > >>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
> > >>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
> > >>>     it can
> > >>>     >>>> support
> > >>>     >>>>      >> concurrent
> > >>>     >>>>      >>>>>>>  build
> > >>>     >>>>      >>>>>>>  > jobs.
> > >>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
> > >>>     using, might
> > >>>     >>>>      be the free
> > >>>     >>>>      >>>>>>>  plan for
> > >>>     >>>>      >>>>>>>  > open
> > >>>     >>>>      >>>>>>>  > >> source.
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> > >>>     experience on
> > >>>     >>>>      Travis.
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> Regards,
> > >>>     >>>>      >>>>>>>  > >> Jark
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> > >>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
> > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> > >>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> > >>> <ma...@gmail.com>
> > >>>     >>>>      <mailto:bowenli86@gmail.com
> > >>> <ma...@gmail.com>>>> wrote:
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >> > Hi Steven,
> > >>>     >>>>      >>>>>>>  > >> >
> > >>>     >>>>      >>>>>>>  > >> > I think you may not have read what I
> > >>>     wrote. The
> > >>>     >>>>      discussion is
> > >>>     >>>>      >>>> about
> > >>>     >>>>      >>>>>>>  > "unstable
> > >>>     >>>>      >>>>>>>  > >> > build **capacity**", in other words
> > >>>     >>>>      "unstable / lack of
> > >>>     >>>>      >>>> build
> > >>>     >>>>      >>>>>>>  > >> resources",
> > >>>     >>>>      >>>>>>>  > >> > not "unstable build".
> > >>>     >>>>      >>>>>>>  > >> >
> > >>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
> > >>>     Steven Wu
> > >>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
> > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > >>> <ma...@gmail.com>>
> > >>>     >>>>      <mailto:stevenz3wu@gmail.com
> > >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> > >>> <ma...@gmail.com>>>>
> > >>>     >>>>      >>>>>>>  > wrote:
> > >>>     >>>>      >>>>>>>  > >> >
> > >>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
> > >>>     >>>>      definitely a pain
> > >>>     >>>>      >>>>>> point.
> > >>>     >>>>      >>>>>>>  > >> > >
> > >>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> > >>>     >>>>      >> flink-connector-kafka
> > >>>     >>>>      >>>>>>>  is not
> > >>>     >>>>      >>>>>>>  > >> related
> > >>>     >>>>      >>>>>>>  > >> > to
> > >>>     >>>>      >>>>>>>  > >> > > my change. but there is no easy
> > >>>     re-run the
> > >>>     >>>>      build on
> > >>>     >>>>      >>>>>>>  travis UI.
> > >>>     >>>>      >>>>>>>  > Google
> > >>>     >>>>      >>>>>>>  > >> > > search showed a trick of
> > >>>     close-and-open the
> > >>>     >>>>      PR will
> > >>>     >>>>      >>>>>>>  trigger rebuild.
> > >>>     >>>>      >>>>>>>  > >> but
> > >>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
> > >>>     activities.
> > >>>     >>>>      >>>>>>>  > >> > >
> > >>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
> > >>>     >>>>      >>>>>>>  > >> > >
> > >>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> > >>>     often failed
> > >>>     >>>>      with
> > >>>     >>>>      >>>>>>>  exceeding time
> > >>>     >>>>      >>>>>>>  > limit
> > >>>     >>>>      >>>>>>>  > >> > after
> > >>>     >>>>      >>>>>>>  > >> > > 4+ hours.
> > >>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
> > >>>     limit for
> > >>>     >>>>      jobs, and
> > >>>     >>>>      >> has
> > >>>     >>>>      >>>>>>>  been
> > >>>     >>>>      >>>>>>>  > >> > terminated.
> > >>>     >>>>      >>>>>>>  > >> > >
> > >>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
> > >>>     Bowen Li
> > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > >>> <ma...@gmail.com>>
> > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> bowenli86@gmail.com>
> > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > >>>     >>>>      >>>>>>>  > wrote:
> > >>>     >>>>      >>>>>>>  > >> > >
> > >>>     >>>>      >>>>>>>  > >> > > >
> > >>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
> > >>>     >>>>      >>>>>>>  This build
> > >>>     >>>>      >>>>>>>  > >> > request
> > >>>     >>>>      >>>>>>>  > >> > > > has
> > >>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> > >>>     queue**
> > >>>     >>>>      since I first
> > >>>     >>>>      >> saw
> > >>>     >>>>      >>>>>>>  it at PST
> > >>>     >>>>      >>>>>>>  > >> > 10:30am
> > >>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> > >>>     there before
> > >>>     >>>>      10:30am).
> > >>>     >>>>      >>>>>>>  It's PST
> > >>>     >>>>      >>>>>>>  > 4:12pm
> > >>>     >>>>      >>>>>>>  > >> now
> > >>>     >>>>      >>>>>>>  > >> > > and
> > >>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> > >>>     >>>>      >>>>>>>  > >> > > >
> > >>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
> > >>>     Bowen Li
> > >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> > >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> > >>> <ma...@gmail.com>>
> > >>>     >>>>      <mailto:bowenli86@gmail.com <mailto:
> bowenli86@gmail.com>
> > >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> > >>>     >>>>      >>>>>>>  > >> wrote:
> > >>>     >>>>      >>>>>>>  > >> > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
> > >>>     >>>>      resulting from lack
> > >>>     >>>>      >>>>>>>  of stable
> > >>>     >>>>      >>>>>>>  > >> build
> > >>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
> > >>>     PRs [1].
> > >>>     >>>>      >> Specifically, I
> > >>>     >>>>      >>>>>>>  noticed
> > >>>     >>>>      >>>>>>>  > >> often
> > >>>     >>>>      >>>>>>>  > >> > > that
> > >>>     >>>>      >>>>>>>  > >> > > > no
> > >>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
> > >>>     >>>>      progress for
> > >>>     >>>>      >> hours,
> > >>>     >>>>      >>>> and
> > >>>     >>>>      >>>>>>>  > suddenly
> > >>>     >>>>      >>>>>>>  > >> 5
> > >>>     >>>>      >>>>>>>  > >> > or
> > >>>     >>>>      >>>>>>>  > >> > > 6
> > >>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
> > >>>     after the
> > >>>     >>>>      long pause.
> > >>>     >>>>      >>>>>>>  I'm at PST
> > >>>     >>>>      >>>>>>>  > >> > (UTC-08)
> > >>>     >>>>      >>>>>>>  > >> > > > time
> > >>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
> > >>>     be as
> > >>>     >>>>      long as 6 hours
> > >>>     >>>>      >>>>>>>  from PST 9am
> > >>>     >>>>      >>>>>>>  > >> to
> > >>>     >>>>      >>>>>>>  > >> > 3pm
> > >>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
> > >>>     drain the
> > >>>     >>>>      queue
> > >>>     >>>>      >>>>>>>  afterwards).
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
> > >>>     impacted our
> > >>>     >>>>      productivity.
> > >>>     >>>>      >>>> I've
> > >>>     >>>>      >>>>>>>  > >> experienced
> > >>>     >>>>      >>>>>>>  > >> > > that
> > >>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
> > >>>     morning of
> > >>>     >>>>      PST time zone
> > >>>     >>>>      >>>>>>>  won't finish
> > >>>     >>>>      >>>>>>>  > >> > their
> > >>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
> > >>>     same day.
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
> > >>>     the same
> > >>>     >>>>      problem or
> > >>>     >>>>      >>>>>>>  have similar
> > >>>     >>>>      >>>>>>>  > >> > > > observation
> > >>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
> > >>>     has things
> > >>>     >>>>      to do with
> > >>>     >>>>      >> time
> > >>>     >>>>      >>>>>>>  zone)
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
> > >>>     TravisCI is
> > >>>     >>>>      Flink currently
> > >>>     >>>>      >>>>>>>  using? Is it
> > >>>     >>>>      >>>>>>>  > >> the
> > >>>     >>>>      >>>>>>>  > >> > > free
> > >>>     >>>>      >>>>>>>  > >> > > > > plan for open source
> > >>>     projects? What
> > >>>     >>>> are the
> > >>>     >>>>      >>>>>>>  guaranteed build
> > >>>     >>>>      >>>>>>>  > >> capacity
> > >>>     >>>>      >>>>>>>  > >> > > of
> > >>>     >>>>      >>>>>>>  > >> > > > > the current plan?
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
> > >>>     (either
> > >>>     >>>>      free or paid)
> > >>>     >>>>      >>>>>> can't
> > >>>     >>>>      >>>>>>>  > provide
> > >>>     >>>>      >>>>>>>  > >> > > stable
> > >>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
> > >>>     upgrade to a
> > >>>     >>>>      higher priced
> > >>>     >>>>      >>>>>>>  plan with
> > >>>     >>>>      >>>>>>>  > larger
> > >>>     >>>>      >>>>>>>  > >> > and
> > >>>     >>>>      >>>>>>>  > >> > > > more
> > >>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
> > >>>     contribute to
> > >>>     >>>> the
> > >>>     >>>>      >>>>>>>  productivity problem
> > >>>     >>>>      >>>>>>>  > is
> > >>>     >>>>      >>>>>>>  > >> > that
> > >>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
> > >>>     full build
> > >>>     >>>>      for every PR
> > >>>     >>>>      >>>> and a
> > >>>     >>>>      >>>>>>>  > >> successful
> > >>>     >>>>      >>>>>>>  > >> > > full
> > >>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
> > >>>     definitely have
> > >>>     >>>>      more options to
> > >>>     >>>>      >>>>>>>  solve it,
> > >>>     >>>>      >>>>>>>  > for
> > >>>     >>>>      >>>>>>>  > >> > > > instance,
> > >>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
> > >>>     and reuse
> > >>>     >>>>      artifacts
> > >>>     >>>>      >> from
> > >>>     >>>>      >>>> the
> > >>>     >>>>      >>>>>>>  > previous
> > >>>     >>>>      >>>>>>>  > >> > > build.
> > >>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
> > >>>     effort
> > >>>     >>>>      which is much
> > >>>     >>>>      >>>>>>>  harder to
> > >>>     >>>>      >>>>>>>  > >> > accomplish
> > >>>     >>>>      >>>>>>>  > >> > > > in
> > >>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
> > >>>     may deserve
> > >>>     >>>>      its own
> > >>>     >>>>      >>>> separate
> > >>>     >>>>      >>>>>>>  > >> discussion.
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > > [1]
> > >>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > > >
> > >>>     >>>>      >>>>>>>  > >> > > >
> > >>>     >>>>      >>>>>>>  > >> > >
> > >>>     >>>>      >>>>>>>  > >> >
> > >>>     >>>>      >>>>>>>  > >>
> > >>>     >>>>      >>>>>>>  > >
> > >>>     >>>>      >>>>>>>  >
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>  --
> > >>>     >>>>      >>>>>>>  Best Regards
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>>>>>>  Jeff Zhang
> > >>>     >>>>      >>>>>>>
> > >>>     >>>>      >>
> > >>>     >>>>
> > >>>     >>>
> > >>>     >>
> > >>>
> > >>
> > >>
> > >
> >
> >
>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Bowen Li <bo...@gmail.com>.
+1 on approval of the migration to our own Travis account. The foreseeable
benefits to the whole community's productivity and iteration speed would be
significant!

I think whether to use Flinkbot or the Travis REST API is an implementation
detail. Once we determine the overall direction, the details can be figured
out.
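
For illustration only, here is a minimal sketch of what triggering a build
through the Travis REST API could look like. It assumes Travis API v3 and its
`POST /repo/{slug}/requests` endpoint. The repository slug, branch name, and
token are hypothetical placeholders, and the function only builds the request
without sending it.

```python
import json
import urllib.parse
import urllib.request


def build_trigger_request(slug, branch, token, message="Triggered by bot",
                          api_base="https://api.travis-ci.com"):
    """Build (but do not send) a Travis API v3 request that triggers a build."""
    # The repository slug ("owner/repo") must be URL-encoded in the path.
    encoded_slug = urllib.parse.quote(slug, safe="")
    body = {"request": {"branch": branch, "message": message}}
    return urllib.request.Request(
        url=f"{api_base}/repo/{encoded_slug}/requests",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Travis-API-Version": "3",
            "Authorization": f"token {token}",
        },
        method="POST",
    )


# Hypothetical mirror repository and branch; nothing is sent over the network.
req = build_trigger_request("example-bot/flink-mirror", "pr-1234", "dummy-token")
print(req.get_method(), req.full_url)
```

Actually sending it would be a matter of passing `req` to
`urllib.request.urlopen`, with a real token for the account that owns the
repository being built.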

The good news is that, from my research into how Arrow and Spark integrate
their own in-house CI services with their GitHub repos, both use bots with
the GitHub API. See a typical PR check for those projects at [1] and [2].
Thus, we are **not alone** on this path.

Specifically, Apache Arrow has 'Ursabot', similar to our Flinkbot, which I
linked in the discussion. [3] lays out how Ursabot works and integrates with
the GitHub API to trigger builds. I think their documentation is a bit
outdated though: the doc says it cannot report build status back to GitHub,
but from [1] we can see that build statuses are actually reported.
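
As a rough sketch of how a bot could report a build result back to a PR, the
snippet below builds a request for GitHub's commit status endpoint
(`POST /repos/{owner}/{repo}/statuses/{sha}`). This is not how Ursabot or
Flinkbot is actually implemented; the context string, target URL, commit SHA,
and token are made-up placeholders, and the request is only constructed, not
sent.

```python
import json
import urllib.request

# States accepted by GitHub's commit status API.
VALID_STATES = {"error", "failure", "pending", "success"}


def build_status_request(owner, repo, sha, state, target_url, token,
                         context="ci/example-bot", description=""):
    """Build (but do not send) a request that sets a commit status on GitHub."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    payload = {
        "state": state,
        "target_url": target_url,    # link back to the external CI build
        "description": description,
        "context": context,          # label shown next to the check on the PR
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"token {token}",
        },
        method="POST",
    )


# Hypothetical values; nothing is sent over the network.
req = build_status_request("apache", "flink", "abc123", "success",
                           "https://ci.example.org/build/1", "dummy-token")
print(req.get_method(), req.full_url)
```

The status is attached to a commit SHA rather than to the PR itself, so the
bot would post it against the head commit of the PR it built.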

@Chesnay thanks for taking action on this. Though I don't have access to
the settings of Flink's GitHub repo, I will continue to help push this
initiative in whichever way I can. Wes and Krisztián from Arrow are also
very friendly and helpful, and I can connect you with them to learn from
their experience.

[1] https://github.com/apache/arrow/pull/4809
[2] https://github.com/apache/spark/pull/25053
[3] https://github.com/ursa-labs/ursabot#driving-ursabot


On Thu, Jul 4, 2019 at 6:42 AM Hequn Cheng <ch...@gmail.com> wrote:

> +1.
>
> And thanks a lot to Chesnay for pushing this.
>
> Best, Hequn
>
> On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ch...@apache.org>
> wrote:
>
>> Note that the Flinkbot approach isn't that trivial either; we can't
>> _just_ trigger builds for a branch in the apache repo, but would first
>> have to clone the branch/pr into a separate repository (that is owned by
>> the github account that the travis account would be tied to).
>>
>> One roadblock after the next showing up...
>>
>> On 04/07/2019 11:59, Chesnay Schepler wrote:
>> > Small update with mostly bad news:
>> >
>> > INFRA doesn't know whether it is possible, and referred me to Travis
>> > support.
>> > They did point out that it could be problematic in regards to
>> > read/write permissions for the repository.
>> >
>> > From my own findings /so far/ with a test repo/organization, it does
>> > not appear possible to configure the Travis account used for a
>> > specific repository.
>> >
>> > So yeah, if we go down this route we may have to pimp the Flinkbot to
>> > trigger builds through the Travis REST API.
>> >
>> > On 04/07/2019 10:46, Chesnay Schepler wrote:
>> >> I've raised a JIRA
>> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>> >> inquire whether it would be possible to switch to a different Travis
>> >> account, and if so what steps would need to be taken.
>> >> We need a proper confirmation from INFRA since we are not in full
>> >> control of the flink repository (for example, we cannot access the
>> >> settings page).
>> >>
>> >> If this is indeed possible, Ververica is willing to sponsor a Travis
>> >> account for the Flink project.
>> >> This would provide us with more than enough resources.
>> >>
>> >> Since this makes the project more reliant on resources provided by
>> >> external companies I would like to vote on this.
>> >>
>> >> Please vote on this proposal, as follows:
>> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
>> >> account, provided that INFRA approves
>> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> >> account
>> >>
>> >> The vote will be open for at least 24h, and until we have
>> >> confirmation from INFRA. The voting period may be shorter than the
>> >> usual 3 days since our current setup is effectively not working.
>> >>
>> >> On 04/07/2019 06:51, Bowen Li wrote:
>> >>> Re: > Are they using their own Travis CI pool, or did they switch to
>> >>> an entirely different CI service?
>> >>>
>> >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>> >>> currently moving away from ASF's Travis to their own in-house metal
>> >>> machines at [1] with custom CI application at [2]. They've seen
>> >>> significant improvement w.r.t both much higher performance and
>> >>> basically no resource waiting time, "night-and-day" difference
>> >>> quoting Wes.
>> >>>
>> >>> Re: > If we can just switch to our own Travis pool, just for our
>> >>> project, then this might be something we can do fairly quickly?
>> >>>
>> >>> I believe so, according to [3] and [4]
>> >>>
>> >>>
>> >>> [1] https://ci.ursalabs.org/
>> >>> [2] https://github.com/ursa-labs/ursabot
>> >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
>> >>> <ma...@apache.org>> wrote:
>> >>>
>> >>>     Are they using their own Travis CI pool, or did they switch to an
>> >>>     entirely different CI service?
>> >>>
>> >>>     If we can just switch to our own Travis pool, just for our
>> >>>     project, then
>> >>>     this might be something we can do fairly quickly?
>> >>>
>> >>>     On 03/07/2019 05:55, Bowen Li wrote:
>> >>>     > I responded in the INFRA ticket [1] that I believe they are using
>> >>>     > the wrong metric against Flink, and that total build time is a
>> >>>     > completely different thing from guaranteed build capacity.
>> >>>     >
>> >>>     > My response:
>> >>>     >
>> >>>     > "As mentioned above, since I started paying attention to Flink's
>> >>>     > build queue a few tens of days ago, I (in Seattle) have seen no
>> >>>     > build kick off for Flink during PST daytime on weekdays. Our
>> >>>     > teammates in China and Europe have reported similar observations.
>> >>>     > So we need to evaluate where the large total build time came from.
>> >>>     > If both 1) your number and 2) our observations from three locations
>> >>>     > that cover pretty much a full day are true, I **guess** one likely
>> >>>     > reason is that the extra build time came from weekends, when other
>> >>>     > Apache projects may be idle and Flink just drains its congested
>> >>>     > queue hard.
>> >>>     >
>> >>>     > Please be aware that we're not complaining about the lack of
>> >>>     > resources in general; I'm complaining about the lack of **stable,
>> >>>     > dedicated** resources. An example of the latter: currently, even if
>> >>>     > no build is in Flink's queue and I submit a request that becomes the
>> >>>     > queue head in the PST morning, my build won't start for 6-8+h. That
>> >>>     > is an absurd amount of waiting time.
>> >>>     >
>> >>>     > That is to say, if ASF INFRA decides to adopt a quota system and
>> >>>     > grants Flink five DEDICATED servers that run all the time only for
>> >>>     > Flink, that'll be PERFECT and can totally solve our problem now.
>> >>>     >
>> >>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
>> >>>     > some level of build capacity SLAs and certainty"
>> >>>     >
>> >>>     >
>> >>>     > Again, I believe these two problems differ in nature: long build
>> >>>     > time vs. lack of dedicated build resources. That is, shortening
>> >>>     > build time may or may not relieve the situation. I'm slightly
>> >>>     > negative on disabling IT cases for PRs, because we would then be at
>> >>>     > risk of bugs in PRs that the UTs don't catch; those may cost a lot
>> >>>     > more to fix later and may slow down or even block others. But I'm
>> >>>     > open to others' opinions on it.
>> >>>     >
>> >>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a
>> >>>     > feasible way to solve our problem, since INFRA's pool is fully
>> >>>     > shared and they have no control over, or finer insight into,
>> >>>     > resource allocation to a specific Apache project. As mentioned in
>> >>>     > [1], Apache Arrow is moving away from the ASF INFRA Travis pool
>> >>>     > (they are actually surprised Flink hasn't planned to do so). I know
>> >>>     > that Spark is on its own build infra. If we all agree on funding
>> >>>     > our own build infra, I'd be glad to help investigate potential
>> >>>     > options after releasing 1.9, since I'm super busy with 1.9 now.
>> >>>     >
>> >>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>> >>>     >
>> >>>     >
>> >>>     >
>> >>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>> >>>     <chesnay@apache.org> wrote:
>> >>>     >
>> >>>     >> As a short-term stopgap, since we can assume this issue to
>> >>>     become much
>> >>>     >> worse in the following days/weeks, we could disable IT cases in
>> >>>     PRs and
>> >>>     >> only run them on master.
>> >>>     >>
>> >>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>> >>>     >>> People really have to stop thinking that just because
>> >>>     something works
>> >>>     >>> for us it is also a good solution.
>> >>>     >>> Also, please remember that our builds run for 2h from start to
>> >>>     finish,
>> >>>     >>> and not the 14 _minutes_ it takes for zeppelin.
>> >>>     >>> We are dealing with an entirely different scale here, both in
>> >>>     terms of
>> >>>     >>> build times and number of builds.
>> >>>     >>>
>> >>>     >>> In this very thread people have been complaining about long
>> >>> queue
>> >>>     >>> times for their builds. Surprise, other Apache projects have
>> >>> been
>> >>>     >>> suffering the very same thing due to us not controlling our
>> >>> build
>> >>>     >>> times. While switching services (be it Jenkins, CircleCI or
>> >>>     whatever)
>> >>>     >>> will possibly work for us (and these options are actually
>> >>>     attractive,
>> >>>     >>> like CircleCI's proper support for build artifacts), it will
>> >>> also
>> >>>     >>> result in us likely negatively affecting other projects in
>> >>>     significant
>> >>>     >>> ways.
>> >>>     >>>
>> >>>     >>> Sure, the Jenkins setup has a good user experience for us, at
>> >>>     the cost
>> >>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>> >>>     have 25
>> >>>     >>> PRs in our queue; that's possibly 50h we'd consume of Jenkins
>> >>>     >>> resources, and the European contributors haven't even really
>> >>>     started yet.
>> >>>     >>>
>> >>>     >>> FYI, the latest INFRA response from INFRA-18533:
>> >>>     >>>
>> >>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
>> >>>     build time
>> >>>     >>> last month. That is equal to EIGHT servers running 24/7 for
>> >>>     the ENTIRE
>> >>>     >>> MONTH. EIGHT. nonstop.
>> >>>     >>> When we discovered this last night, we discussed it some and
>> >>>     are going
>> >>>     >>> to tune down Flink to allow only five executors maximum. We
>> >>> cannot
>> >>>     >>> allow Flink to consume so much of a Foundation shared
>> >>> resource."
>> >>>     >>>
>> >>>     >>> So yes, we either
>> >>>     >>> a) have to heavily reduce our CI usage or
>> >>>     >>> b) fund our own, either maintaining it ourselves or donating
>> >>>     to Apache.
>> >>>     >>>
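The figures in the INFRA response quoted above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the 30-day month, the function names, and the idealized "every executor is always busy" model are assumptions, not part of INFRA's metrics. The 2h-per-build and 25-PR numbers are the ones mentioned in this thread.

```python
HOURS_PER_DAY = 24
DAYS_IN_MONTH = 30  # assumption: a 30-day month


def equivalent_servers(build_hours, days=DAYS_IN_MONTH):
    """Servers that would have to run 24/7 to supply build_hours in a month."""
    return build_hours / (HOURS_PER_DAY * days)


def monthly_capacity_hours(executors, days=DAYS_IN_MONTH):
    """Upper bound on build hours that a fixed number of executors can supply."""
    return executors * HOURS_PER_DAY * days


def queue_drain_hours(queued_builds, hours_per_build, executors):
    """Idealized wall-clock time to drain a queue with all executors busy."""
    return queued_builds * hours_per_build / executors


if __name__ == "__main__":
    used = 5800  # build hours per month, as reported in INFRA-18533
    print(f"equivalent 24/7 servers: {equivalent_servers(used):.2f}")
    print(f"capacity of 5 executors: {monthly_capacity_hours(5)} h/month")
    print(f"25 PRs at 2h on 5 executors: {queue_drain_hours(25, 2, 5)} h")
```

Under these assumptions, a cap of five executors supplies noticeably fewer hours per month than the reported usage, which is consistent with the conclusion that usage has to drop or capacity has to be funded.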
>> >>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>> >>>     >>>> Looking at the git history of the Jenkins script, its core part
>> >>>     >>>> was finished in March 2017 (with only two minor updates in
>> >>>     >>>> 2017/2018), so it's been running for over two years now, and it
>> >>>     >>>> seems the Zeppelin community has been quite happy with it.
>> >>>     >>>> @Jeff Zhang <zjffdu@gmail.com>, can you share your insights and
>> >>>     >>>> user experience with the Jenkins+Travis approach?
>> >>>     >>>>
>> >>>     >>>> Things like:
>> >>>     >>>>
>> >>>     >>>> - has the approach completely solved the resource capacity
>> >>>     >>>> problem for the Zeppelin community? is the Zeppelin community
>> >>>     >>>> happy with the result?
>> >>>     >>>> - is the whole configuration chain stable enough (e.g. uptime)?
>> >>>     >>>> - how often do you need to maintain the Jenkins infra? how many
>> >>>     >>>> people are usually involved in maintenance and bug-fixes?
>> >>>     >>>>
>> >>>     >>>> To me, the downside of this approach seems to be mostly
>> >>>     >>>> maintenance: maintaining the script and the Jenkins infra.
>> >>>     >>>>
>> >>>     >>>> ** Having Our Own Travis-CI.com Account **
>> >>>     >>>>
>> >>>     >>>> Another alternative I've been thinking of is to have our own
>> >>>     >>>> travis-ci.com account with paid dedicated resources. Note that
>> >>>     >>>> travis-ci.org is the free version and travis-ci.com is the
>> >>>     >>>> commercial version. We currently use a shared resource pool
>> >>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>> >>>     >>>> control over it - we can't see how it's configured, how many
>> >>>     >>>> resources are available, how resources are allocated among Apache
>> >>>     >>>> projects, etc. The nice things about having an account on
>> >>>     >>>> travis-ci.com are:
>> >>>     >>>>
>> >>>     >>>> - relatively low cost with a much better resource guarantee than
>> >>>     >>>> what we currently have [1]: $249/month for 5 dedicated concurrent
>> >>>     >>>> jobs, $489/month for 10 concurrent jobs
>> >>>     >>>> - low maintenance work compared to using Jenkins
>> >>>     >>>> - (potentially) no migration cost according to Travis's doc [2]
>> >>>     >>>> (pending verification)
>> >>>     >>>> - full control over the build capacity/configuration compared to
>> >>>     >>>> using ASF INFRA's pool
>> >>>     >>>>
>> >>>     >>>> I'd be surprised if we as such a vibrant community cannot
>> >>>     find and
>> >>>     >>>> fund $249*12=$2988 a year in exchange for a much better
>> >>> developer
>> >>>     >>>> experience and much higher productivity.
>> >>>     >>>>
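The cost comparison above can be worked through as follows. This is only a sketch: the prices are the ones quoted in this message, while the dictionary layout and helper names are made up for illustration and say nothing about Travis's current pricing.

```python
# Monthly price in USD keyed by number of concurrent jobs, as quoted above.
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_price):
    """Yearly cost of a plan billed monthly."""
    return monthly_price * 12


def monthly_cost_per_job(monthly_price, concurrent_jobs):
    """Monthly price divided by the number of concurrent jobs it buys."""
    return monthly_price / concurrent_jobs


if __name__ == "__main__":
    for jobs, price in sorted(PLANS.items()):
        print(
            f"{jobs} jobs: ${annual_cost(price)}/year, "
            f"${monthly_cost_per_job(price, jobs):.2f} per job per month"
        )
```

On these numbers the larger plan is slightly cheaper per concurrent job, so the choice mostly depends on how much concurrency the project actually needs.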
>> >>>     >>>> [1] https://travis-ci.com/plans
>> >>>     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> >>>     >>>>
>> >>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>> >>>     >>>> <chesnay@apache.org> wrote:
>> >>>     >>>>
>> >>>     >>>>      So yes, the Jenkins job keeps polling the state from Travis
>> >>>     >>>>      until it finishes.
>> >>>     >>>>
>> >>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>> >>>     >>>>      workers just to idle for several hours.
>> >>>     >>>>
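The concern about idle workers follows from how such a job has to be structured: it blocks in a loop until the external build reaches a final state. The following is a minimal sketch of that pattern, not the Zeppelin script itself. The function name, the state names, and the injected fetch_state callable are all hypothetical; a real job would replace fetch_state with a call to the CI service.

```python
import time

# Assumption: these names stand in for whatever final states the CI reports.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state, poll_interval_s=60.0, timeout_s=4 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns a terminal state or time runs out.

    Returns (final_state, number_of_polls). The calling worker is occupied
    for the whole duration, which is the cost discussed above.
    """
    start = clock()
    polls = 0
    while True:
        state = fetch_state()
        polls += 1
        if state in TERMINAL_STATES:
            return state, polls
        if clock() - start >= timeout_s:
            return "timeout", polls
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated build that finishes on the fourth poll; no real waiting.
    simulated = iter(["created", "started", "started", "passed"])
    print(wait_for_build(lambda: next(simulated), sleep=lambda s: None))
```

With a two-hour build and a one-minute interval, a loop like this holds its worker for the full two hours while doing almost no work.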
>> >>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>> >>>     >>>>      > Here's what the Zeppelin community did: we wrote a Python
>> >>>     >>>>      > script to check the build status of a pull request.
>> >>>     >>>>      > Here's the script:
>> >>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>> >>>     >>>>      >
>> >>>     >>>>      > And this is the script we use in the Jenkins build job.
>> >>>     >>>>      >
>> >>>     >>>>      > if [ -f "travis_check.py" ]; then
>> >>>     >>>>      >    git log -n 1
>> >>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>> >>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>> >>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>> >>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>> >>>     >>>>      >    #if [ -z $COMMIT ]; then
>> >>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>> >>>     >>>>      >    #fi
>> >>>     >>>>      >
>> >>>     >>>>      >    # get commit hash from PR
>> >>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>> >>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>> >>>     >>>>      >    RET_CODE=0
>> >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>> >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>> >>>     >>>>      >      RET_CODE=0
>> >>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>> >>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>> >>>     >>>>      >    fi
>> >>>     >>>>      >
>> >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>> >>>     >>>>      >      set +x
>> >>>     >>>>      >      echo "-----------------------------------------------------"
>> >>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>> >>>     >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>> >>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>> >>>     >>>>      >      echo ""
>> >>>     >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
>> >>>     >>>>      >      echo "git commit --amend"
>> >>>     >>>>      >      echo "git push your-remote HEAD --force"
>> >>>     >>>>      >      echo ""
>> >>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>> >>>     >>>>      >    fi
>> >>>     >>>>      >
>> >>>     >>>>      >    exit $RET_CODE
>> >>>     >>>>      > else
>> >>>     >>>>      >    set +x
>> >>>     >>>>      >    echo "travis_check.py does not exists"
>> >>>     >>>>      >    exit 1
>> >>>     >>>>      > fi
>> >>>     >>>>      >
>> >>>     >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>> >>>     >>>>      >
>> >>>     >>>>      >> Does this imply that a Jenkins job is active as long
>> >>>     as the
>> >>>     >>>>      Travis build
>> >>>     >>>>      >> runs?
>> >>>     >>>>      >>
>> >>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>> >>>     >>>>      >>> Hi,
>> >>>     >>>>      >>>
>> >>>     >>>>      >>> @Dawid, I think the "long test running" as I
>> >>>     mentioned in the
>> >>>     >>>>      first
>> >>>     >>>>      >> email,
>> >>>     >>>>      >>> also as you guys said, belongs to "a big effort
>> >>>     which is much
>> >>>     >>>>      harder to
>> >>>     >>>>      >>> accomplish in a short period of time and may deserve
>> >>>     its own
>> >>>     >>>>      separate
>> >>>     >>>>      >>> discussion". Thus I didn't include it in what we can
>> >>>     do in a
>> >>>     >>>>      foreseeable
>> >>>     >>>>      >>> short term.
>> >>>     >>>>      >>>
>> >>>     >>>>      >>> Besides, I don't think that's the ultimate reason for the
>> >>>     >>>>      >>> lack of build resources. Even if the build is shortened to
>> >>>     >>>>      >>> something like 2h, the problem I described, where no build
>> >>>     >>>>      >>> machine works for 6 or more hours during PST daytime, will
>> >>>     >>>>      >>> still happen, because no machine from ASF INFRA's pool is
>> >>>     >>>>      >>> allocated to Flink. As I have paid close attention to the
>> >>>     >>>>      >>> build queue over the past few weekdays, it's a pretty
>> >>>     >>>>      >>> clear pattern now.
>> >>>     >>>>      >>>
>> >>>     >>>>      >>> **The ultimate root cause** for that is - we don't have any
>> >>>     >>>>      >>> **dedicated** build resources that we can stably rely on.
>> >>>     >>>>      >>> I'm actually ok to wait for a long time if there are build
>> >>>     >>>>      >>> requests running; it means at least we are making progress.
>> >>>     >>>>      >>> But I'm not ok with no build resource. A better place I
>> >>>     >>>>      >>> think we should aim at in the short term is to always have
>> >>>     >>>>      >>> at least a central pool (can be 3 or 5) of machines
>> >>>     >>>>      >>> dedicated to building Flink at any time, or maybe use
>> >>>     >>>>      >>> users' resources.
>> >>>     >>>>      >>>
>> >>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
>> >>>     >>>>      >>> community is using a Jenkins job to automatically build on
>> >>>     >>>>      >>> users' Travis accounts and link the result back to the
>> >>>     >>>>      >>> GitHub PR. I guess the Jenkins job would fetch the latest
>> >>>     >>>>      >>> upstream master and build the PR against it. Jeff has
>> >>>     >>>>      >>> filed tickets to learn about and get access to the Jenkins
>> >>>     >>>>      >>> infra. It'd be better to fully understand it first before
>> >>>     >>>>      >>> judging this approach.
>> >>>     >>>>      >>>
>> >>>     >>>>      >>> I also heard good things about CircleCI, and ASF
>> >>>     INFRA seems
>> >>>     >>>>      to have a
>> >>>     >>>>      >> pool
>> >>>     >>>>      >>> of build capacity there too. Can be an alternative
>> >>>     to consider.
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>
>> >>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
>> >>>     >>>>      >>> <dwysakowicz@apache.org> wrote:
>> >>>     >>>>      >>>
>> >>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>> >>>     most
>> >>>     >>>>      important point
>> >>>     >>>>      >>>> from Chesnay's previous message in the summary. The
>> >>>     ultimate
>> >>>     >>>>      reason for
>> >>>     >>>>      >>>> all the problems is that the tests take close to 2
>> >>>     hours to
>> >>>     >>>>      run already.
>> >>>     >>>>      >>>> I fully support this claim: "Unless people start
>> >>>     caring about
>> >>>     >>>>      test times
>> >>>     >>>>      >>>> before adding them, this issue cannot be solved"
>> >>>     >>>>      >>>>
>> >>>     >>>>      >>>> This is also another reason why using user's Travis
>> >>>     account
>> >>>     >>>>      won't help.
>> >>>     >>>>      >>>> Every few weeks we reach the user's time limit for
>> >>>     a single
>> >>>     >>>>      profile.
>> >>>     >>>>      >>>> This makes the user's builds simply fail, until we
>> >>>     either
>> >>>     >>>>      properly
>> >>>     >>>>      >>>> decrease the time the tests take (which I am not
>> >>>     sure we ever
>> >>>     >>>>      did) or
>> >>>     >>>>      >>>> postpone the problem by splitting into more
>> >>>     profiles. (Note
>> >>>     >>>>      that the ASF
>> >>>     >>>>      >>>> Travis account has higher time limits)
>> >>>     >>>>      >>>>
>> >>>     >>>>      >>>> Best,
>> >>>     >>>>      >>>>
>> >>>     >>>>      >>>> Dawid
>> >>>     >>>>      >>>>
>> >>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>> >>>     >>>>      >>>>> Do we know if using "the best" available hardware
>> >>>     would
>> >>>     >>>>      improve the
>> >>>     >>>>      >> build
>> >>>     >>>>      >>>>> times?
>> >>>     >>>>      >>>>> Imagine we would run the build on machines with
>> >>>     plenty of
>> >>>     >>>>      main memory
>> >>>     >>>>      >> to
>> >>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
>> >>>     architecture?
>> >>>     >>>>      >>>>>
>> >>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>> >>>     the time
>> >>>     >>>>      of an
>> >>>     >>>>      >>>>> individual build, and using our own infrastructure
>> >>>     would
>> >>>     >>>>      remove our
>> >>>     >>>>      >>>>> dependency on Apache's Travis account (with the
>> >>>     obvious
>> >>>     >>>>      downside of
>> >>>     >>>>      >>>> having
>> >>>     >>>>      >>>>> to maintain the infrastructure)
>> >>>     >>>>      >>>>> We could use an open source travis alternative, to
>> >>>     have a
>> >>>     >>>>      similar
>> >>>     >>>>      >>>>> experience and make the migration easy.
>> >>>     >>>>      >>>>>
>> >>>     >>>>      >>>>>
>> >>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>> >>>     >>>>      >>>>> <chesnay@apache.org> wrote:
>> >>>     >>>>      >>>>>> From what I gathered, there's no special sauce that the
>> >>>     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
>> >>>     >>>>      >>>>>> Travis account into the PR.
>> >>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's kind of it.
>> >>>     >>>>      >>>>>>
>> >>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
>> >>>     >>>>      >>>>>> amount of resources, but there are downsides:
>> >>>     >>>>      >>>>>>
>> >>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>> >>>     >>>>      >>>>>> nose-dive. Either we require every contributor to
>> >>>     >>>>      >>>>>> always, on every commit, also post a Travis build, or we
>> >>>     >>>>      >>>>>> have the reviewer sift through the contributor's account
>> >>>     >>>>      >>>>>> to find it.
>> >>>     >>>>      >>>>>>
>> >>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>> >>>     also not
>> >>>     >>>>      equivalent to
>> >>>     >>>>      >>>>>> having a PR build.
>> >>>     >>>>      >>>>>>
>> >>>     >>>>      >>>>>> A normal branch build takes a branch as is and tests it.
>> >>>     >>>>      >>>>>> A PR build merges the branch into master, and then runs
>> >>>     >>>>      >>>>>> it. (Fun fact: This is why a PR with merge conflicts is
>> >>>     >>>>      >>>>>> not run on Travis.)
>> >>>     >>>>      >>>>>>
>> >>>     >>>>      >>>>>> And ultimately, everyone can already make use
>> >>> of this
>> >>>     >>>>      approach anyway.
>> >>>     >>>>      >>>>>>
>> >>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>> >>>     >>>>      >>>>>>> Hi Jeff,
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>> >>>     think it's a
>> >>>     >>>>      good idea to
>> >>>     >>>>      >>>>>>> leverage user's travis account.
>> >>>     >>>>      >>>>>>> In this way, we can have almost unlimited
>> >>>     concurrent build
>> >>>     >>>>      jobs and
>> >>>     >>>>      >>>>>>> developers can restart build by themselves
>> >>>     (currently only
>> >>>     >>>>      committers
>> >>>     >>>>      >>>>>>> can restart PR's build).
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>> But I'm still not clear on how to integrate a user's
>> >>>     >>>>      >>>>>>> Travis build into the Flink pull request's build
>> >>>     >>>>      >>>>>>> automatically. Can you explain in more detail?
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>> Another question: does Travis only build branches for
>> >>>     >>>>      >>>>>>> user accounts?
>> >>>     >>>>      >>>>>>> My concern is that builds for PRs rebase the user's
>> >>>     >>>>      >>>>>>> commits against the current master branch, which helps
>> >>>     >>>>      >>>>>>> us find problems before merging. Builds for branches
>> >>>     >>>>      >>>>>>> will lose the impact of new commits in master.
>> >>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>> Regards,
>> >>>     >>>>      >>>>>>> Jark
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>> >>>     >>>>      >>>>>>> <zjffdu@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>  Hi Folks,
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved it
>> >>>     >>>>      >>>>>>>  by delegating each contributor's PR build to their
>> >>>     >>>>      >>>>>>>  own Travis account (everyone can have 5 free slots
>> >>>     >>>>      >>>>>>>  for Travis builds).
>> >>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered
>> >>>     >>>>      >>>>>>>  when a PR is merged.
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>  > (Forgot to cc George)
>> >>>     >>>>      >>>>>>>  >
>> >>>     >>>>      >>>>>>>  > Best,
>> >>>     >>>>      >>>>>>>  > Kurt
>> >>>     >>>>      >>>>>>>  >
>> >>>     >>>>      >>>>>>>  >
>> >>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>> >>>     >>>>      >>>>>>>  > <ykt836@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>  >
>> >>>     >>>>      >>>>>>>  > > Hi Bowen,
>> >>>     >>>>      >>>>>>>  > >
>> >>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
>> >>>     >>>>      >>>>>>>  > > discussed this, and I think Till and George have
>> >>>     >>>>      >>>>>>>  > > already spent some time investigating it. I have
>> >>>     >>>>      >>>>>>>  > > cc'ed both of them, and maybe they can share
>> >>>     >>>>      >>>>>>>  > > their findings.
>> >>>     >>>>      >>>>>>>  > >
>> >>>     >>>>      >>>>>>>  > > Best,
>> >>>     >>>>      >>>>>>>  > > Kurt
>> >>>     >>>>      >>>>>>>  > >
>> >>>     >>>>      >>>>>>>  > >
>> >>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>> >>>     >>>>      >>>>>>>  > > <imjark@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>  > >
>> >>>     >>>>      >>>>>>>  > >> Hi Bowen,
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> Thanks for bringing this up. We also suffered
>> >>>     >>>>      >>>>>>>  > >> from the long build time.
>> >>>     >>>>      >>>>>>>  > >> I agree that we should focus on solving the
>> >>>     >>>>      >>>>>>>  > >> build capacity problem in this thread.
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> My observation is that only one build is
>> >>>     >>>>      >>>>>>>  > >> running; all the others (other PRs, master)
>> >>>     >>>>      >>>>>>>  > >> are pending.
>> >>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>> >>>     it can
>> >>>     >>>> support
>> >>>     >>>>      >> concurrent
>> >>>     >>>>      >>>>>>>  build
>> >>>     >>>>      >>>>>>>  > jobs.
>> >>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
>> >>>     using, might
>> >>>     >>>>      be the free
>> >>>     >>>>      >>>>>>>  plan for
>> >>>     >>>>      >>>>>>>  > open
>> >>>     >>>>      >>>>>>>  > >> source.
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>> >>>     experience on
>> >>>     >>>>      Travis.
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> Regards,
>> >>>     >>>>      >>>>>>>  > >> Jark
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
>> >>>     >>>>      >>>>>>>  > >> <bowenli86@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >> > Hi Steven,
>> >>>     >>>>      >>>>>>>  > >> >
>> >>>     >>>>      >>>>>>>  > >> > I think you may not have read what I wrote.
>> >>>     >>>>      >>>>>>>  > >> > The discussion is about "unstable build
>> >>>     >>>>      >>>>>>>  > >> > **capacity**", in other words "unstable /
>> >>>     >>>>      >>>>>>>  > >> > lack of build resources", not "unstable
>> >>>     >>>>      >>>>>>>  > >> > build".
>> >>>     >>>>      >>>>>>>  > >> >
>> >>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>> >>>     >>>>      >>>>>>>  > >> > <stevenz3wu@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>  > >> >
>> >>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable builds are
>> >>>     >>>>      >>>>>>>  > >> > > definitely a pain point.
>> >>>     >>>>      >>>>>>>  > >> > >
>> >>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>> >>>     >>>>      >>>>>>>  > >> > > flink-connector-kafka is not related to
>> >>>     >>>>      >>>>>>>  > >> > > my change, but there is no easy way to
>> >>>     >>>>      >>>>>>>  > >> > > re-run the build in the Travis UI. A
>> >>>     >>>>      >>>>>>>  > >> > > Google search showed a trick: closing and
>> >>>     >>>>      >>>>>>>  > >> > > reopening the PR will trigger a rebuild,
>> >>>     >>>>      >>>>>>>  > >> > > but that could add noise to the PR
>> >>>     >>>>      >>>>>>>  > >> > > activity.
>> >>>     >>>>      >>>>>>>  > >> > >
>> >>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
>> >>>     >>>>      >>>>>>>  > >> > >
>> >>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>> >>>     often failed
>> >>>     >>>>      with
>> >>>     >>>>      >>>>>>>  exceeding time
>> >>>     >>>>      >>>>>>>  > limit
>> >>>     >>>>      >>>>>>>  > >> > after
>> >>>     >>>>      >>>>>>>  > >> > > 4+ hours.
>> >>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>> >>>     limit for
>> >>>     >>>>      jobs, and
>> >>>     >>>>      >> has
>> >>>     >>>>      >>>>>>>  been
>> >>>     >>>>      >>>>>>>  > >> > terminated.
>> >>>     >>>>      >>>>>>>  > >> > >
>> >>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>> >>>     >>>>      >>>>>>>  > >> > > <bowenli86@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>  > >> > >
>> >>>     >>>>      >>>>>>>  > >> > > >
>> >>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
>> >>>     >>>>      >>>>>>>  This build
>> >>>     >>>>      >>>>>>>  > >> > request
>> >>>     >>>>      >>>>>>>  > >> > > > has
>> >>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>> >>>     queue**
>> >>>     >>>>      since I first
>> >>>     >>>>      >> saw
>> >>>     >>>>      >>>>>>>  it at PST
>> >>>     >>>>      >>>>>>>  > >> > 10:30am
>> >>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>> >>>     there before
>> >>>     >>>>      10:30am).
>> >>>     >>>>      >>>>>>>  It's PST
>> >>>     >>>>      >>>>>>>  > 4:12pm
>> >>>     >>>>      >>>>>>>  > >> now
>> >>>     >>>>      >>>>>>>  > >> > > and
>> >>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>> >>>     >>>>      >>>>>>>  > >> > > >
>> >>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>> >>>     >>>>      >>>>>>>  > >> > > > <bowenli86@gmail.com> wrote:
>> >>>     >>>>      >>>>>>>  > >> > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>> >>>     >>>>      resulting from lack
>> >>>     >>>>      >>>>>>>  of stable
>> >>>     >>>>      >>>>>>>  > >> build
>> >>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>> >>>     PRs [1].
>> >>>     >>>>      >> Specifically, I
>> >>>     >>>>      >>>>>>>  noticed
>> >>>     >>>>      >>>>>>>  > >> often
>> >>>     >>>>      >>>>>>>  > >> > > that
>> >>>     >>>>      >>>>>>>  > >> > > > no
>> >>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
>> >>>     >>>>      progress for
>> >>>     >>>>      >> hours,
>> >>>     >>>>      >>>> and
>> >>>     >>>>      >>>>>>>  > suddenly
>> >>>     >>>>      >>>>>>>  > >> 5
>> >>>     >>>>      >>>>>>>  > >> > or
>> >>>     >>>>      >>>>>>>  > >> > > 6
>> >>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
>> >>>     after the
>> >>>     >>>>      long pause.
>> >>>     >>>>      >>>>>>>  I'm at PST
>> >>>     >>>>      >>>>>>>  > >> > (UTC-08)
>> >>>     >>>>      >>>>>>>  > >> > > > time
>> >>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>> >>>     be as
>> >>>     >>>>      long as 6 hours
>> >>>     >>>>      >>>>>>>  from PST 9am
>> >>>     >>>>      >>>>>>>  > >> to
>> >>>     >>>>      >>>>>>>  > >> > 3pm
>> >>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>> >>>     drain the
>> >>>     >>>>      queue
>> >>>     >>>>      >>>>>>>  afterwards).
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
>> >>>     impacted our
>> >>>     >>>>      productivity.
>> >>>     >>>>      >>>> I've
>> >>>     >>>>      >>>>>>>  > >> experienced
>> >>>     >>>>      >>>>>>>  > >> > > that
>> >>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>> >>>     morning of
>> >>>     >>>>      PST time zone
>> >>>     >>>>      >>>>>>>  won't finish
>> >>>     >>>>      >>>>>>>  > >> > their
>> >>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
>> >>>     same day.
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>> >>>     the same
>> >>>     >>>>      problem or
>> >>>     >>>>      >>>>>>>  have similar
>> >>>     >>>>      >>>>>>>  > >> > > > observation
>> >>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>> >>>     has things
>> >>>     >>>>      to do with
>> >>>     >>>>      >> time
>> >>>     >>>>      >>>>>>>  zone)
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>> >>>     TravisCI is
>> >>>     >>>>      Flink currently
>> >>>     >>>>      >>>>>>>  using? Is it
>> >>>     >>>>      >>>>>>>  > >> the
>> >>>     >>>>      >>>>>>>  > >> > > free
>> >>>     >>>>      >>>>>>>  > >> > > > > plan for open source
>> >>>     projects? What
>> >>>     >>>> are the
>> >>>     >>>>      >>>>>>>  guaranteed build
>> >>>     >>>>      >>>>>>>  > >> capacity
>> >>>     >>>>      >>>>>>>  > >> > > of
>> >>>     >>>>      >>>>>>>  > >> > > > > the current plan?
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>> >>>     (either
>> >>>     >>>>      free or paid)
>> >>>     >>>>      >>>>>> can't
>> >>>     >>>>      >>>>>>>  > provide
>> >>>     >>>>      >>>>>>>  > >> > > stable
>> >>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
>> >>>     upgrade to a
>> >>>     >>>>      higher priced
>> >>>     >>>>      >>>>>>>  plan with
>> >>>     >>>>      >>>>>>>  > larger
>> >>>     >>>>      >>>>>>>  > >> > and
>> >>>     >>>>      >>>>>>>  > >> > > > more
>> >>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>> >>>     contribute to
>> >>>     >>>> the
>> >>>     >>>>      >>>>>>>  productivity problem
>> >>>     >>>>      >>>>>>>  > is
>> >>>     >>>>      >>>>>>>  > >> > that
>> >>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>> >>>     full build
>> >>>     >>>>      for every PR
>> >>>     >>>>      >>>> and a
>> >>>     >>>>      >>>>>>>  > >> successful
>> >>>     >>>>      >>>>>>>  > >> > > full
>> >>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>> >>>     definitely have
>> >>>     >>>>      more options to
>> >>>     >>>>      >>>>>>>  solve it,
>> >>>     >>>>      >>>>>>>  > for
>> >>>     >>>>      >>>>>>>  > >> > > > instance,
>> >>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>> >>>     and reuse
>> >>>     >>>>      artifacts
>> >>>     >>>>      >> from
>> >>>     >>>>      >>>> the
>> >>>     >>>>      >>>>>>>  > previous
>> >>>     >>>>      >>>>>>>  > >> > > build.
>> >>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>> >>>     effort
>> >>>     >>>>      which is much
>> >>>     >>>>      >>>>>>>  harder to
>> >>>     >>>>      >>>>>>>  > >> > accomplish
>> >>>     >>>>      >>>>>>>  > >> > > > in
>> >>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
>> >>>     may deserve
>> >>>     >>>>      its own
>> >>>     >>>>      >>>> separate
>> >>>     >>>>      >>>>>>>  > >> discussion.
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > > [1]
>> >>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > > >
>> >>>     >>>>      >>>>>>>  > >> > > >
>> >>>     >>>>      >>>>>>>  > >> > >
>> >>>     >>>>      >>>>>>>  > >> >
>> >>>     >>>>      >>>>>>>  > >>
>> >>>     >>>>      >>>>>>>  > >
>> >>>     >>>>      >>>>>>>  >
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>  --
>> >>>     >>>>      >>>>>>>  Best Regards
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>>>>>>  Jeff Zhang
>> >>>     >>>>      >>>>>>>
>> >>>     >>>>      >>
>> >>>     >>>>
>> >>>     >>>
>> >>>     >>
>> >>>
>> >>
>> >>
>> >
>>
>>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Hequn Cheng <ch...@gmail.com>.
+1.

And thanks a lot to Chesnay for pushing this.

Best, Hequn

On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ch...@apache.org> wrote:

> Note that the Flinkbot approach isn't that trivial either; we can't
> _just_ trigger builds for a branch in the apache repo, but would first
> have to clone the branch/pr into a separate repository (that is owned by
> the github account that the travis account would be tied to).
>
> One roadblock after the next showing up...
>
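A rough sketch of the "clone the branch/pr into a separate repository" step described above, assuming plain git is used. GitHub exposes every pull request head under refs/pull/<n>/head; the mirror remote name `ci-mirror`, the `ci_<n>` branch naming and the PR number in the usage line are hypothetical, not anything the Flinkbot is known to use.

```python
from typing import List


def mirror_pr_commands(pr_number: int,
                       upstream_url: str = "https://github.com/apache/flink.git",
                       mirror_remote: str = "ci-mirror") -> List[List[str]]:
    """Return the git commands that copy one pull request into a mirror repo.

    `mirror_remote` and the `ci_<n>` branch naming are hypothetical.
    """
    branch = f"ci_{pr_number}"
    return [
        # GitHub publishes each pull request head under refs/pull/<n>/head.
        # The leading "+" lets a re-fetch overwrite an older local copy.
        ["git", "fetch", upstream_url, f"+refs/pull/{pr_number}/head:{branch}"],
        # Force-push so that an updated PR replaces the previous mirror branch.
        ["git", "push", "--force", mirror_remote, branch],
    ]


if __name__ == "__main__":
    # 1234 is a placeholder PR number.
    for cmd in mirror_pr_commands(1234):
        print(" ".join(cmd))
```

The function only builds the command lines; running them (for example via subprocess.run) would happen inside a local clone that has the mirror remote configured.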
> On 04/07/2019 11:59, Chesnay Schepler wrote:
> > Small update with mostly bad news:
> >
> > INFRA doesn't know whether it is possible, and referred me to Travis
> > support.
> > They did point out that it could be problematic in regards to
> > read/write permissions for the repository.
> >
> > From my own findings /so far/ with a test repo/organization, it does
> > not appear possible to configure the Travis account used for a
> > specific repository.
> >
> > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > trigger builds through the Travis REST API.
> >
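For illustration, a minimal sketch of what "trigger builds through the Travis REST API" could look like with the Travis API v3 request-creation endpoint. The repository slug `flinkbot/flink-ci`, the branch name and the token placeholder are hypothetical; the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_trigger_request(repo_slug: str, branch: str, token: str,
                          api_base: str = "https://api.travis-ci.com") -> urllib.request.Request:
    """Build (but do not send) a Travis API v3 request that triggers a build."""
    # The repository slug goes into the path with "/" percent-encoded.
    slug = urllib.parse.quote(repo_slug, safe="")
    body = json.dumps({"request": {"branch": branch}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/repo/{slug}/requests",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Travis-API-Version": "3",
            "Authorization": f"token {token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical mirror repository, branch and token.
    req = build_trigger_request("flinkbot/flink-ci", "ci_1234", "<api-token>")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to check the URL and headers without network access or a real token.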
> > On 04/07/2019 10:46, Chesnay Schepler wrote:
> >> I've raised a JIRA
> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> >> inquire whether it would be possible to switch to a different Travis
> >> account, and if so what steps would need to be taken.
> >> We need a proper confirmation from INFRA since we are not in full
> >> control of the flink repository (for example, we cannot access the
> >> settings page).
> >>
> >> If this is indeed possible, Ververica is willing to sponsor a Travis
> >> account for the Flink project.
> >> This would provide us with more than enough resources.
> >>
> >> Since this makes the project more reliant on resources provided by
> >> external companies I would like to vote on this.
> >>
> >> Please vote on this proposal, as follows:
> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> >> account, provided that INFRA approves
> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> >> account
> >>
> >> The vote will be open for at least 24h, and until we have
> >> confirmation from INFRA. The voting period may be shorter than the
> >> usual 3 days since our current setup is effectively not working.
> >>
> >> On 04/07/2019 06:51, Bowen Li wrote:
> >>> Re: > Are they using their own Travis CI pool, or did they switch to
> >>> an entirely different CI service?
> >>>
> >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >>> currently moving away from ASF's Travis to their own in-house metal
> >>> machines at [1] with custom CI application at [2]. They've seen
> >>> significant improvement w.r.t both much higher performance and
> >>> basically no resource waiting time, "night-and-day" difference
> >>> quoting Wes.
> >>>
> >>> Re: > If we can just switch to our own Travis pool, just for our
> >>> project, then this might be something we can do fairly quickly?
> >>>
> >>> I believe so, according to [3] and [4]
> >>>
> >>>
> >>> [1] https://ci.ursalabs.org/
> >>> [2] https://github.com/ursa-labs/ursabot
> >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>
> >>>
> >>>
> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>
> >>>     Are they using their own Travis CI pool, or did they switch to an
> >>>     entirely different CI service?
> >>>
> >>>     If we can just switch to our own Travis pool, just for our
> >>>     project, then
> >>>     this might be something we can do fairly quickly?
> >>>
> >>>     On 03/07/2019 05:55, Bowen Li wrote:
> >>>     > I responded in the INFRA ticket [1] that I believe they are
> >>>     using a wrong
> >>>     > metric against Flink and the total build time is a completely
> >>>     different
> >>>     > thing than guaranteed build capacity.
> >>>     >
> >>>     > My response:
> >>>     >
> >>>     > "As mentioned above, since I started to pay attention to Flink's
> >>>     build
> >>>     > queue a few tens of days ago, I'm in Seattle and I saw no build
> >>>     was kicking
> >>>     > off in PST daytime in weekdays for Flink. Our teammates in China
> >>>     and Europe
> >>>     > have also reported similar observations. So we need to evaluate
> >>>     how the
> >>>     > large total build time came from - if 1) your number and 2) our
> >>>     > observations from three locations that cover pretty much a full
> >>>     day, are
> >>>     > all true, I **guess** one reason can be that - highly likely the
> >>>     extra
> >>>     > build time came from weekends when other Apache projects may be
> >>>     idle and
> >>>     > Flink just drains hard its congested queue.
> >>>     >
> >>>     > Please be aware of that we're not complaining about the lack of
> >>>     resources
> >>>     > in general, I'm complaining about the lack of **stable,
> >>> dedicated**
> >>>     > resources. An example for the latter one is, currently even if
> >>>     no build is
> >>>     > in Flink's queue and I submit a request to be the queue head
> >>> in PST
> >>>     > morning, my build won't even start in 6-8+h. That is an absurd
> >>>     amount of
> >>>     > waiting time.
> >>>     >
> >>>     > That's saying, if ASF INFRA decides to adopt a quota system and
> >>>     grants
> >>>     > Flink five DEDICATED servers that runs all the time only for
> >>>     Flink, that'll
> >>>     > be PERFECT and can totally solve our problem now.
> >>>     >
> >>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
> >>>     some level
> >>>     > of build capacity SLAs and certainty"
> >>>     >
> >>>     >
> >>>     > Again, I believe these two problems differ in nature: long build
> >>>     > time vs. lack of dedicated build resources. That is to say,
> >>>     > shortening the build time may relieve the situation, or it may not.
> >>>     > I'm slightly negative on disabling IT cases for PRs: the downside is
> >>>     > that we risk bugs in PRs that UTs don't catch, which may cost a lot
> >>>     > more to fix if they slow down or even block others. But I'm open to
> >>>     > others' opinions on it.
> >>>     >
> >>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
> >>>     > solve our problem, since INFRA's pool is fully shared and they have
> >>>     > no control over, or finer insight into, resource allocation to a
> >>>     > specific Apache project. As mentioned in [1], Apache Arrow is moving
> >>>     > away from the ASF INFRA Travis pool (they are actually surprised
> >>>     > Flink hasn't planned to do so). I know that Spark is on its own
> >>>     > build infra. If we all agree on funding our own build infra, I'd be
> >>>     > glad to help investigate potential options after releasing 1.9,
> >>>     > since I'm super busy with 1.9 now.
> >>>     >
> >>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>     >
> >>>     >
> >>>     >
> >>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>     >
> >>>     >> As a short-term stopgap, since we can assume this issue to
> >>>     become much
> >>>     >> worse in the following days/weeks, we could disable IT cases in
> >>>     PRs and
> >>>     >> only run them on master.
> >>>     >>
> >>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>>     >>> People really have to stop thinking that just because
> >>>     something works
> >>>     >>> for us it is also a good solution.
> >>>     >>> Also, please remember that our builds run for 2h from start to
> >>>     finish,
> >>>     >>> and not the 14 _minutes_ it takes for zeppelin.
> >>>     >>> We are dealing with an entirely different scale here, both in
> >>>     terms of
> >>>     >>> build times and number of builds.
> >>>     >>>
> >>>     >>> In this very thread people have been complaining about long
> >>> queue
> >>>     >>> times for their builds. Surprise, other Apache projects have
> >>> been
> >>>     >>> suffering the very same thing due to us not controlling our
> >>> build
> >>>     >>> times. While switching services (be it Jenkins, CircleCI or
> >>>     whatever)
> >>>     >>> will possibly work for us (and these options are actually
> >>>     attractive,
> >>>     >>> like CircleCI's proper support for build artifacts), it will
> >>> also
> >>>     >>> result in us likely negatively affecting other projects in
> >>>     significant
> >>>     >>> ways.
> >>>     >>>
> >>>     >>> Sure, the Jenkins setup has a good user experience for us, at
> >>>     the cost
> >>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
> >>>     have 25
> >>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> >>>     >>> resources, and the European contributors haven't even really
> >>>     started yet.
> >>>     >>>
> >>>     >>> FYI, the latest INFRA response from INFRA-18533:
> >>>     >>>
> >>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> >>>     build time
> >>>     >>> last month. That is equal to EIGHT servers running 24/7 for
> >>>     the ENTIRE
> >>>     >>> MONTH. EIGHT. nonstop.
> >>>     >>> When we discovered this last night, we discussed it some and
> >>>     are going
> >>>     >>> to tune down Flink to allow only five executors maximum. We
> >>> cannot
> >>>     >>> allow Flink to consume so much of a Foundation shared
> >>> resource."
> >>>     >>>
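The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a 30-day month, builds of equal length and perfect packing of builds onto executors, none of which is stated in the thread.

```python
def equivalent_servers(build_hours: float, days_in_month: int = 30) -> float:
    """Number of machines that would be busy 24/7 to deliver `build_hours`."""
    return build_hours / (24 * days_in_month)


def queue_hours(queued_builds: int, hours_per_build: float, executors: int = 1) -> float:
    """Wall-clock hours to drain a queue, assuming equal builds and perfect packing."""
    return queued_builds * hours_per_build / executors


if __name__ == "__main__":
    # 5800 build hours in a month, as reported by INFRA.
    print(f"{equivalent_servers(5800):.2f} servers running nonstop")
    # 25 queued PRs at roughly 2h each, as mentioned in the thread.
    print(f"{queue_hours(25, 2):.0f} h of total build time")
    # The same queue under the announced cap of five executors.
    print(f"{queue_hours(25, 2, executors=5):.0f} h of wall-clock time with five executors")
```

With these assumptions, 5800 hours comes out at roughly eight machines, and a 25-PR queue needs about 10 hours of wall-clock time under a five-executor cap.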
> >>>     >>> So yes, we either
> >>>     >>> a) have to heavily reduce our CI usage or
> >>>     >>> b) fund our own, either maintaining it ourselves or donating
> >>>     to Apache.
> >>>     >>>
> >>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>     >>>> By looking at the git history of the Jenkins script, its core part
> >>>     >>>> was finished in March 2017 (with only two minor updates in
> >>>     >>>> 2017/2018), so it's been running for over two years now and it
> >>>     >>>> feels like the Zeppelin community has been quite happy with it.
> >>>     >>>> @Jeff Zhang <zjffdu@gmail.com> can you share your insights and
> >>>     >>>> user experience with the Jenkins+Travis approach?
> >>>     >>>>
> >>>     >>>> Things like:
> >>>     >>>>
> >>>     >>>> - has the approach completely solved the resource capacity
> >>>     >>>> problem for the Zeppelin community? is the Zeppelin community
> >>>     >>>> happy with the result?
> >>>     >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> >>>     >>>> - how often do you need to maintain the Jenkins infra? how many
> >>>     >>>> people are usually involved in maintenance and bug fixes?
> >>>     >>>>
> >>>     >>>> The downside of this approach seems mostly to be on the
> >>>     maintenance
> >>>     >>>> to me - maintain the script and Jenkins infra.
> >>>     >>>>
> >>>     >>>> ** Having Our Own Travis-CI.com Account **
> >>>     >>>>
> >>>     >>>> Another alternative I've been thinking of is to have our own
> >>>     >>>> travis-ci.com account with paid dedicated resources. Note that
> >>>     >>>> travis-ci.org is the free version and travis-ci.com is the
> >>>     >>>> commercial version. We currently use a shared resource pool
> >>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
> >>>     >>>> control over it - we can't see how it's configured, how many
> >>>     >>>> resources are available, how resources are allocated among Apache
> >>>     >>>> projects, etc. The nice things about having an account on
> >>>     >>>> travis-ci.com are:
> >>>     >>>>
> >>>     >>>> - relatively low cost with much better resource guarantee
> >>>     than what
> >>>     >>>> we currently have [1]: $249/month with 5 dedicated
> >>> concurrency,
> >>>     >>>> $489/month with 10 concurrency
> >>>     >>>> - low maintenance work compared to using Jenkins
> >>>     >>>> - (potentially) no migration cost according to Travis's doc
> >>> [2]
> >>>     >>>> (pending verification)
> >>>     >>>> - full control over the build capacity/configuration
> >>> compared to
> >>>     >>>> using ASF INFRA's pool
> >>>     >>>>
> >>>     >>>> I'd be surprised if we as such a vibrant community cannot
> >>>     find and
> >>>     >>>> fund $249*12=$2988 a year in exchange for a much better
> >>> developer
> >>>     >>>> experience and much higher productivity.
> >>>     >>>>
> >>>     >>>> [1] https://travis-ci.com/plans
> >>>     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>     >>>>
> >>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>     >>>>
> >>>     >>>>      So yes, the Jenkins job keeps pulling the state from Travis
> >>>     >>>>      until it finishes.
> >>>     >>>>
> >>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
> >>>     >>>>      workers just to idle for several hours.
> >>>     >>>>
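To make the concern concrete, here is a minimal sketch of a polling loop of the kind described: a worker repeatedly asks for the build state and sleeps in between. It is not the actual travis_check.py; the `fetch_state` callable, the polling interval, the timeout and the set of terminal state names are assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed set of "finished" states; the real travis_check.py may use others.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], str],
                   poll_interval_s: float = 60.0,
                   timeout_s: float = 6 * 3600,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `fetch_state` until it reports a terminal state or the timeout expires."""
    waited = 0.0
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            return "timeout"
        sleep(poll_interval_s)  # the calling worker sits idle during this wait
        waited += poll_interval_s


if __name__ == "__main__":
    # Simulated sequence of states instead of a real Travis lookup.
    states = iter(["created", "started", "started", "passed"])
    print(wait_for_build(lambda: next(states), sleep=lambda _: None))
```

The worker does no useful work between polls, which is the cost being weighed here: for a build that takes hours, the worker slot is occupied for the same amount of time.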
> >>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>     >>>>      > Here's what the Zeppelin community did: we wrote a Python
> >>>     >>>>      > script to check the build status of a pull request.
> >>>     >>>>      > Here's the script:
> >>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>     >>>>      >
> >>>     >>>>      > And this is the script we use in the Jenkins build job.
> >>>     >>>>      >
> >>>     >>>>      > if [ -f "travis_check.py" ]; then
> >>>     >>>>      >    git log -n 1
> >>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>>     >>>>      >    #if [ -z $COMMIT ]; then
> >>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>     >>>>      >    #fi
> >>>     >>>>      >
> >>>     >>>>      >    # get commit hash from PR
> >>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>     >>>>      >    sleep 30 # sleep a few moments to wait for travis to start the build
> >>>     >>>>      >    RET_CODE=0
> >>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>>     >>>>      >      RET_CODE=0
> >>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>     >>>>      >    fi
> >>>     >>>>      >
> >>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
> >>>     >>>>      >      set +x
> >>>     >>>>      >      echo "-----------------------------------------------------"
> >>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >>>     >>>>      >      echo "Please setup by switching on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>>     >>>>      >      echo ""
> >>>     >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
> >>>     >>>>      >      echo "git commit --amend"
> >>>     >>>>      >      echo "git push your-remote HEAD --force"
> >>>     >>>>      >      echo ""
> >>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>>     >>>>      >    fi
> >>>     >>>>      >
> >>>     >>>>      >    exit $RET_CODE
> >>>     >>>>      > else
> >>>     >>>>      >    set +x
> >>>     >>>>      >    echo "travis_check.py does not exist"
> >>>     >>>>      >    exit 1
> >>>     >>>>      > fi
> >>>     >>>>      >
> >>>     >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>>     >>>>      >
> >>>     >>>>      >> Does this imply that a Jenkins job is active as long
> >>>     as the
> >>>     >>>>      Travis build
> >>>     >>>>      >> runs?
> >>>     >>>>      >>
> >>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >>>     >>>>      >>> Hi,
> >>>     >>>>      >>>
> >>>     >>>>      >>> @Dawid, I think the "long test running" as I
> >>>     mentioned in the
> >>>     >>>>      first
> >>>     >>>>      >> email,
> >>>     >>>>      >>> also as you guys said, belongs to "a big effort
> >>>     which is much
> >>>     >>>>      harder to
> >>>     >>>>      >>> accomplish in a short period of time and may deserve
> >>>     its own
> >>>     >>>>      separate
> >>>     >>>>      >>> discussion". Thus I didn't include it in what we can
> >>>     do in a
> >>>     >>>>      foreseeable
> >>>     >>>>      >>> short term.
> >>>     >>>>      >>>
> >>>     >>>>      >>> Besides, I don't think that's the ultimate reason for the
> >>>     >>>>      >>> lack of build resources. Even if the build is shortened to
> >>>     >>>>      >>> something like 2h, the problem I described, of no build
> >>>     >>>>      >>> machine working for 6 or more hours in PST daytime, will
> >>>     >>>>      >>> still happen, because no machine from ASF INFRA's pool is
> >>>     >>>>      >>> allocated to Flink. As I have paid close attention to the
> >>>     >>>>      >>> build queue over the past few weekdays, it's a pretty clear
> >>>     >>>>      >>> pattern now.
> >>>     >>>>      >>>
> >>>     >>>>      >>> **The ultimate root cause** for that is - we don't
> >>>     have any
> >>>     >>>>      **dedicated**
> >>>     >>>>      >>> build resources that we can stably rely on. I'm
> >>>     actually ok to
> >>>     >>>>      wait for a
> >>>     >>>>      >>> long time if there are build requests running, it
> >>>     means at
> >>>     >>>>      least we are
> >>>     >>>>      >>> making progress. But I'm not ok with no build
> >>>     resource. A
> >>>     >>>>      better place I
> >>>     >>>>      >>> think we should aim at in short term is to always
> >>>     have at
> >>>     >>>>      least a central
> >>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
> >>>     Flink at
> >>>     >>>>      any time, or
> >>>     >>>>      >>> maybe use users resources.
> >>>     >>>>      >>>
> >>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> >>>     Zeppelin
> >>>     >>>>      community is
> >>>     >>>>      >>> using a Jenkins job to automatically build on users'
> >>>     travis
> >>>     >>>>      account and
> >>>     >>>>      >>> link the result back to github PR. I guess the
> >>>     Jenkins job
> >>>     >>>>      would fetch
> >>>     >>>>      >>> latest upstream master and build the PR against it.
> >>>     Jeff has
> >>>     >>>> filed
> >>>     >>>>      >> tickets
> >>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll be
> >>>     >>>>      >>> better to fully understand it first before judging this
> >>>     >>>>      >>> approach.
> >>>     >>>>      >>>
> >>>     >>>>      >>> I also heard good things about CircleCI, and ASF
> >>>     INFRA seems
> >>>     >>>>      to have a
> >>>     >>>>      >> pool
> >>>     >>>>      >>> of build capacity there too. Can be an alternative
> >>>     to consider.
> >>>     >>>>      >>>
> >>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
> >>>     >>>>      >>> <dwysakowicz@apache.org> wrote:
> >>>     >>>>      >>>
> >>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
> >>>     most
> >>>     >>>>      important point
> >>>     >>>>      >>>> from Chesnay's previous message in the summary. The
> >>>     ultimate
> >>>     >>>>      reason for
> >>>     >>>>      >>>> all the problems is that the tests take close to 2
> >>>     hours to
> >>>     >>>>      run already.
> >>>     >>>>      >>>> I fully support this claim: "Unless people start
> >>>     caring about
> >>>     >>>>      test times
> >>>     >>>>      >>>> before adding them, this issue cannot be solved"
> >>>     >>>>      >>>>
> >>>     >>>>      >>>> This is also another reason why using user's Travis
> >>>     account
> >>>     >>>>      won't help.
> >>>     >>>>      >>>> Every few weeks we reach the user's time limit for
> >>>     a single
> >>>     >>>>      profile.
> >>>     >>>>      >>>> This makes the user's builds simply fail, until we
> >>>     either
> >>>     >>>>      properly
> >>>     >>>>      >>>> decrease the time the tests take (which I am not
> >>>     sure we ever
> >>>     >>>>      did) or
> >>>     >>>>      >>>> postpone the problem by splitting into more
> >>>     profiles. (Note
> >>>     >>>>      that the ASF
> >>>     >>>>      >>>> Travis account has higher time limits)
> >>>     >>>>      >>>>
> >>>     >>>>      >>>> Best,
> >>>     >>>>      >>>>
> >>>     >>>>      >>>> Dawid
> >>>     >>>>      >>>>
> >>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>     >>>>      >>>>> Do we know if using "the best" available hardware
> >>>     would
> >>>     >>>>      improve the
> >>>     >>>>      >> build
> >>>     >>>>      >>>>> times?
> >>>     >>>>      >>>>> Imagine we would run the build on machines with
> >>>     plenty of
> >>>     >>>>      main memory
> >>>     >>>>      >> to
> >>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> >>>     architecture?
> >>>     >>>>      >>>>>
> >>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
> >>>     the time
> >>>     >>>>      of an
> >>>     >>>>      >>>>> individual build, and using our own infrastructure
> >>>     would
> >>>     >>>>      remove our
> >>>     >>>>      >>>>> dependency on Apache's Travis account (with the
> >>>     obvious
> >>>     >>>>      downside of
> >>>     >>>>      >>>> having
> >>>     >>>>      >>>>> to maintain the infrastructure)
> >>>     >>>>      >>>>> We could use an open source travis alternative, to
> >>>     have a
> >>>     >>>>      similar
> >>>     >>>>      >>>>> experience and make the migration easy.
> >>>     >>>>      >>>>>
> >>>     >>>>      >>>>>
> >>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>>     >>>>      <chesnay@apache.org <ma...@apache.org>
> >>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>     >>>>      >>>> wrote:
> >>>     >>>>      >>>>>>    From what I gathered, there's no special
> >>>     sauce that the
> >>>     >>>>      Zeppelin
> >>>     >>>>      >>>>>> project uses which actually integrates a users
> >>> Travis
> >>>     >>>>      account into the
> >>>     >>>>      >>>> PR.
> >>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
> >>>     kind of it.
> >>>     >>>>      >>>>>>
> >>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
> >>>     fair
> >>>     >>>>      amount of
> >>>     >>>>      >>>>>> resources, but there are downsides:
> >>>     >>>>      >>>>>>
> >>>     >>>>      >>>>>> The discoverability of the Travis check takes a
> >>>     nose-dive.
> >>>     >>>>      Either we
> >>>     >>>>      >>>>>> require every contributor to always, on every
> >>>     commit, also
> >>>     >>>>      post a
> >>>     >>>>      >> Travis
> >>>     >>>>      >>>>>> build, or we have the reviewer sift through the
> >>>     >>>>      contributors account
> >>>     >>>>      >> to
> >>>     >>>>      >>>>>> find it.
> >>>     >>>>      >>>>>>
> >>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> >>>     also not
> >>>     >>>>      equivalent to
> >>>     >>>>      >>>>>> having a PR build.
> >>>     >>>>      >>>>>>
> >>>     >>>>      >>>>>> A normal branch build takes a branch as is and
> >>>     tests it. A
> >>>     >>>>      PR build
> >>>     >>>>      >>>>>> merges the branch into master, and then runs it.
> >>>     (Fun fact:
> >>>     >>>>      This is
> >>>     >>>>      >> why
> >>>     >>>>      >>>>>> a PR with merge conflicts is not being run on
> >>>     Travis.)
> >>>     >>>>      >>>>>>
> >>>     >>>>      >>>>>> And ultimately, everyone can already make use
> >>> of this
> >>>     >>>>      approach anyway.
> >>>     >>>>      >>>>>>
> >>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>     >>>>      >>>>>>> Hi Jeff,
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> >>>     think it's a
> >>>     >>>>      good idea to
> >>>     >>>>      >>>>>>> leverage user's travis account.
> >>>     >>>>      >>>>>>> In this way, we can have almost unlimited
> >>>     concurrent build
> >>>     >>>>      jobs and
> >>>     >>>>      >>>>>>> developers can restart build by themselves
> >>>     (currently only
> >>>     >>>>      committers
> >>>     >>>>      >>>>>>> can restart PR's build).
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> >>> user's
> >>>     >>>>      travis build
> >>>     >>>>      >> into
> >>>     >>>>      >>>>>>> the Flink pull request's build automatically.
> >>>     Can you
> >>>     >>>>      explain more in
> >>>     >>>>      >>>>>>> detail?
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>> Another question: does travis only build
> >>>     branches for user
> >>>     >>>>      account?
> >>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> >>> user's
> >>>     >>>>      commits against
> >>>     >>>>      >>>>>>> current master branch.
> >>>     >>>>      >>>>>>> This will help us to find problems before
> >>>     merge.  Builds
> >>>     >>>>      for branches
> >>>     >>>>      >>>>>>> will lose the impact of new commits in master.
> >>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>> Thanks again for sharing the idea.
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>> Regards,
> >>>     >>>>      >>>>>>> Jark
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >>>     <zjffdu@gmail.com <ma...@gmail.com>
> >>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> >>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
> >>> <ma...@gmail.com> <mailto:zjffdu@gmail.com
> >>> <ma...@gmail.com>>>> wrote:
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>  Hi Folks,
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we
> >>> solved
> >>>     >>>> it by
> >>>     >>>>      >> delegating
> >>>     >>>>      >>>>>>>  each
> >>>     >>>>      >>>>>>>  one's PR build to his travis account
> >>>     (Everyone can
> >>>     >>>>      have 5 free
> >>>     >>>>      >>>>>>>  slot for
> >>>     >>>>      >>>>>>>  travis build).
> >>>     >>>>      >>>>>>>  Apache account travis build is only triggered
> >>> when
> >>>     >>>>      PR is merged.
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
> >>> <ma...@gmail.com>
> >>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>
> >>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
> >>>     >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>  > (Forgot to cc George)
> >>>     >>>>      >>>>>>>  >
> >>>     >>>>      >>>>>>>  > Best,
> >>>     >>>>      >>>>>>>  > Kurt
> >>>     >>>>      >>>>>>>  >
> >>>     >>>>      >>>>>>>  >
> >>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
> >>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
> >>> <ma...@gmail.com> <mailto:ykt836@gmail.com
> >>> <ma...@gmail.com>>>>
> >>>     >>>>      wrote:
> >>>     >>>>      >>>>>>>  >
> >>>     >>>>      >>>>>>>  > > Hi Bowen,
> >>>     >>>>      >>>>>>>  > >
> >>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
> >>>     actually have
> >>>     >>>>      discussed
> >>>     >>>>      >> about
> >>>     >>>>      >>>>>>>  this, and I
> >>>     >>>>      >>>>>>>  > > think Till and George have
> >>>     >>>>      >>>>>>>  > > already spent some time investigating
> >>>     it. I have
> >>>     >>>>      cced both of
> >>>     >>>>      >>>>>>>  them, and
> >>>     >>>>      >>>>>>>  > > maybe they can share
> >>>     >>>>      >>>>>>>  > > their findings.
> >>>     >>>>      >>>>>>>  > >
> >>>     >>>>      >>>>>>>  > > Best,
> >>>     >>>>      >>>>>>>  > > Kurt
> >>>     >>>>      >>>>>>>  > >
> >>>     >>>>      >>>>>>>  > >
> >>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
> >>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
> >>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
> >>> <ma...@gmail.com> <mailto:imjark@gmail.com
> >>> <ma...@gmail.com>>>>
> >>>     >>>>      wrote:
> >>>     >>>>      >>>>>>>  > >
> >>>     >>>>      >>>>>>>  > >> Hi Bowen,
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> >>>     suffered from
> >>>     >>>>      the long
> >>>     >>>>      >>>>>>>  build time.
> >>>     >>>>      >>>>>>>  > >> I agree that we should focus on
> >>>     solving build
> >>>     >>>>      capacity
> >>>     >>>>      >>>>>>>  problem in the
> >>>     >>>>      >>>>>>>  > >> thread.
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> My observation is there is only one
> >>>     build is
> >>>     >>>>      running, all
> >>>     >>>>      >> the
> >>>     >>>>      >>>>>>>  others
> >>>     >>>>      >>>>>>>  > >> (other
> >>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
> >>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
> >>>     it can
> >>>     >>>> support
> >>>     >>>>      >> concurrent
> >>>     >>>>      >>>>>>>  build
> >>>     >>>>      >>>>>>>  > jobs.
> >>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
> >>>     using, might
> >>>     >>>>      be the free
> >>>     >>>>      >>>>>>>  plan for
> >>>     >>>>      >>>>>>>  > open
> >>>     >>>>      >>>>>>>  > >> source.
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> >>>     experience on
> >>>     >>>>      Travis.
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> Regards,
> >>>     >>>>      >>>>>>>  > >> Jark
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> >>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
> >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> >>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> >>> <ma...@gmail.com>
> >>>     >>>>      <mailto:bowenli86@gmail.com
> >>> <ma...@gmail.com>>>> wrote:
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >> > Hi Steven,
> >>>     >>>>      >>>>>>>  > >> >
> >>>     >>>>      >>>>>>>  > >> > I think you may not have read what I
> >>>     wrote. The
> >>>     >>>>      discussion is
> >>>     >>>>      >>>> about
> >>>     >>>>      >>>>>>>  > "unstable
> >>>     >>>>      >>>>>>>  > >> > build **capacity**", in another word
> >>>     >>>>      "unstable / lack of
> >>>     >>>>      >>>> build
> >>>     >>>>      >>>>>>>  > >> resources",
> >>>     >>>>      >>>>>>>  > >> > not "unstable build".
> >>>     >>>>      >>>>>>>  > >> >
> >>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
> >>>     Steven Wu
> >>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
> >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>> <ma...@gmail.com>>
> >>>     >>>>      <mailto:stevenz3wu@gmail.com
> >>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>> <ma...@gmail.com>>>>
> >>>     >>>>      >>>>>>>  > wrote:
> >>>     >>>>      >>>>>>>  > >> >
> >>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
> >>>     >>>>      definitely a pain
> >>>     >>>>      >>>>>> point.
> >>>     >>>>      >>>>>>>  > >> > >
> >>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> >>>     >>>>      >> flink-connector-kafka
> >>>     >>>>      >>>>>>>  is not
> >>>     >>>>      >>>>>>>  > >> related
> >>>     >>>>      >>>>>>>  > >> > to
> >>>     >>>>      >>>>>>>  > >> > > my change, but there is no easy way to
> >>>     re-run the
> >>>     >>>>      build on
> >>>     >>>>      >>>>>>>  travis UI.
> >>>     >>>>      >>>>>>>  > Google
> >>>     >>>>      >>>>>>>  > >> > > search showed a trick of
> >>>     close-and-open the
> >>>     >>>>      PR will
> >>>     >>>>      >>>>>>>  trigger rebuild.
> >>>     >>>>      >>>>>>>  > >> but
> >>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
> >>>     activities.
> >>>     >>>>      >>>>>>>  > >> > >
> >>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
> >>>     >>>>      >>>>>>>  > >> > >
> >>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> >>>     often failed
> >>>     >>>>      with
> >>>     >>>>      >>>>>>>  exceeding time
> >>>     >>>>      >>>>>>>  > limit
> >>>     >>>>      >>>>>>>  > >> > after
> >>>     >>>>      >>>>>>>  > >> > > 4+ hours.
> >>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
> >>>     limit for
> >>>     >>>>      jobs, and
> >>>     >>>>      >> has
> >>>     >>>>      >>>>>>>  been
> >>>     >>>>      >>>>>>>  > >> > terminated.
> >>>     >>>>      >>>>>>>  > >> > >
> >>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
> >>>     Bowen Li
> >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>> <ma...@gmail.com>>
> >>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
> >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>     >>>>      >>>>>>>  > wrote:
> >>>     >>>>      >>>>>>>  > >> > >
> >>>     >>>>      >>>>>>>  > >> > > >
> >>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
> >>>     >>>>      >>>>>>>  This build
> >>>     >>>>      >>>>>>>  > >> > request
> >>>     >>>>      >>>>>>>  > >> > > > has
> >>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> >>>     queue**
> >>>     >>>>      since I first
> >>>     >>>>      >> saw
> >>>     >>>>      >>>>>>>  it at PST
> >>>     >>>>      >>>>>>>  > >> > 10:30am
> >>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> >>>     there before
> >>>     >>>>      10:30am).
> >>>     >>>>      >>>>>>>  It's PST
> >>>     >>>>      >>>>>>>  > 4:12pm
> >>>     >>>>      >>>>>>>  > >> now
> >>>     >>>>      >>>>>>>  > >> > > and
> >>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> >>>     >>>>      >>>>>>>  > >> > > >
> >>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
> >>>     Bowen Li
> >>>     >>>>      >>>>>>>  <bowenli86@gmail.com
> >>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>> <ma...@gmail.com>>
> >>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
> >>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>     >>>>      >>>>>>>  > >> wrote:
> >>>     >>>>      >>>>>>>  > >> > > >
> >>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
> >>>     >>>>      resulting from lack
> >>>     >>>>      >>>>>>>  of stable
> >>>     >>>>      >>>>>>>  > >> build
> >>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
> >>>     PRs [1].
> >>>     >>>>      >> Specifically, I
> >>>     >>>>      >>>>>>>  noticed
> >>>     >>>>      >>>>>>>  > >> often
> >>>     >>>>      >>>>>>>  > >> > > that
> >>>     >>>>      >>>>>>>  > >> > > > no
> >>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
> >>>     >>>>      progress for
> >>>     >>>>      >> hours,
> >>>     >>>>      >>>> and
> >>>     >>>>      >>>>>>>  > suddenly
> >>>     >>>>      >>>>>>>  > >> 5
> >>>     >>>>      >>>>>>>  > >> > or
> >>>     >>>>      >>>>>>>  > >> > > 6
> >>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
> >>>     after the
> >>>     >>>>      long pause.
> >>>     >>>>      >>>>>>>  I'm at PST
> >>>     >>>>      >>>>>>>  > >> > (UTC-08)
> >>>     >>>>      >>>>>>>  > >> > > > time
> >>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
> >>>     be as
> >>>     >>>>      long as 6 hours
> >>>     >>>>      >>>>>>>  from PST 9am
> >>>     >>>>      >>>>>>>  > >> to
> >>>     >>>>      >>>>>>>  > >> > 3pm
> >>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
> >>>     drain the
> >>>     >>>>      queue
> >>>     >>>>      >>>>>>>  afterwards).
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
> >>>     impacted our
> >>>     >>>>      productivity.
> >>>     >>>>      >>>> I've
> >>>     >>>>      >>>>>>>  > >> experienced
> >>>     >>>>      >>>>>>>  > >> > > that
> >>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
> >>>     morning of
> >>>     >>>>      PST time zone
> >>>     >>>>      >>>>>>>  won't finish
> >>>     >>>>      >>>>>>>  > >> > their
> >>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
> >>>     same day.
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
> >>>     the same
> >>>     >>>>      problem or
> >>>     >>>>      >>>>>>>  have similar
> >>>     >>>>      >>>>>>>  > >> > > > observation
> >>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
> >>>     has things
> >>>     >>>>      to do with
> >>>     >>>>      >> time
> >>>     >>>>      >>>>>>>  zone)
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
> >>>     TravisCI is
> >>>     >>>>      Flink currently
> >>>     >>>>      >>>>>>>  using? Is it
> >>>     >>>>      >>>>>>>  > >> the
> >>>     >>>>      >>>>>>>  > >> > > free
> >>>     >>>>      >>>>>>>  > >> > > > > plan for open source
> >>>     projects? What
> >>>     >>>> are the
> >>>     >>>>      >>>>>>>  guaranteed build
> >>>     >>>>      >>>>>>>  > >> capacity
> >>>     >>>>      >>>>>>>  > >> > > of
> >>>     >>>>      >>>>>>>  > >> > > > > the current plan?
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
> >>>     (either
> >>>     >>>>      free or paid)
> >>>     >>>>      >>>>>> can't
> >>>     >>>>      >>>>>>>  > provide
> >>>     >>>>      >>>>>>>  > >> > > stable
> >>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
> >>>     upgrade to a
> >>>     >>>>      higher priced
> >>>     >>>>      >>>>>>>  plan with
> >>>     >>>>      >>>>>>>  > larger
> >>>     >>>>      >>>>>>>  > >> > and
> >>>     >>>>      >>>>>>>  > >> > > > more
> >>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
> >>>     contribute to
> >>>     >>>> the
> >>>     >>>>      >>>>>>>  productivity problem
> >>>     >>>>      >>>>>>>  > is
> >>>     >>>>      >>>>>>>  > >> > that
> >>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
> >>>     full build
> >>>     >>>>      for every PR
> >>>     >>>>      >>>> and a
> >>>     >>>>      >>>>>>>  > >> successful
> >>>     >>>>      >>>>>>>  > >> > > full
> >>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
> >>>     definitely have
> >>>     >>>>      more options to
> >>>     >>>>      >>>>>>>  solve it,
> >>>     >>>>      >>>>>>>  > for
> >>>     >>>>      >>>>>>>  > >> > > > instance,
> >>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
> >>>     and reuse
> >>>     >>>>      artifacts
> >>>     >>>>      >> from
> >>>     >>>>      >>>> the
> >>>     >>>>      >>>>>>>  > previous
> >>>     >>>>      >>>>>>>  > >> > > build.
> >>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
> >>>     effort
> >>>     >>>>      which is much
> >>>     >>>>      >>>>>>>  harder to
> >>>     >>>>      >>>>>>>  > >> > accomplish
> >>>     >>>>      >>>>>>>  > >> > > > in
> >>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
> >>>     may deserve
> >>>     >>>>      its own
> >>>     >>>>      >>>> separate
> >>>     >>>>      >>>>>>>  > >> discussion.
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > > [1]
> >>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > > >
> >>>     >>>>      >>>>>>>  > >> > > >
> >>>     >>>>      >>>>>>>  > >> > >
> >>>     >>>>      >>>>>>>  > >> >
> >>>     >>>>      >>>>>>>  > >>
> >>>     >>>>      >>>>>>>  > >
> >>>     >>>>      >>>>>>>  >
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>  --
> >>>     >>>>      >>>>>>>  Best Regards
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>>>>>>  Jeff Zhang
> >>>     >>>>      >>>>>>>
> >>>     >>>>      >>
> >>>     >>>>
> >>>     >>>
> >>>     >>
> >>>
> >>
> >>
> >
>
>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
Note that the Flinkbot approach isn't that trivial either; we can't 
_just_ trigger builds for a branch in the apache repo, but would first 
have to clone the branch/pr into a separate repository (that is owned by 
the github account that the travis account would be tied to).

One roadblock after the next showing up...
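
A minimal sketch of the REST call such a bot could make, assuming the Travis API v3 "requests" endpoint (POST /repo/{slug}/requests). The mirror repository name, branch name and token are hypothetical placeholders, and the request is only built here, not sent:

```python
import json
import urllib.parse
import urllib.request


def build_trigger_request(repo_slug, branch, token,
                          api_base="https://api.travis-ci.com"):
    """Build (but do not send) a Travis API v3 request that triggers a build."""
    # The slug is a single path segment, so "owner/repo" must be URL-encoded.
    encoded_slug = urllib.parse.quote(repo_slug, safe="")
    body = json.dumps({"request": {"branch": branch}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/repo/{encoded_slug}/requests",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Travis-API-Version": "3",
            "Authorization": f"token {token}",
        },
    )


# Hypothetical mirror repository and branch; the token is a placeholder.
request = build_trigger_request("flinkbot/flink-mirror", "pr-1234", "PLACEHOLDER")
print(request.get_method(), request.full_url)
```

The branch would have to exist in the mirror repository first, which is exactly the extra cloning step described above.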

On 04/07/2019 11:59, Chesnay Schepler wrote:
> Small update with mostly bad news:
>
> INFRA doesn't know whether it is possible, and referred me to Travis 
> support.
> They did point out that it could be problematic in regards to 
> read/write permissions for the repository.
>
> From my own findings /so far/ with a test repo/organization, it does 
> not appear possible to configure the Travis account used for a 
> specific repository.
>
> So yeah, if we go down this route we may have to pimp the Flinkbot to 
> trigger builds through the Travis REST API.
>
> On 04/07/2019 10:46, Chesnay Schepler wrote:
>> I've raised a JIRA 
>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to 
>> inquire whether it would be possible to switch to a different Travis 
>> account, and if so what steps would need to be taken.
>> We need a proper confirmation from INFRA since we are not in full 
>> control of the flink repository (for example, we cannot access the 
>> settings page).
>>
>> If this is indeed possible, Ververica is willing to sponsor a Travis 
>> account for the Flink project.
>> This would provide us with more resources than we need.
>>
>> Since this makes the project more reliant on resources provided by 
>> external companies I would like to vote on this.
>>
>> Please vote on this proposal, as follows:
>> [ ] +1, Approve the migration to a Ververica-sponsored Travis 
>> account, provided that INFRA approves
>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis 
>> account
>>
>> The vote will be open for at least 24h, and until we have 
>> confirmation from INFRA. The voting period may be shorter than the 
>> usual 3 days since our current setup is effectively not working.
>>
>> On 04/07/2019 06:51, Bowen Li wrote:
>>> Re: > Are they using their own Travis CI pool, or did they switch to 
>>> an entirely different CI service?
>>>
>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are 
>>> currently moving away from ASF's Travis to their own in-house metal 
>>> machines at [1] with custom CI application at [2]. They've seen 
>>> significant improvement w.r.t both much higher performance and 
>>> basically no resource waiting time, "night-and-day" difference 
>>> quoting Wes.
>>>
>>> Re: > If we can just switch to our own Travis pool, just for our 
>>> project, then this might be something we can do fairly quickly?
>>>
>>> I believe so, according to [3] and [4]
>>>
>>>
>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>>> [2] https://github.com/ursa-labs/ursabot
>>> [3] 
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration 
>>>
>>> [4] 
>>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>
>>>
>>>
>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org 
>>> <ma...@apache.org>> wrote:
>>>
>>>     Are they using their own Travis CI pool, or did they switch to an
>>>     entirely different CI service?
>>>
>>>     If we can just switch to our own Travis pool, just for our
>>>     project, then
>>>     this might be something we can do fairly quickly?
>>>
>>>     On 03/07/2019 05:55, Bowen Li wrote:
>>>     > I responded in the INFRA ticket [1] that I believe they are
>>>     using a wrong
>>>     > metric against Flink and the total build time is a completely
>>>     different
>>>     > thing than guaranteed build capacity.
>>>     >
>>>     > My response:
>>>     >
>>>     > "As mentioned above, since I started to pay attention to Flink's
>>>     build
>>>     > queue a few tens of days ago, I'm in Seattle and I saw no build
>>>     was kicking
>>>     > off in PST daytime in weekdays for Flink. Our teammates in China
>>>     and Europe
>>>     > have also reported similar observations. So we need to evaluate
>>>     how the
>>>     > large total build time came from - if 1) your number and 2) our
>>>     > observations from three locations that cover pretty much a full
>>>     day, are
>>>     > all true, I **guess** one reason can be that - highly likely the
>>>     extra
>>>     > build time came from weekends when other Apache projects may be
>>>     idle and
>>>     > Flink just drains hard its congested queue.
>>>     >
>>>     > Please be aware of that we're not complaining about the lack of
>>>     resources
>>>     > in general, I'm complaining about the lack of **stable, 
>>> dedicated**
>>>     > resources. An example for the latter one is, currently even if
>>>     no build is
>>>     > in Flink's queue and I submit a request to be the queue head 
>>> in PST
>>>     > morning, my build won't even start in 6-8+h. That is an absurd
>>>     amount of
>>>     > waiting time.
>>>     >
>>>     > That's saying, if ASF INFRA decides to adopt a quota system and
>>>     grants
>>>     > Flink five DEDICATED servers that runs all the time only for
>>>     Flink, that'll
>>>     > be PERFECT and can totally solve our problem now.
>>>     >
>>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
>>>     some level
>>>     > of build capacity SLAs and certainty"
>>>     >
>>>     >
>>>     > Again, I believe there are differences in nature of these two
>>>     problems,
>>>     > long build time v.s. lack of dedicated build resource. That's
>>>     saying,
>>>     > shortening build time may relieve the situation, and may not.
>>>     I'm slightly
>>>     > negative on disabling IT cases for PRs, due to the downside is
>>>     that we are
>>>     > at risk of any potential bugs in PR that UTs don't catch, and
>>>     may cost a
>>>     > lot more to fix and if it slows others down or even block
>>>     others, but am
>>>     > open to others opinions on it.
>>>     >
>>>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
>>>     feasible to
>>>     > solve our problem since INFRA's pool is fully shared and they
>>>     have no
>>>     > control and finer insights over resource allocation to a
>>>     specific Apache
>>>     > project. As mentioned in [1], Apache Arrow is moving away from
>>>     ASF INFRA
>>>     > Travis pool (they are actually surprised Flink hasn't planned to do
>>>     so). I
>>>     > know that Spark is on its own build infra. If we all agree on
>>>     funding our
>>>     > own build infra, I'd be glad to help investigate any potential
>>>     options
>>>     > after releasing 1.9 since I'm super busy with 1.9 now.
>>>     >
>>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>     >
>>>     >
>>>     >
>>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>     <chesnay@apache.org <ma...@apache.org>> wrote:
>>>     >
>>>     >> As a short-term stopgap, since we can assume this issue to
>>>     become much
>>>     >> worse in the following days/weeks, we could disable IT cases in
>>>     PRs and
>>>     >> only run them on master.
>>>     >>
>>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>     >>> People really have to stop thinking that just because
>>>     something works
>>>     >>> for us it is also a good solution.
>>>     >>> Also, please remember that our builds run for 2h from start to
>>>     finish,
>>>     >>> and not the 14 _minutes_ it takes for zeppelin.
>>>     >>> We are dealing with an entirely different scale here, both in
>>>     terms of
>>>     >>> build times and number of builds.
>>>     >>>
>>>     >>> In this very thread people have been complaining about long 
>>> queue
>>>     >>> times for their builds. Surprise, other Apache projects have 
>>> been
>>>     >>> suffering the very same thing due to us not controlling our 
>>> build
>>>     >>> times. While switching services (be it Jenkins, CircleCI or
>>>     whatever)
>>>     >>> will possibly work for us (and these options are actually
>>>     attractive,
>>>     >>> like CircleCI's proper support for build artifacts), it will 
>>> also
>>>     >>> result in us likely negatively affecting other projects in
>>>     significant
>>>     >>> ways.
>>>     >>>
>>>     >>> Sure, the Jenkins setup has a good user experience for us, at
>>>     the cost
>>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>>     have 25
>>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>     >>> resources, and the European contributors haven't even really
>>>     started yet.
>>>     >>>
>>>     >>> FYI, the latest INFRA response from INFRA-18533:
>>>     >>>
>>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
>>>     build time
>>>     >>> last month. That is equal to EIGHT servers running 24/7 for
>>>     the ENTIRE
>>>     >>> MONTH. EIGHT. nonstop.
>>>     >>> When we discovered this last night, we discussed it some and
>>>     are going
>>>     >>> to tune down Flink to allow only five executors maximum. We 
>>> cannot
>>>     >>> allow Flink to consume so much of a Foundation shared 
>>> resource."
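
As a rough sanity check on the figures quoted above (5800 build hours in a month, 25 queued PRs at about 2h each), the arithmetic can be sketched as follows. The 30- and 31-day month lengths are assumptions, since INFRA does not say which month was measured:

```python
HOURS_PER_DAY = 24


def server_equivalents(build_hours, days_in_month):
    """Number of servers that would have to run 24/7 to supply build_hours."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def queue_hours(queued_prs, hours_per_build):
    """Total build time needed to drain a queue, ignoring parallelism."""
    return queued_prs * hours_per_build


flink_build_hours = 5800  # figure quoted by INFRA in INFRA-18533

for days in (30, 31):  # assumed month lengths
    print(days, "days:", round(server_equivalents(flink_build_hours, days), 2))

print("queue:", queue_hours(25, 2), "hours")
```

Either month length gives roughly eight servers, which matches INFRA's statement, and the queue of 25 PRs corresponds to the 50h mentioned above.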
>>>     >>>
>>>     >>> So yes, we either
>>>     >>> a) have to heavily reduce our CI usage or
>>>     >>> b) fund our own, either maintaining it ourselves or donating
>>>     to Apache.
>>>     >>>
>>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>     >>>> By looking at the git history of the Jenkins script, its core
>>>     part
>>>     >>>> was finished in March 2017 (and only two minor updates in
>>>     2017/2018),
>>>     >>>> so it's been running for over two years now and feels like
>>>     Zeppelin
>>>     >>>> community has been quite happy with it. @Jeff Zhang
>>>     >>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
>>>     share your insights and user
>>>     >>>> experience with the Jenkins+Travis approach?
>>>     >>>>
>>>     >>>> Things like:
>>>     >>>>
>>>     >>>> - has the approach completely solved the resource capacity
>>>     problem
>>>     >>>> for Zeppelin community? is Zeppelin community happy with the
>>>     result?
>>>     >>>> - is the whole configuration chain stable (e.g. uptime) 
>>> enough?
>>>     >>>> - how often do you need to maintain the Jenkins infra? how 
>>> many
>>>     >>>> people are usually involved in maintenance and bug-fixes?
>>>     >>>>
>>>     >>>> The downside of this approach seems mostly to be on the
>>>     maintenance
>>>     >>>> to me - maintain the script and Jenkins infra.
>>>     >>>>
>>>     >>>> ** Having Our Own Travis-CI.com Account **
>>>     >>>>
>>>     >>>> Another alternative I've been thinking of is to have our own
>>>     >>>> travis-ci.com <http://travis-ci.com> <http://travis-ci.com>
>>>     account with paid dedicated
>>>     >>>> resources. Note travis-ci.org <http://travis-ci.org>
>>> <http://travis-ci.org> is the free
>>>     >>>> version and travis-ci.com <http://travis-ci.com>
>>> <http://travis-ci.com> is the commercial
>>>     >>>> version. We currently use a shared resource pool managed by
>>>     ASF INFRA
>>>     >>>> team on travis-ci.org <http://travis-ci.org>
>>> <http://travis-ci.org>, but we have no control
>>>     >>>> over it - we can't see how it's configured, how much
>>>     resources are
>>>     >>>> available, how resources are allocated among Apache projects,
>>>     etc.
>>>     >>>> The nice thing about having an account on travis-ci.com
>>> <http://travis-ci.com>
>>>     >>>> <http://travis-ci.com> are:
>>>     >>>>
>>>     >>>> - relatively low cost with much better resource guarantee
>>>     than what
>>>     >>>> we currently have [1]: $249/month with 5 dedicated 
>>> concurrency,
>>>     >>>> $489/month with 10 concurrency
>>>     >>>> - low maintenance work compared to using Jenkins
>>>     >>>> - (potentially) no migration cost according to Travis's doc 
>>> [2]
>>>     >>>> (pending verification)
>>>     >>>> - full control over the build capacity/configuration 
>>> compared to
>>>     >>>> using ASF INFRA's pool
>>>     >>>>
>>>     >>>> I'd be surprised if we as such a vibrant community cannot
>>>     find and
>>>     >>>> fund $249*12=$2988 a year in exchange for a much better 
>>> developer
>>>     >>>> experience and much higher productivity.
>>>     >>>>
>>>     >>>> [1] https://travis-ci.com/plans
>>>     >>>> [2]
>>>     >>>>
>>>     >>
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration 
>>>
>>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>>     <chesnay@apache.org <ma...@apache.org>
>>>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>> 
>>> wrote:
>>>     >>>>
>>>     >>>>      So yes, the Jenkins job keeps pulling the state from
>>>     Travis until it
>>>     >>>>      finishes.
>>>     >>>>
>>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>>     workers
>>>     >>>>      just to
>>>     >>>>      idle for several hours.
>>>     >>>>
>>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>     >>>>      > Here's what zeppelin community did, we make a python
>>>     script to
>>>     >>>>      check the
>>>     >>>>      > build status of pull request.
>>>     >>>>      > Here's script:
>>>     >>>>      >
>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>     >>>>      >
>>>     >>>>      > And this is the script we used in Jenkins build job.
>>>     >>>>      >
>>>     >>>>      > if [ -f "travis_check.py" ]; then
>>>     >>>>      >    git log -n 1
>>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>>>     >>>>      request.*from.*" | sed
>>>     >>>>      > 's/.*GitHub pull request <a
>>>     >>>>      > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
>>>     \2/g')
>>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
>>>     >>>> 's/.*[/]\(.*\)$/\1/g')
>>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
>>>     '{print $3}')
>>>     >>>>      >    #if [ -z $COMMIT ]; then
>>>     >>>>      >    #  COMMIT=$(curl -s
>>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>     >>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
>>>     tr '\n' ' '
>>>     >>>>      | sed
>>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>     grep -v
>>>     >>>>      "apache:" |
>>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>     >>>>      >    #fi
>>>     >>>>      >
>>>     >>>>      >    # get commit hash from PR
>>>     >>>>      >    COMMIT=$(curl -s
>>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>     >>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
>>>     '\n' ' '
>>>     >>>> | sed
>>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>     grep -v
>>>     >>>>      "apache:" |
>>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts
>>>     the build
>>>     >>>>      >    RET_CODE=0
>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>>     RET_CODE=$?
>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository
>>>     name when
>>>     >>>>      travis-ci is
>>>     >>>>      > not available in the account
>>>     >>>>      >      RET_CODE=0
>>>     >>>>      >      AUTHOR=$(curl -s
>>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>     >>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>>>     >>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>>     RET_CODE=$?
>>>     >>>>      >    fi
>>>     >>>>      >
>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find
>>>     build
>>>     >>>>      information in
>>>     >>>>      > the travis
>>>     >>>>      >      set +x
>>>     >>>>      >      echo
>>>     "-----------------------------------------------------"
>>>     >>>>      >      echo "Looks like travis-ci is not configured for
>>>     your fork."
>>>     >>>>      >      echo "Please setup by swich on 'zeppelin'
>>>     repository at
>>>     >>>>      > https://travis-ci.org/profile and travis-ci."
>>>     >>>>      >      echo "And then make sure 'Build branch updates'
>>>     option is
>>>     >>>>      enabled in
>>>     >>>>      > the settings
>>> https://travis-ci.org/${AUTHOR}/zeppelin/settings
>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
>>>     >>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
>>>     >>>>      >      echo ""
>>>     >>>>      >      echo "To trigger CI after setup, you will need
>>>     ammend your
>>>     >>>>      last commit
>>>     >>>>      > with"
>>>     >>>>      >      echo "git commit --amend"
>>>     >>>>      >      echo "git push your-remote HEAD --force"
>>>     >>>>      >      echo ""
>>>     >>>>      >      echo "See
>>>     >>>>      >
>>>     >>>>
>>>     >>
>>> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
>>>     >>>>      > ."
>>>     >>>>      >    fi
>>>     >>>>      >
>>>     >>>>      >    exit $RET_CODE
>>>     >>>>      > else
>>>     >>>>      >    set +x
>>>     >>>>      >    echo "travis_check.py does not exists"
>>>     >>>>      >    exit 1
>>>     >>>>      > fi
>>>     >>>>      >
>>>     >>>>      > Chesnay Schepler <chesnay@apache.org
>>> <ma...@apache.org>
>>>     >>>>      <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>     wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>     >>>>      >
>>>     >>>>      >> Does this imply that a Jenkins job is active as long
>>>     as the
>>>     >>>>      Travis build
>>>     >>>>      >> runs?
>>>     >>>>      >>
>>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>     >>>>      >>> Hi,
>>>     >>>>      >>>
>>>     >>>>      >>> @Dawid, I think the "long test running" as I
>>>     mentioned in the
>>>     >>>>      first
>>>     >>>>      >> email,
>>>     >>>>      >>> also as you guys said, belongs to "a big effort
>>>     which is much
>>>     >>>>      harder to
>>>     >>>>      >>> accomplish in a short period of time and may deserve
>>>     its own
>>>     >>>>      separate
>>>     >>>>      >>> discussion". Thus I didn't include it in what we can
>>>     do in a
>>>     >>>>      foreseeable
>>>     >>>>      >>> short term.
>>>     >>>>      >>>
>>>     >>>>      >>> Besides, I don't think that's the ultimate reason
>>>     for lack of
>>>     >>>>      build
>>>     >>>>      >>> resources. Even if the build is shortened to
>>>     something like
>>>     >>>>      2h, the
>>>     >>>>      >>> problems of no build machine works about 6 or more
>>>     hours in
>>>     >>>>      PST daytime
>>>     >>>>      >>> that I described will still happen, because no
>>>     machine from
>>>     >>>>      ASF INFRA's
>>>     >>>>      >>> pool is allocated to Flink. As I have paid close
>>>     attention to
>>>     >>>>      the build
>>>     >>>>      >>> queue in the past few weekdays, it's a pretty clear
>>>     pattern now.
>>>     >>>>      >>>
>>>     >>>>      >>> **The ultimate root cause** for that is - we don't
>>>     have any
>>>     >>>>      **dedicated**
>>>     >>>>      >>> build resources that we can stably rely on. I'm
>>>     actually ok to
>>>     >>>>      wait for a
>>>     >>>>      >>> long time if there are build requests running, it
>>>     means at
>>>     >>>>      least we are
>>>     >>>>      >>> making progress. But I'm not ok with no build
>>>     resource. A
>>>     >>>>      better place I
>>>     >>>>      >>> think we should aim at in short term is to always
>>>     have at
>>>     >>>>      least a central
>>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
>>>     Flink at
>>>     >>>>      any time, or
>>>     >>>>      >>> maybe use users resources.
>>>     >>>>      >>>
>>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
>>>     Zeppelin
>>>     >>>>      community is
>>>     >>>>      >>> using a Jenkins job to automatically build on users'
>>>     travis
>>>     >>>>      account and
>>>     >>>>      >>> link the result back to github PR. I guess the
>>>     Jenkins job
>>>     >>>>      would fetch
>>>     >>>>      >>> latest upstream master and build the PR against it.
>>>     Jeff has
>>>     >>>> filed
>>>     >>>>      >> tickets
>>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll
>>>     better to
>>>     >>>>      fully
>>>     >>>>      >>> understand it first before judging this approach.
>>>     >>>>      >>>
>>>     >>>>      >>> I also heard good things about CircleCI, and ASF
>>>     INFRA seems
>>>     >>>>      to have a
>>>     >>>>      >> pool
>>>     >>>>      >>> of build capacity there too. Can be an alternative
>>>     to consider.
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>>     >>>>      >> dwysakowicz@apache.org
>>> <ma...@apache.org> <mailto:dwysakowicz@apache.org
>>> <ma...@apache.org>>>
>>>     >>>>      >>> wrote:
>>>     >>>>      >>>
>>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>>     most
>>>     >>>>      important point
>>>     >>>>      >>>> from Chesnay's previous message in the summary. The
>>>     ultimate
>>>     >>>>      reason for
>>>     >>>>      >>>> all the problems is that the tests take close to 2
>>>     hours to
>>>     >>>>      run already.
>>>     >>>>      >>>> I fully support this claim: "Unless people start
>>>     caring about
>>>     >>>>      test times
>>>     >>>>      >>>> before adding them, this issue cannot be solved"
>>>     >>>>      >>>>
>>>     >>>>      >>>> This is also another reason why using user's Travis
>>>     account
>>>     >>>>      won't help.
>>>     >>>>      >>>> Every few weeks we reach the user's time limit for
>>>     a single
>>>     >>>>      profile.
>>>     >>>>      >>>> This makes the user's builds simply fail, until we
>>>     either
>>>     >>>>      properly
>>>     >>>>      >>>> decrease the time the tests take (which I am not
>>>     sure we ever
>>>     >>>>      did) or
>>>     >>>>      >>>> postpone the problem by splitting into more
>>>     profiles. (Note
>>>     >>>>      that the ASF
>>>     >>>>      >>>> Travis account has higher time limits)
>>>     >>>>      >>>>
>>>     >>>>      >>>> Best,
>>>     >>>>      >>>>
>>>     >>>>      >>>> Dawid
>>>     >>>>      >>>>
>>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>     >>>>      >>>>> Do we know if using "the best" available hardware
>>>     would
>>>     >>>>      improve the
>>>     >>>>      >> build
>>>     >>>>      >>>>> times?
>>>     >>>>      >>>>> Imagine we would run the build on machines with
>>>     plenty of
>>>     >>>>      main memory
>>>     >>>>      >> to
>>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>>     architecture?
>>>     >>>>      >>>>>
>>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>>>     the time
>>>     >>>>      of an
>>>     >>>>      >>>>> individual build, and using our own infrastructure
>>>     would
>>>     >>>>      remove our
>>>     >>>>      >>>>> dependency on Apache's Travis account (with the
>>>     obvious
>>>     >>>>      downside of
>>>     >>>>      >>>> having
>>>     >>>>      >>>>> to maintain the infrastructure)
>>>     >>>>      >>>>> We could use an open source travis alternative, to
>>>     have a
>>>     >>>>      similar
>>>     >>>>      >>>>> experience and make the migration easy.
>>>     >>>>      >>>>>
>>>     >>>>      >>>>>
>>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>     >>>>      <chesnay@apache.org <ma...@apache.org>
>>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>     >>>>      >>>> wrote:
>>>     >>>>      >>>>>>    >From what I gathered, there's no special
>>>     sauce that the
>>>     >>>>      Zeppelin
>>>     >>>>      >>>>>> project uses which actually integrates a users 
>>> Travis
>>>     >>>>      account into the
>>>     >>>>      >>>> PR.
>>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
>>>     kind of it.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
>>>     fair
>>>     >>>>      amount of
>>>     >>>>      >>>>>> resources, but there are downsides:
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>>>     nose-dive.
>>>     >>>>      Either we
>>>     >>>>      >>>>>> require every contributor to always, on every
>>>     commit, also
>>>     >>>>      post a
>>>     >>>>      >> Travis
>>>     >>>>      >>>>>> build, or we have the reviewer sift through the
>>>     >>>>      contributors account
>>>     >>>>      >> to
>>>     >>>>      >>>>>> find it.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>>     also not
>>>     >>>>      equivalent to
>>>     >>>>      >>>>>> having a PR build.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> A normal branch build takes a branch as is and
>>>     tests it. A
>>>     >>>>      PR build
>>>     >>>>      >>>>>> merges the branch into master, and then runs it.
>>>     (Fun fact:
>>>     >>>>      This is
>>>     >>>>      >> why
>>>     >>>>      >>>>>> a PR with merge conflicts is not being run on
>>>     Travis.)
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> And ultimately, everyone can already make use 
>>> of this
>>>     >>>>      approach anyway.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>     >>>>      >>>>>>> Hi Jeff,
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>>>     think it's a
>>>     >>>>      good idea to
>>>     >>>>      >>>>>>> leverage user's travis account.
>>>     >>>>      >>>>>>> In this way, we can have almost unlimited
>>>     concurrent build
>>>     >>>>      jobs and
>>>     >>>>      >>>>>>> developers can restart build by themselves
>>>     (currently only
>>>     >>>>      committers
>>>     >>>>      >>>>>>> can restart PR's build).
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> But I'm still not very clear how to integrate 
>>> user's
>>>     >>>>      travis build
>>>     >>>>      >> into
>>>     >>>>      >>>>>>> the Flink pull request's build automatically.
>>>     Can you
>>>     >>>>      explain more in
>>>     >>>>      >>>>>>> detail?
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Another question: does travis only build
>>>     branches for user
>>>     >>>>      account?
>>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase 
>>> user's
>>>     >>>>      commits against
>>>     >>>>      >>>>>>> current master branch.
>>>     >>>>      >>>>>>> This will help us to find problems before
>>>     merge.  Builds
>>>     >>>>      for branches
>>>     >>>>      >>>>>>> will lose the impact of new commits in master.
>>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Regards,
>>>     >>>>      >>>>>>> Jark
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>>     <zjffdu@gmail.com <ma...@gmail.com>
>>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
>>> <ma...@gmail.com> <mailto:zjffdu@gmail.com
>>> <ma...@gmail.com>>>> wrote:
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Hi Folks,
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we 
>>> solved
>>>     >>>> it by
>>>     >>>>      >> delegating
>>>     >>>>      >>>>>>>  each
>>>     >>>>      >>>>>>>  one's PR build to his travis account
>>>     (Everyone can
>>>     >>>>      have 5 free
>>>     >>>>      >>>>>>>  slot for
>>>     >>>>      >>>>>>>  travis build).
>>>     >>>>      >>>>>>>  Apache account travis build is only triggered 
>>> when
>>>     >>>>      PR is merged.
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
>>> <ma...@gmail.com>
>>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>
>>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  > (Forgot to cc George)
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  > Best,
>>>     >>>>      >>>>>>>  > Kurt
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
>>> <ma...@gmail.com> <mailto:ykt836@gmail.com
>>> <ma...@gmail.com>>>>
>>>     >>>>      wrote:
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  > > Hi Bowen,
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
>>>     actually have
>>>     >>>>      discussed
>>>     >>>>      >> about
>>>     >>>>      >>>>>>>  this, and I
>>>     >>>>      >>>>>>>  > > think Till and George have
>>>     >>>>      >>>>>>>  > > already spent some time investigating
>>>     it. I have
>>>     >>>>      cced both of
>>>     >>>>      >>>>>>>  them, and
>>>     >>>>      >>>>>>>  > > maybe they can share
>>>     >>>>      >>>>>>>  > > their findings.
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > > Best,
>>>     >>>>      >>>>>>>  > > Kurt
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
>>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
>>> <ma...@gmail.com> <mailto:imjark@gmail.com
>>> <ma...@gmail.com>>>>
>>>     >>>>      wrote:
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > >> Hi Bowen,
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
>>>     suffered from
>>>     >>>>      the long
>>>     >>>>      >>>>>>>  build time.
>>>     >>>>      >>>>>>>  > >> I agree that we should focus on
>>>     solving build
>>>     >>>>      capacity
>>>     >>>>      >>>>>>>  problem in the
>>>     >>>>      >>>>>>>  > >> thread.
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> My observation is there is only one
>>>     build is
>>>     >>>>      running, all
>>>     >>>>      >> the
>>>     >>>>      >>>>>>>  others
>>>     >>>>      >>>>>>>  > >> (other
>>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
>>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>>>     it can
>>>     >>>> support
>>>     >>>>      >> concurrent
>>>     >>>>      >>>>>>>  build
>>>     >>>>      >>>>>>>  > jobs.
>>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
>>>     using, might
>>>     >>>>      be the free
>>>     >>>>      >>>>>>>  plan for
>>>     >>>>      >>>>>>>  > open
>>>     >>>>      >>>>>>>  > >> source.
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>>     experience on
>>>     >>>>      Travis.
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> Regards,
>>>     >>>>      >>>>>>>  > >> Jark
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>> <ma...@gmail.com>
>>>     >>>>      <mailto:bowenli86@gmail.com
>>> <ma...@gmail.com>>>> wrote:
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> > Hi Steven,
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >> > I think you may not read what I
>>>     wrote. The
>>>     >>>>      discussion is
>>>     >>>>      >>>> about
>>>     >>>>      >>>>>>>  > "unstable
>>>     >>>>      >>>>>>>  > >> > build **capacity**", in another word
>>>     >>>>      "unstable / lack of
>>>     >>>>      >>>> build
>>>     >>>>      >>>>>>>  > >> resources",
>>>     >>>>      >>>>>>>  > >> > not "unstable build".
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
>>>     Steven Wu
>>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
>>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>> <ma...@gmail.com>>
>>>     >>>>      <mailto:stevenz3wu@gmail.com
>>> <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>> <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  > wrote:
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
>>>     >>>>      definitely a pain
>>>     >>>>      >>>>>> point.
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>>     >>>>      >> flink-connector-kafka
>>>     >>>>      >>>>>>>  is not
>>>     >>>>      >>>>>>>  > >> related
>>>     >>>>      >>>>>>>  > >> > to
>>>     >>>>      >>>>>>>  > >> > > my change. but there is no easy
>>>     re-run the
>>>     >>>>      build on
>>>     >>>>      >>>>>>>  travis UI.
>>>     >>>>      >>>>>>>  > Google
>>>     >>>>      >>>>>>>  > >> > > search showed a trick of
>>>     close-and-open the
>>>     >>>>      PR will
>>>     >>>>      >>>>>>>  trigger rebuild.
>>>     >>>>      >>>>>>>  > >> but
>>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
>>>     activities.
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>>>     often failed
>>>     >>>>      with
>>>     >>>>      >>>>>>>  exceeding time
>>>     >>>>      >>>>>>>  > limit
>>>     >>>>      >>>>>>>  > >> > after
>>>     >>>>      >>>>>>>  > >> > > 4+ hours.
>>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>>>     limit for
>>>     >>>>      jobs, and
>>>     >>>>      >> has
>>>     >>>>      >>>>>>>  been
>>>     >>>>      >>>>>>>  > >> > terminated.
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>>     Bowen Li
>>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>> <ma...@gmail.com>>
>>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  > wrote:
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
>>>     >>>>      >>>>>>>  This build
>>>     >>>>      >>>>>>>  > >> > request
>>>     >>>>      >>>>>>>  > >> > > > has
>>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>>     queue**
>>>     >>>>      since I first
>>>     >>>>      >> saw
>>>     >>>>      >>>>>>>  it at PST
>>>     >>>>      >>>>>>>  > >> > 10:30am
>>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>>     there before
>>>     >>>>      10:30am).
>>>     >>>>      >>>>>>>  It's PST
>>>     >>>>      >>>>>>>  > 4:12pm
>>>     >>>>      >>>>>>>  > >> now
>>>     >>>>      >>>>>>>  > >> > > and
>>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>>>     Bowen Li
>>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>> <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>> <ma...@gmail.com>>
>>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  > >> wrote:
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>>>     >>>>      resulting from lack
>>>     >>>>      >>>>>>>  of stable
>>>     >>>>      >>>>>>>  > >> build
>>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>>>     PRs [1].
>>>     >>>>      >> Specifically, I
>>>     >>>>      >>>>>>>  noticed
>>>     >>>>      >>>>>>>  > >> often
>>>     >>>>      >>>>>>>  > >> > > that
>>>     >>>>      >>>>>>>  > >> > > > no
>>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
>>>     >>>>      progress for
>>>     >>>>      >> hours,
>>>     >>>>      >>>> and
>>>     >>>>      >>>>>>>  > suddenly
>>>     >>>>      >>>>>>>  > >> 5
>>>     >>>>      >>>>>>>  > >> > or
>>>     >>>>      >>>>>>>  > >> > > 6
>>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
>>>     after the
>>>     >>>>      long pause.
>>>     >>>>      >>>>>>>  I'm at PST
>>>     >>>>      >>>>>>>  > >> > (UTC-08)
>>>     >>>>      >>>>>>>  > >> > > > time
>>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>>>     be as
>>>     >>>>      long as 6 hours
>>>     >>>>      >>>>>>>  from PST 9am
>>>     >>>>      >>>>>>>  > >> to
>>>     >>>>      >>>>>>>  > >> > 3pm
>>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>>>     drain the
>>>     >>>>      queue
>>>     >>>>      >>>>>>>  afterwards).
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
>>>     impacted our
>>>     >>>>      productivity.
>>>     >>>>      >>>> I've
>>>     >>>>      >>>>>>>  > >> experienced
>>>     >>>>      >>>>>>>  > >> > > that
>>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>>>     morning of
>>>     >>>>      PST time zone
>>>     >>>>      >>>>>>>  won't finish
>>>     >>>>      >>>>>>>  > >> > their
>>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
>>>     same day.
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>>>     the same
>>>     >>>>      problem or
>>>     >>>>      >>>>>>>  have similar
>>>     >>>>      >>>>>>>  > >> > > > observation
>>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>>>     has things
>>>     >>>>      to do with
>>>     >>>>      >> time
>>>     >>>>      >>>>>>>  zone)
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>>>     TravisCI is
>>>     >>>>      Flink currently
>>>     >>>>      >>>>>>>  using? Is it
>>>     >>>>      >>>>>>>  > >> the
>>>     >>>>      >>>>>>>  > >> > > free
>>>     >>>>      >>>>>>>  > >> > > > > plan for open source
>>>     projects? What
>>>     >>>> are the
>>>     >>>>      >>>>>>>  guaranteed build
>>>     >>>>      >>>>>>>  > >> capacity
>>>     >>>>      >>>>>>>  > >> > > of
>>>     >>>>      >>>>>>>  > >> > > > > the current plan?
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>>>     (either
>>>     >>>>      free or paid)
>>>     >>>>      >>>>>> can't
>>>     >>>>      >>>>>>>  > provide
>>>     >>>>      >>>>>>>  > >> > > stable
>>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
>>>     upgrade to a
>>>     >>>>      higher priced
>>>     >>>>      >>>>>>>  plan with
>>>     >>>>      >>>>>>>  > larger
>>>     >>>>      >>>>>>>  > >> > and
>>>     >>>>      >>>>>>>  > >> > > > more
>>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>>>     contribute to
>>>     >>>> the
>>>     >>>>      >>>>>>>  productivity problem
>>>     >>>>      >>>>>>>  > is
>>>     >>>>      >>>>>>>  > >> > that
>>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>>>     full build
>>>     >>>>      for every PR
>>>     >>>>      >>>> and a
>>>     >>>>      >>>>>>>  > >> successful
>>>     >>>>      >>>>>>>  > >> > > full
>>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>>>     definitely have
>>>     >>>>      more options to
>>>     >>>>      >>>>>>>  solve it,
>>>     >>>>      >>>>>>>  > for
>>>     >>>>      >>>>>>>  > >> > > > instance,
>>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>>>     and reuse
>>>     >>>>      artifacts
>>>     >>>>      >> from
>>>     >>>>      >>>> the
>>>     >>>>      >>>>>>>  > previous
>>>     >>>>      >>>>>>>  > >> > > build.
>>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>>>     effort
>>>     >>>>      which is much
>>>     >>>>      >>>>>>>  harder to
>>>     >>>>      >>>>>>>  > >> > accomplish
>>>     >>>>      >>>>>>>  > >> > > > in
>>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
>>>     may deserve
>>>     >>>>      its own
>>>     >>>>      >>>> separate
>>>     >>>>      >>>>>>>  > >> discussion.
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > [1]
>>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  --
>>>     >>>>      >>>>>>>  Best Regards
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Jeff Zhang
>>>     >>>>      >>>>>>>
>>>     >>>>      >>
>>>     >>>>
>>>     >>>
>>>     >>
>>>
>>
>>
>


Re: [VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
Small update with mostly bad news:

INFRA doesn't know whether it is possible, and referred me to Travis 
support.
They did point out that it could be problematic in regards to read/write 
permissions for the repository.

From my own findings /so far/ with a test repo/organization, it does 
not appear possible to configure the Travis account used for a specific 
repository.

So yeah, if we go down this route we may have to pimp the Flinkbot to 
trigger builds through the Travis REST API.
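
For illustration only, here is a minimal sketch of what such a trigger
could look like. It assumes the Travis API v3 "create request" endpoint
(POST /repo/{slug}/requests), that the bot has already pushed the PR
commits to a branch of a repository it controls, and that it holds an
API token for the sponsored account. The function name, the default
base URL and the way the token is passed in are made up for this
example; this is not how Flinkbot is actually implemented.

```python
import json
import urllib.parse
import urllib.request


def build_trigger_request(repo_slug, branch, token,
                          api_base="https://api.travis-ci.com",
                          message=None):
    """Build (but do not send) a Travis API v3 'create request' call.

    repo_slug: "owner/name" of the repository Travis should build
               (hypothetical: a repository the bot can push to).
    branch:    branch the bot pushed the PR commits to.
    token:     Travis API token of the account that owns the build slots.
    """
    # API v3 expects the owner/name slug as one URL-encoded path segment,
    # so the "/" has to become "%2F".
    encoded_slug = urllib.parse.quote(repo_slug, safe="")

    body = {"request": {"branch": branch}}
    if message is not None:
        # Optional text shown for the triggered build in the Travis UI.
        body["request"]["message"] = message

    return urllib.request.Request(
        url=f"{api_base}/repo/{encoded_slug}/requests",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Travis-API-Version": "3",
            "Authorization": f"token {token}",
        },
    )


if __name__ == "__main__":
    # Only prints the prepared call; nothing is sent over the network.
    req = build_trigger_request("apache/flink", "master", "<TOKEN>")
    print(req.get_method(), req.full_url)
```

Actually sending it would be a matter of passing the returned object to
urllib.request.urlopen(). Keeping the construction separate from the
network call makes the part that is easy to get wrong (slug encoding,
headers) checkable without a Travis account.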

On 04/07/2019 10:46, Chesnay Schepler wrote:
> I've raised a JIRA 
> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to 
> inquire whether it would be possible to switch to a different Travis 
> account, and if so what steps would need to be taken.
> We need a proper confirmation from INFRA since we are not in full 
> control of the flink repository (for example, we cannot access the 
> settings page).
>
> If this is indeed possible, Ververica is willing to sponsor a Travis 
> account for the Flink project.
> This would provide us with more than enough resources.
>
> Since this makes the project more reliant on resources provided by 
> external companies I would like to vote on this.
>
> Please vote on this proposal, as follows:
> [ ] +1, Approve the migration to a Ververica-sponsored Travis account, 
> provided that INFRA approves
> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis 
> account
>
> The vote will be open for at least 24h, and until we have confirmation 
> from INFRA. The voting period may be shorter than the usual 3 days 
> since our current setup is effectively not working.
>
> On 04/07/2019 06:51, Bowen Li wrote:
>> Re: > Are they using their own Travis CI pool, or did they switch to 
>> an entirely different CI service?
>>
>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are 
>> currently moving away from ASF's Travis to their own in-house metal 
>> machines at [1] with custom CI application at [2]. They've seen 
>> significant improvement w.r.t both much higher performance and 
>> basically no resource waiting time, "night-and-day" difference 
>> quoting Wes.
>>
>> Re: > If we can just switch to our own Travis pool, just for our 
>> project, then this might be something we can do fairly quickly?
>>
>> I believe so, according to [3] and [4]
>>
>>
>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>> [2] https://github.com/ursa-labs/ursabot
>> [3] 
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>
>>
>>
>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org 
>> <ma...@apache.org>> wrote:
>>
>>     Are they using their own Travis CI pool, or did they switch to an
>>     entirely different CI service?
>>
>>     If we can just switch to our own Travis pool, just for our
>>     project, then
>>     this might be something we can do fairly quickly?
>>
>>     On 03/07/2019 05:55, Bowen Li wrote:
>>     > I responded in the INFRA ticket [1] that I believe they are
>>     using a wrong
>>     > metric against Flink and the total build time is a completely
>>     different
>>     > thing than guaranteed build capacity.
>>     >
>>     > My response:
>>     >
>>     > "As mentioned above, since I started to pay attention to Flink's
>>     build
>>     > queue a few tens of days ago, I'm in Seattle and I saw no build
>>     was kicking
>>     > off in PST daytime in weekdays for Flink. Our teammates in China
>>     and Europe
>>     > have also reported similar observations. So we need to evaluate
>>     how the
>>     > large total build time came from - if 1) your number and 2) our
>>     > observations from three locations that cover pretty much a full
>>     day, are
>>     > all true, I **guess** one reason can be that - highly likely the
>>     extra
>>     > build time came from weekends when other Apache projects may be
>>     idle and
>>     > Flink just drains hard its congested queue.
>>     >
>>     > Please be aware of that we're not complaining about the lack of
>>     resources
>>     > in general, I'm complaining about the lack of **stable, 
>> dedicated**
>>     > resources. An example for the latter one is, currently even if
>>     no build is
>>     > in Flink's queue and I submit a request to be the queue head in 
>> PST
>>     > morning, my build won't even start in 6-8+h. That is an absurd
>>     amount of
>>     > waiting time.
>>     >
>>     > That's saying, if ASF INFRA decides to adopt a quota system and
>>     grants
>>     > Flink five DEDICATED servers that runs all the time only for
>>     Flink, that'll
>>     > be PERFECT and can totally solve our problem now.
>>     >
>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
>>     some level
>>     > of build capacity SLAs and certainty"
>>     >
>>     >
>>     > Again, I believe there are differences in nature of these two
>>     problems,
>>     > long build time v.s. lack of dedicated build resource. That's
>>     saying,
>>     > shortening build time may relieve the situation, and may not.
>>     > I'm slightly negative on disabling IT cases for PRs, because the
>>     > downside is that we risk bugs in PRs that UTs don't catch, which
>>     > may cost a lot more to fix if they slow down or even block others,
>>     > but I am open to others' opinions on it.
>>     >
>>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
>>     feasible to
>>     > solve our problem since INFRA's pool is fully shared and they
>>     have no
>>     > control and finer insights over resource allocation to a
>>     specific Apache
>>     > project. As mentioned in [1], Apache Arrow is moving away from
>>     ASF INFRA
>>     > Travis pool (they are actually surprised Flink hasn't planned to
>>     > do so). I know that Spark is on its own build infra. If we all
>>     > agree on funding our own build infra, I'd be glad to help
>>     > investigate any potential
>>     options
>>     > after releasing 1.9 since I'm super busy with 1.9 now.
>>     >
>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>     >
>>     >
>>     >
>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>     <chesnay@apache.org <ma...@apache.org>> wrote:
>>     >
>>     >> As a short-term stopgap, since we can assume this issue to
>>     become much
>>     >> worse in the following days/weeks, we could disable IT cases in
>>     PRs and
>>     >> only run them on master.
>>     >>
>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>     >>> People really have to stop thinking that just because
>>     something works
>>     >>> for us it is also a good solution.
>>     >>> Also, please remember that our builds run for 2h from start to
>>     finish,
>>     >>> and not the 14 _minutes_ it takes for zeppelin.
>>     >>> We are dealing with an entirely different scale here, both in
>>     terms of
>>     >>> build times and number of builds.
>>     >>>
>>     >>> In this very thread people have been complaining about long 
>> queue
>>     >>> times for their builds. Surprise, other Apache projects have 
>> been
>>     >>> suffering the very same thing due to us not controlling our 
>> build
>>     >>> times. While switching services (be it Jenkins, CircleCI or
>>     whatever)
>>     >>> will possibly work for us (and these options are actually
>>     attractive,
>>     >>> like CircleCI's proper support for build artifacts), it will 
>> also
>>     >>> result in us likely negatively affecting other projects in
>>     significant
>>     >>> ways.
>>     >>>
>>     >>> Sure, the Jenkins setup has a good user experience for us, at
>>     the cost
>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>     have 25
>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>     >>> resources, and the European contributors haven't even really
>>     started yet.
>>     >>>
>>     >>> FYI, the latest INFRA response from INFRA-18533:
>>     >>>
>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
>>     build time
>>     >>> last month. That is equal to EIGHT servers running 24/7 for
>>     the ENTIRE
>>     >>> MONTH. EIGHT. nonstop.
>>     >>> When we discovered this last night, we discussed it some and
>>     are going
>>     >>> to tune down Flink to allow only five executors maximum. We 
>> cannot
>>     >>> allow Flink to consume so much of a Foundation shared resource."
>>     >>>
>>     >>> So yes, we either
>>     >>> a) have to heavily reduce our CI usage or
>>     >>> b) fund our own, either maintaining it ourselves or donating
>>     to Apache.
>>     >>>
>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>>     >>>> By looking at the git history of the Jenkins script, its core
>>     >>>> part was finished in March 2017 (with only two minor updates in
>>     >>>> 2017/2018), so it's been running for over two years now and it
>>     >>>> feels like the Zeppelin community has been quite happy with it.
>>     >>>> @Jeff Zhang <zjffdu@gmail.com> can you share your insights and
>>     >>>> user experience with the Jenkins+Travis approach?
>>     >>>>
>>     >>>> Things like:
>>     >>>>
>>     >>>> - has the approach completely solved the resource capacity
>>     >>>> problem for the Zeppelin community? is the Zeppelin community
>>     >>>> happy with the result?
>>     >>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>     >>>> - how often do you need to maintain the Jenkins infra? how many
>>     >>>> people are usually involved in maintenance and bug-fixes?
>>     >>>>
>>     >>>> The downside of this approach seems mostly to be on the
>>     maintenance
>>     >>>> to me - maintain the script and Jenkins infra.
>>     >>>>
>>     >>>> ** Having Our Own Travis-CI.com Account **
>>     >>>>
>>     >>>> Another alternative I've been thinking of is to have our own
>>     >>>> travis-ci.com account with paid dedicated resources. Note that
>>     >>>> travis-ci.org is the free version and travis-ci.com is the
>>     >>>> commercial version. We currently use a shared resource pool
>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>     >>>> control over it - we can't see how it's configured, how much
>>     >>>> resources are available, how resources are allocated among
>>     >>>> Apache projects, etc. The nice things about having an account
>>     >>>> on travis-ci.com are:
>>     >>>>
>>     >>>> - relatively low cost with much better resource guarantee
>>     than what
>>     >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>>     >>>> $489/month with 10 concurrency
>>     >>>> - low maintenance work compared to using Jenkins
>>     >>>> - (potentially) no migration cost according to Travis's doc [2]
>>     >>>> (pending verification)
>>     >>>> - full control over the build capacity/configuration 
>> compared to
>>     >>>> using ASF INFRA's pool
>>     >>>>
>>     >>>> I'd be surprised if we as such a vibrant community cannot
>>     find and
>>     >>>> fund $249*12=$2988 a year in exchange for a much better 
>> developer
>>     >>>> experience and much higher productivity.
>>     >>>>
>>     >>>> [1] https://travis-ci.com/plans
>>     >>>> [2]
>>     >>>>
>>     >>
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>     <chesnay@apache.org <ma...@apache.org>
>>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>>     >>>>
>>     >>>>      So yes, the Jenkins job keeps pulling the state from
>>     Travis until it
>>     >>>>      finishes.
>>     >>>>
>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>     >>>>      workers just to idle for several hours.
>>     >>>>
>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>     >>>>      > Here's what zeppelin community did, we make a python
>>     script to
>>     >>>>      check the
>>     >>>>      > build status of pull request.
>>     >>>>      > Here's script:
>>     >>>>      >
>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>     >>>>      >
>>     >>>>      > And this is the script we used in Jenkins build job.
>>     >>>>      >
>>     >>>>      > if [ -f "travis_check.py" ]; then
>>     >>>>      >    git log -n 1
>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>>     >>>>      request.*from.*" | sed
>>     >>>>      > 's/.*GitHub pull request <a
>>     >>>>      > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
>>     \2/g')
>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
>>     >>>> 's/.*[/]\(.*\)$/\1/g')
>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
>>     '{print $3}')
>>     >>>>      >    #if [ -z $COMMIT ]; then
>>     >>>>      >    #  COMMIT=$(curl -s
>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>     >>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
>>     tr '\n' ' '
>>     >>>>      | sed
>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>     grep -v
>>     >>>>      "apache:" |
>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>     >>>>      >    #fi
>>     >>>>      >
>>     >>>>      >    # get commit hash from PR
>>     >>>>      >    COMMIT=$(curl -s
>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>     >>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
>>     '\n' ' '
>>     >>>> | sed
>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>     grep -v
>>     >>>>      "apache:" |
>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts
>>     the build
>>     >>>>      >    RET_CODE=0
>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>     RET_CODE=$?
>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository
>>     name when
>>     >>>>      travis-ci is
>>     >>>>      > not available in the account
>>     >>>>      >      RET_CODE=0
>>     >>>>      >      AUTHOR=$(curl -s
>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>     >>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>>     >>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>     RET_CODE=$?
>>     >>>>      >    fi
>>     >>>>      >
>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find
>>     build
>>     >>>>      information in
>>     >>>>      > the travis
>>     >>>>      >      set +x
>>     >>>>      >      echo
>>     "-----------------------------------------------------"
>>     >>>>      >      echo "Looks like travis-ci is not configured for
>>     your fork."
>>     >>>>      >      echo "Please setup by swich on 'zeppelin'
>>     repository at
>>     >>>>      > https://travis-ci.org/profile and travis-ci."
>>     >>>>      >      echo "And then make sure 'Build branch updates'
>>     option is
>>     >>>>      enabled in
>>     >>>>      > the settings
>>     https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>     >>>>      >      echo ""
>>     >>>>      >      echo "To trigger CI after setup, you will need
>>     ammend your
>>     >>>>      last commit
>>     >>>>      > with"
>>     >>>>      >      echo "git commit --amend"
>>     >>>>      >      echo "git push your-remote HEAD --force"
>>     >>>>      >      echo ""
>>     >>>>      >      echo "See
>>     >>>>      >
>>     >>>>
>>     >>
>> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
>>     >>>>      > ."
>>     >>>>      >    fi
>>     >>>>      >
>>     >>>>      >    exit $RET_CODE
>>     >>>>      > else
>>     >>>>      >    set +x
>>     >>>>      >    echo "travis_check.py does not exists"
>>     >>>>      >    exit 1
>>     >>>>      > fi
>>     >>>>      >
>>     >>>>      > Chesnay Schepler <chesnay@apache.org
>>     <ma...@apache.org>
>>     >>>>      <mailto:chesnay@apache.org <ma...@apache.org>>>
>>     wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>     >>>>      >
>>     >>>>      >> Does this imply that a Jenkins job is active as long
>>     as the
>>     >>>>      Travis build
>>     >>>>      >> runs?
>>     >>>>      >>
>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>     >>>>      >>> Hi,
>>     >>>>      >>>
>>     >>>>      >>> @Dawid, I think the "long test running" as I
>>     mentioned in the
>>     >>>>      first
>>     >>>>      >> email,
>>     >>>>      >>> also as you guys said, belongs to "a big effort
>>     which is much
>>     >>>>      harder to
>>     >>>>      >>> accomplish in a short period of time and may deserve
>>     its own
>>     >>>>      separate
>>     >>>>      >>> discussion". Thus I didn't include it in what we can
>>     do in a
>>     >>>>      foreseeable
>>     >>>>      >>> short term.
>>     >>>>      >>>
>>     >>>>      >>> Besides, I don't think that's the ultimate reason
>>     for lack of
>>     >>>>      build
>>     >>>>      >>> resources. Even if the build is shortened to
>>     something like
>>     >>>>      2h, the
>>     >>>>      >>> problems of no build machine works about 6 or more
>>     hours in
>>     >>>>      PST daytime
>>     >>>>      >>> that I described will still happen, because no
>>     machine from
>>     >>>>      ASF INFRA's
>>     >>>>      >>> pool is allocated to Flink. As I have paid close
>>     attention to
>>     >>>>      the build
>>     >>>>      >>> queue in the past few weekdays, it's a pretty clear
>>     pattern now.
>>     >>>>      >>>
>>     >>>>      >>> **The ultimate root cause** for that is - we don't
>>     have any
>>     >>>>      **dedicated**
>>     >>>>      >>> build resources that we can stably rely on. I'm
>>     actually ok to
>>     >>>>      wait for a
>>     >>>>      >>> long time if there are build requests running, it
>>     means at
>>     >>>>      least we are
>>     >>>>      >>> making progress. But I'm not ok with no build
>>     resource. A
>>     >>>>      better place I
>>     >>>>      >>> think we should aim at in short term is to always
>>     have at
>>     >>>>      least a central
>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
>>     Flink at
>>     >>>>      any time, or
>>     >>>>      >>> maybe use users resources.
>>     >>>>      >>>
>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
>>     Zeppelin
>>     >>>>      community is
>>     >>>>      >>> using a Jenkins job to automatically build on users'
>>     travis
>>     >>>>      account and
>>     >>>>      >>> link the result back to github PR. I guess the
>>     Jenkins job
>>     >>>>      would fetch
>>     >>>>      >>> latest upstream master and build the PR against it.
>>     Jeff has
>>     >>>> filed
>>     >>>>      >> tickets
>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll
>>     better to
>>     >>>>      fully
>>     >>>>      >>> understand it first before judging this approach.
>>     >>>>      >>>
>>     >>>>      >>> I also heard good things about CircleCI, and ASF
>>     INFRA seems
>>     >>>>      to have a
>>     >>>>      >> pool
>>     >>>>      >>> of build capacity there too. Can be an alternative
>>     to consider.
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>>
>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>     >>>>      >> dwysakowicz@apache.org
>>     <ma...@apache.org> <mailto:dwysakowicz@apache.org
>>     <ma...@apache.org>>>
>>     >>>>      >>> wrote:
>>     >>>>      >>>
>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>     most
>>     >>>>      important point
>>     >>>>      >>>> from Chesnay's previous message in the summary. The
>>     ultimate
>>     >>>>      reason for
>>     >>>>      >>>> all the problems is that the tests take close to 2
>>     hours to
>>     >>>>      run already.
>>     >>>>      >>>> I fully support this claim: "Unless people start
>>     caring about
>>     >>>>      test times
>>     >>>>      >>>> before adding them, this issue cannot be solved"
>>     >>>>      >>>>
>>     >>>>      >>>> This is also another reason why using user's Travis
>>     account
>>     >>>>      won't help.
>>     >>>>      >>>> Every few weeks we reach the user's time limit for
>>     a single
>>     >>>>      profile.
>>     >>>>      >>>> This makes the user's builds simply fail, until we
>>     either
>>     >>>>      properly
>>     >>>>      >>>> decrease the time the tests take (which I am not
>>     sure we ever
>>     >>>>      did) or
>>     >>>>      >>>> postpone the problem by splitting into more
>>     profiles. (Note
>>     >>>>      that the ASF
>>     >>>>      >>>> Travis account has higher time limits)
>>     >>>>      >>>>
>>     >>>>      >>>> Best,
>>     >>>>      >>>>
>>     >>>>      >>>> Dawid
>>     >>>>      >>>>
>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>     >>>>      >>>>> Do we know if using "the best" available hardware
>>     would
>>     >>>>      improve the
>>     >>>>      >> build
>>     >>>>      >>>>> times?
>>     >>>>      >>>>> Imagine we would run the build on machines with
>>     plenty of
>>     >>>>      main memory
>>     >>>>      >> to
>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>     architecture?
>>     >>>>      >>>>>
>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>>     the time
>>     >>>>      of an
>>     >>>>      >>>>> individual build, and using our own infrastructure
>>     would
>>     >>>>      remove our
>>     >>>>      >>>>> dependency on Apache's Travis account (with the
>>     obvious
>>     >>>>      downside of
>>     >>>>      >>>> having
>>     >>>>      >>>>> to maintain the infrastructure)
>>     >>>>      >>>>> We could use an open source travis alternative, to
>>     have a
>>     >>>>      similar
>>     >>>>      >>>>> experience and make the migration easy.
>>     >>>>      >>>>>
>>     >>>>      >>>>>
>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>     >>>>      <chesnay@apache.org <ma...@apache.org>
>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
>>     >>>>      >>>> wrote:
>>     >>>>      >>>>>>    >From what I gathered, there's no special
>>     sauce that the
>>     >>>>      Zeppelin
>>     >>>>      >>>>>> project uses which actually integrates a users 
>> Travis
>>     >>>>      account into the
>>     >>>>      >>>> PR.
>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
>>     kind of it.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
>>     fair
>>     >>>>      amount of
>>     >>>>      >>>>>> resources, but there are downsides:
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>>     nose-dive.
>>     >>>>      Either we
>>     >>>>      >>>>>> require every contributor to always, on every
>>     commit, also
>>     >>>>      post a
>>     >>>>      >> Travis
>>     >>>>      >>>>>> build, or we have the reviewer sift through the
>>     >>>>      contributors account
>>     >>>>      >> to
>>     >>>>      >>>>>> find it.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>     also not
>>     >>>>      equivalent to
>>     >>>>      >>>>>> having a PR build.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> A normal branch build takes a branch as is and
>>     tests it. A
>>     >>>>      PR build
>>     >>>>      >>>>>> merges the branch into master, and then runs it.
>>     (Fun fact:
>>     >>>>      This is
>>     >>>>      >> why
>>     >>>>      >>>>>> a PR with merge conflicts is not being run on
>>     Travis.)
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> And ultimately, everyone can already make use of 
>> this
>>     >>>>      approach anyway.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>     >>>>      >>>>>>> Hi Jeff,
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>>     think it's a
>>     >>>>      good idea to
>>     >>>>      >>>>>>> leverage user's travis account.
>>     >>>>      >>>>>>> In this way, we can have almost unlimited
>>     concurrent build
>>     >>>>      jobs and
>>     >>>>      >>>>>>> developers can restart build by themselves
>>     (currently only
>>     >>>>      committers
>>     >>>>      >>>>>>> can restart PR's build).
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> But I'm still not very clear how to integrate 
>> user's
>>     >>>>      travis build
>>     >>>>      >> into
>>     >>>>      >>>>>>> the Flink pull request's build automatically.
>>     Can you
>>     >>>>      explain more in
>>     >>>>      >>>>>>> detail?
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Another question: does travis only build
>>     branches for user
>>     >>>>      account?
>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase 
>> user's
>>     >>>>      commits against
>>     >>>>      >>>>>>> current master branch.
>>     >>>>      >>>>>>> This will help us to find problems before
>>     merge.  Builds
>>     >>>>      for branches
>>     >>>>      >>>>>>> will lose the impact of new commits in master.
>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Regards,
>>     >>>>      >>>>>>> Jark
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>     <zjffdu@gmail.com <ma...@gmail.com>
>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
>>     <ma...@gmail.com> <mailto:zjffdu@gmail.com
>>     <ma...@gmail.com>>>> wrote:
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>       Hi Folks,
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved
>>     >>>> it by
>>     >>>>      >> delegating
>>     >>>>      >>>>>>>  each
>>     >>>>      >>>>>>>  one's PR build to his travis account
>>     (Everyone can
>>     >>>>      have 5 free
>>     >>>>      >>>>>>>  slot for
>>     >>>>      >>>>>>>  travis build).
>>     >>>>      >>>>>>>  Apache account travis build is only triggered 
>> when
>>     >>>>      PR is merged.
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
>>     <ma...@gmail.com>
>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>     <mailto:ykt836@gmail.com <ma...@gmail.com>
>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>>     >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  > (Forgot to cc George)
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  > Best,
>>     >>>>      >>>>>>>  > Kurt
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
>>     <ma...@gmail.com> <mailto:ykt836@gmail.com
>>     <ma...@gmail.com>>>>
>>     >>>>      wrote:
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  > > Hi Bowen,
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
>>     actually have
>>     >>>>      discussed
>>     >>>>      >> about
>>     >>>>      >>>>>>>  this, and I
>>     >>>>      >>>>>>>  > > think Till and George have
>>     >>>>      >>>>>>>  > > already spend sometime investigating
>>     it. I have
>>     >>>>      cced both of
>>     >>>>      >>>>>>>  them, and
>>     >>>>      >>>>>>>  > > maybe they can share
>>     >>>>      >>>>>>>  > > their findings.
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > > Best,
>>     >>>>      >>>>>>>  > > Kurt
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
>>     <ma...@gmail.com> <mailto:imjark@gmail.com
>>     <ma...@gmail.com>>>>
>>     >>>>      wrote:
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > >> Hi Bowen,
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
>>     suffered from
>>     >>>>      the long
>>     >>>>      >>>>>>>  build time.
>>     >>>>      >>>>>>>  > >> I agree that we should focus on
>>     solving build
>>     >>>>      capacity
>>     >>>>      >>>>>>>  problem in the
>>     >>>>      >>>>>>>  > >> thread.
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> My observation is there is only one
>>     build is
>>     >>>>      running, all
>>     >>>>      >> the
>>     >>>>      >>>>>>>  others
>>     >>>>      >>>>>>>  > >> (other
>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>>     it can
>>     >>>> support
>>     >>>>      >> concurrent
>>     >>>>      >>>>>>>  build
>>     >>>>      >>>>>>>  > jobs.
>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
>>     using, might
>>     >>>>      be the free
>>     >>>>      >>>>>>>  plan for
>>     >>>>      >>>>>>>  > open
>>     >>>>      >>>>>>>  > >> source.
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>     experience on
>>     >>>>      Travis.
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> Regards,
>>     >>>>      >>>>>>>  > >> Jark
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>     <ma...@gmail.com>
>>     >>>>      <mailto:bowenli86@gmail.com
>>     <ma...@gmail.com>>>> wrote:
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> > Hi Steven,
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >> > I think you may not read what I
>>     wrote. The
>>     >>>>      discussion is
>>     >>>>      >>>> about
>>     >>>>      >>>>>>>  > "unstable
>>     >>>>      >>>>>>>  > >> > build **capacity**", in another word
>>     >>>>      "unstable / lack of
>>     >>>>      >>>> build
>>     >>>>      >>>>>>>  > >> resources",
>>     >>>>      >>>>>>>  > >> > not "unstable build".
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
>>     Steven Wu
>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
>>     <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>     <ma...@gmail.com>>
>>     >>>>      <mailto:stevenz3wu@gmail.com
>>     <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>     <ma...@gmail.com>>>>
>>     >>>>      >>>>>>>  > wrote:
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
>>     >>>>      definitely a pain
>>     >>>>      >>>>>> point.
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>     >>>>      >> flink-connector-kafka
>>     >>>>      >>>>>>>       is not
>>     >>>>      >>>>>>>  > >> related
>>     >>>>      >>>>>>>  > >> > to
>>     >>>>      >>>>>>>  > >> > > my change. but there is no easy
>>     re-run the
>>     >>>>      build on
>>     >>>>      >>>>>>>  travis UI.
>>     >>>>      >>>>>>>  > Google
>>     >>>>      >>>>>>>  > >> > > search showed a trick of
>>     close-and-open the
>>     >>>>      PR will
>>     >>>>      >>>>>>>  trigger rebuild.
>>     >>>>      >>>>>>>  > >> but
>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
>>     activities.
>>     >>>>      >>>>>>>  > >> > >
>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>>     often failed
>>     >>>>      with
>>     >>>>      >>>>>>>  exceeding time
>>     >>>>      >>>>>>>  > limit
>>     >>>>      >>>>>>>  > >> > after
>>     >>>>      >>>>>>>  > >> > > 4+ hours.
>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>>     limit for
>>     >>>>      jobs, and
>>     >>>>      >> has
>>     >>>>      >>>>>>>  been
>>     >>>>      >>>>>>>  > >> > terminated.
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>     Bowen Li
>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>     <ma...@gmail.com>>
>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>     >>>>      >>>>>>>  > wrote:
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
>>     >>>>      >>>>>>>  This build
>>     >>>>      >>>>>>>  > >> > request
>>     >>>>      >>>>>>>  > >> > > > has
>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>     queue**
>>     >>>>      since I first
>>     >>>>      >> saw
>>     >>>>      >>>>>>>       it at PST
>>     >>>>      >>>>>>>  > >> > 10:30am
>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>     there before
>>     >>>>      10:30am).
>>     >>>>      >>>>>>>  It's PST
>>     >>>>      >>>>>>>  > 4:12pm
>>     >>>>      >>>>>>>  > >> now
>>     >>>>      >>>>>>>  > >> > > and
>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>>     Bowen Li
>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>     <ma...@gmail.com>>
>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>     >>>>      >>>>>>>  > >> wrote:
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>>     >>>>      resulting from lack
>>     >>>>      >>>>>>>       of stable
>>     >>>>      >>>>>>>  > >> build
>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>>     PRs [1].
>>     >>>>      >> Specifically, I
>>     >>>>      >>>>>>>  noticed
>>     >>>>      >>>>>>>  > >> often
>>     >>>>      >>>>>>>  > >> > > that
>>     >>>>      >>>>>>>  > >> > > > no
>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
>>     >>>>      progress for
>>     >>>>      >> hours,
>>     >>>>      >>>> and
>>     >>>>      >>>>>>>  > suddenly
>>     >>>>      >>>>>>>  > >> 5
>>     >>>>      >>>>>>>  > >> > or
>>     >>>>      >>>>>>>  > >> > > 6
>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
>>     after the
>>     >>>>      long pause.
>>     >>>>      >>>>>>>       I'm at PST
>>     >>>>      >>>>>>>  > >> > (UTC-08)
>>     >>>>      >>>>>>>  > >> > > > time
>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>>     be as
>>     >>>>      long as 6 hours
>>     >>>>      >>>>>>>  from PST 9am
>>     >>>>      >>>>>>>  > >> to
>>     >>>>      >>>>>>>  > >> > 3pm
>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>>     drain the
>>     >>>>      queue
>>     >>>>      >>>>>>>  afterwards).
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
>>     impacted our
>>     >>>>      productivity.
>>     >>>>      >>>> I've
>>     >>>>      >>>>>>>  > >> experienced
>>     >>>>      >>>>>>>  > >> > > that
>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>>     morning of
>>     >>>>      PST time zone
>>     >>>>      >>>>>>>  won't finish
>>     >>>>      >>>>>>>  > >> > their
>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
>>     same day.
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>>     the same
>>     >>>>      problem or
>>     >>>>      >>>>>>>  have similar
>>     >>>>      >>>>>>>  > >> > > > observation
>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>>     has things
>>     >>>>      to do with
>>     >>>>      >> time
>>     >>>>      >>>>>>>  zone)
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>>     TravisCI is
>>     >>>>      Flink currently
>>     >>>>      >>>>>>>  using? Is it
>>     >>>>      >>>>>>>  > >> the
>>     >>>>      >>>>>>>  > >> > > free
>>     >>>>      >>>>>>>  > >> > > > > plan for open source
>>     projects? What
>>     >>>> are the
>>     >>>>      >>>>>>>  guaranteed build
>>     >>>>      >>>>>>>  > >> capacity
>>     >>>>      >>>>>>>  > >> > > of
>>     >>>>      >>>>>>>  > >> > > > > the current plan?
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>>     (either
>>     >>>>      free or paid)
>>     >>>>      >>>>>> can't
>>     >>>>      >>>>>>>  > provide
>>     >>>>      >>>>>>>  > >> > > stable
>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
>>     upgrade to a
>>     >>>>      higher priced
>>     >>>>      >>>>>>>  plan with
>>     >>>>      >>>>>>>  > larger
>>     >>>>      >>>>>>>  > >> > and
>>     >>>>      >>>>>>>  > >> > > > more
>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>>     contribute to
>>     >>>> the
>>     >>>>      >>>>>>>  productivity problem
>>     >>>>      >>>>>>>  > is
>>     >>>>      >>>>>>>  > >> > that
>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>>     full build
>>     >>>>      for every PR
>>     >>>>      >>>> and a
>>     >>>>      >>>>>>>  > >> successful
>>     >>>>      >>>>>>>  > >> > > full
>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>>     definitely have
>>     >>>>      more options to
>>     >>>>      >>>>>>>  solve it,
>>     >>>>      >>>>>>>  > for
>>     >>>>      >>>>>>>  > >> > > > instance,
>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>>     and reuse
>>     >>>>      artifacts
>>     >>>>      >> from
>>     >>>>      >>>> the
>>     >>>>      >>>>>>>  > previous
>>     >>>>      >>>>>>>  > >> > > build.
>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>>     effort
>>     >>>>      which is much
>>     >>>>      >>>>>>>  harder to
>>     >>>>      >>>>>>>  > >> > accomplish
>>     >>>>      >>>>>>>  > >> > > > in
>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
>>     may deserve
>>     >>>>      its own
>>     >>>>      >>>> separate
>>     >>>>      >>>>>>>  > >> discussion.
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > [1]
>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>       --
>>     >>>>      >>>>>>>  Best Regards
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  Jeff Zhang
>>     >>>>      >>>>>>>
>>     >>>>      >>
>>     >>>>
>>     >>>
>>     >>
>>
>
>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
Yes we can do that; for the time being you can add an empty commit to 
re-trigger the CI.
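
A minimal sketch of the empty-commit workaround, assuming a local clone whose PR branch is pushed to a remote. The helper names `retrigger_commands` and `retrigger_ci`, and the default remote and branch names, are hypothetical and only for illustration; the git options used are the standard `git commit --allow-empty` and `git push`.

```python
import subprocess


def retrigger_commands(remote="origin", branch="my-pr-branch",
                       message="Re-trigger CI"):
    """Build the git commands that add an empty commit and push it.

    The remote and branch defaults are placeholders (assumptions);
    pass the values that match your own fork and PR branch.
    """
    return [
        # --allow-empty records a commit even though no files changed.
        ["git", "commit", "--allow-empty", "-m", message],
        # Pushing the new commit updates the PR, which starts a new build.
        ["git", "push", remote, branch],
    ]


def retrigger_ci(**kwargs):
    """Run the commands in the current repository, stopping on the first failure."""
    for cmd in retrigger_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling `retrigger_ci(remote="origin", branch="my-pr-branch")` from inside the clone would run both commands. The empty commit stays in the branch history, so it may be worth squashing it away before the PR is merged.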


On 08/07/2019 03:49, Congxian Qiu wrote:
> As we use the Flink bot to trigger the CI test, could we add a command for
> the Flink bot to re-trigger the CI (sometimes we may encounter flaky tests)?
>
> Best,
> Congxian
>
>
> Chesnay Schepler <ch...@apache.org> wrote on Mon, Jul 8, 2019 at 5:01 AM:
>
>> The vote has passed unanimously in favor of migrating to a separate
>> Travis account.
>>
>> I will now set things up such that PullRequests are no longer run on
>> the ASF servers.
>> This is a major step in reducing our usage of ASF resources.
>> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
>> workers, which is the same as the ASF gives us). Over the course of the
>> next week we'll set up the Ververica subscription to increase this limit.
>>
>> From now on, a bot will mirror all new and updated PullRequests to a
>> mirror repository (https://github.com/flink-ci/flink-ci) and write an
>> update into the PR once the build is complete.
>> I have run the bots for the past 3 days in parallel to our existing
>> Travis and they were working without major issues.
>>
>> The biggest change that contributors will see is that there's no longer
>> an icon next to each commit. We may revisit this in the future.
>>
>> I'll set up a repo with the source of the bot later.
>>
>> On 04/07/2019 10:46, Chesnay Schepler wrote:
>>> I've raised a JIRA
>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>>> inquire whether it would be possible to switch to a different Travis
>>> account, and if so what steps would need to be taken.
>>> We need a proper confirmation from INFRA since we are not in full
>>> control of the flink repository (for example, we cannot access the
>>> settings page).
>>>
>>> If this is indeed possible, Ververica is willing to sponsor a Travis
>>> account for the Flink project.
>>> This would provide us with more than enough resources.
>>>
>>> Since this makes the project more reliant on resources provided by
>>> external companies I would like to vote on this.
>>>
>>> Please vote on this proposal, as follows:
>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>>> provided that INFRA approves
>>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>>> account
>>>
>>> The vote will be open for at least 24h, and until we have confirmation
>>> from INFRA. The voting period may be shorter than the usual 3 days
>>> since our current setup is effectively not working.
>>>
>>> On 04/07/2019 06:51, Bowen Li wrote:
>>>> Re: > Are they using their own Travis CI pool, or did they switch to
>>>> an entirely different CI service?
>>>>
>>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>>>> currently moving away from ASF's Travis to their own in-house metal
>>>> machines at [1] with custom CI application at [2]. They've seen
>>>> significant improvement w.r.t both much higher performance and
>>>> basically no resource waiting time, "night-and-day" difference
>>>> quoting Wes.
>>>>
>>>> Re: > If we can just switch to our own Travis pool, just for our
>>>> project, then this might be something we can do fairly quickly?
>>>>
>>>> I believe so, according to [3] and [4]
>>>>
>>>>
>>>> [1] https://ci.ursalabs.org/
>>>> [2] https://github.com/ursa-labs/ursabot
>>>> [3]
>>>>
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>> [4]
>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>>
>>>>
>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>
>>>>      Are they using their own Travis CI pool, or did they switch to an
>>>>      entirely different CI service?
>>>>
>>>>      If we can just switch to our own Travis pool, just for our
>>>>      project, then
>>>>      this might be something we can do fairly quickly?
>>>>
>>>>      On 03/07/2019 05:55, Bowen Li wrote:
>>>>      > I responded in the INFRA ticket [1] that I believe they are
>>>>      using a wrong
>>>>      > metric against Flink and the total build time is a completely
>>>>      different
>>>>      > thing than guaranteed build capacity.
>>>>      >
>>>>      > My response:
>>>>      >
>>>>      > "As mentioned above, since I started to pay attention to Flink's
>>>>      build
>>>>      > queue a few tens of days ago, I'm in Seattle and I saw no build
>>>>      was kicking
>>>>      > off in PST daytime in weekdays for Flink. Our teammates in China
>>>>      and Europe
>>>>      > have also reported similar observations. So we need to evaluate
>>>>      how the
>>>>      > large total build time came from - if 1) your number and 2) our
>>>>      > observations from three locations that cover pretty much a full
>>>>      day, are
>>>>      > all true, I **guess** one reason can be that - highly likely the
>>>>      extra
>>>>      > build time came from weekends when other Apache projects may be
>>>>      idle and
>>>>      > Flink just drains hard its congested queue.
>>>>      >
>>>>      > Please be aware of that we're not complaining about the lack of
>>>>      resources
>>>>      > in general, I'm complaining about the lack of **stable,
>>>> dedicated**
>>>>      > resources. An example for the latter one is, currently even if
>>>>      no build is
>>>>      > in Flink's queue and I submit a request to be the queue head in
>>>> PST
>>>>      > morning, my build won't even start in 6-8+h. That is an absurd
>>>>      amount of
>>>>      > waiting time.
>>>>      >
>>>>      > That's saying, if ASF INFRA decides to adopt a quota system and
>>>>      grants
>>>>      > Flink five DEDICATED servers that runs all the time only for
>>>>      Flink, that'll
>>>>      > be PERFECT and can totally solve our problem now.
>>>>      >
>>>>      > I feel what's missing in the ASF INFRA's Travis resource pool is
>>>>      some level
>>>>      > of build capacity SLAs and certainty"
>>>>      >
>>>>      >
>>>>      > Again, I believe these two problems differ in nature: long build
>>>>      > time vs. lack of dedicated build resources. That is to say,
>>>>      > shortening build time may relieve the situation, or it may not.
>>>>      > I'm slightly negative on disabling IT cases for PRs: the downside
>>>>      > is that we risk bugs in PRs that UTs don't catch, which may cost
>>>>      > a lot more to fix if they slow others down or even block them.
>>>>      > But I'm open to others' opinions on it.
>>>>      >
>>>>      > AFAICT from INFRA ticket [1], donating to ASF INFRA won't be a
>>>>      > feasible way to solve our problem, since INFRA's pool is fully
>>>>      > shared and they have no control over, or finer insight into,
>>>>      > resource allocation to a specific Apache project. As mentioned in
>>>>      > [1], Apache Arrow is moving away from the ASF INFRA Travis pool
>>>>      > (they are actually surprised Flink hasn't planned to do so). I
>>>>      > know that Spark is on its own build infra. If we all agree on
>>>>      > funding our own build infra, I'd be glad to help investigate
>>>>      > potential options after releasing 1.9, since I'm super busy with
>>>>      > 1.9 now.
>>>>      >
>>>>      > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>      >
>>>>      >
>>>>      >
>>>>      > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>>      <chesnay@apache.org <ma...@apache.org>> wrote:
>>>>      >
>>>>      >> As a short-term stopgap, since we can assume this issue to
>>>>      become much
>>>>      >> worse in the following days/weeks, we could disable IT cases in
>>>>      PRs and
>>>>      >> only run them on master.
>>>>      >>
>>>>      >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>      >>> People really have to stop thinking that just because
>>>>      something works
>>>>      >>> for us it is also a good solution.
>>>>      >>> Also, please remember that our builds run for 2h from start to
>>>>      finish,
>>>>      >>> and not the 14 _minutes_ it takes for zeppelin.
>>>>      >>> We are dealing with an entirely different scale here, both in
>>>>      terms of
>>>>      >>> build times and number of builds.
>>>>      >>>
>>>>      >>> In this very thread people have been complaining about long
>>>> queue
>>>>      >>> times for their builds. Surprise, other Apache projects have
>>>> been
>>>>      >>> suffering the very same thing due to us not controlling our
>>>> build
>>>>      >>> times. While switching services (be it Jenkins, CircleCI or
>>>>      whatever)
>>>>      >>> will possibly work for us (and these options are actually
>>>>      attractive,
>>>>      >>> like CircleCI's proper support for build artifacts), it will
>>>> also
>>>>      >>> result in us likely negatively affecting other projects in
>>>>      significant
>>>>      >>> ways.
>>>>      >>>
>>>>      >>> Sure, the Jenkins setup has a good user experience for us, at
>>>>      the cost
>>>>      >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>>>      have 25
>>>>      >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>>      >>> resources, and the European contributors haven't even really
>>>>      started yet.
>>>>      >>>
>>>>      >>> FYI, the latest INFRA response from INFRA-18533:
>>>>      >>>
>>>>      >>> "Our rough metrics shows that Flink used over 5800 hours of
>>>>      build time
>>>>      >>> last month. That is equal to EIGHT servers running 24/7 for
>>>>      the ENTIRE
>>>>      >>> MONTH. EIGHT. nonstop.
>>>>      >>> When we discovered this last night, we discussed it some and
>>>>      are going
>>>>      >>> to tune down Flink to allow only five executors maximum. We
>>>> cannot
>>>>      >>> allow Flink to consume so much of a Foundation shared resource."
>>>>      >>>
>>>>      >>> So yes, we either
>>>>      >>> a) have to heavily reduce our CI usage or
>>>>      >>> b) fund our own, either maintaining it ourselves or donating
>>>>      to Apache.
>>>>      >>>
>>>>      >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>      >>>> Looking at the git history of the Jenkins script, its core part
>>>>      >>>> was finished in March 2017 (with only two minor updates in
>>>>      >>>> 2017/2018), so it's been running for over two years now and it
>>>>      >>>> feels like the Zeppelin community has been quite happy with it.
>>>>      >>>> @Jeff Zhang <zjffdu@gmail.com> can you share your insights and
>>>>      >>>> user experience with the Jenkins+Travis approach?
>>>>      >>>>
>>>>      >>>> Things like:
>>>>      >>>>
>>>>      >>>> - has the approach completely solved the resource capacity
>>>>      >>>> problem for the Zeppelin community? is the Zeppelin community
>>>>      >>>> happy with the result?
>>>>      >>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>>>      >>>> - how often do you need to maintain the Jenkins infra? how many
>>>>      >>>> people are usually involved in maintenance and bug-fixes?
>>>>      >>>>
>>>>      >>>> The downside of this approach, to me, seems to be mostly
>>>>      >>>> maintenance: maintaining the script and the Jenkins infra.
>>>>      >>>>
>>>>      >>>> ** Having Our Own Travis-CI.com Account **
>>>>      >>>>
>>>>      >>>> Another alternative I've been thinking of is to have our own
>>>>      >>>> travis-ci.com account with paid dedicated resources. Note
>>>>      >>>> travis-ci.org is the free version and travis-ci.com is the
>>>>      >>>> commercial version. We currently use a shared resource pool
>>>>      >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>>>      >>>> control over it - we can't see how it's configured, how much
>>>>      >>>> resources are available, how resources are allocated among
>>>>      >>>> Apache projects, etc. The nice things about having an account
>>>>      >>>> on travis-ci.com are:
>>>>      >>>>
>>>>      >>>> - relatively low cost with much better resource guarantee
>>>>      than what
>>>>      >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>>>>      >>>> $489/month with 10 concurrency
>>>>      >>>> - low maintenance work compared to using Jenkins
>>>>      >>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>      >>>> (pending verification)
>>>>      >>>> - full control over the build capacity/configuration
>>>> compared to
>>>>      >>>> using ASF INFRA's pool
>>>>      >>>>
>>>>      >>>> I'd be surprised if we as such a vibrant community cannot
>>>>      find and
>>>>      >>>> fund $249*12=$2988 a year in exchange for a much better
>>>> developer
>>>>      >>>> experience and much higher productivity.
>>>>      >>>>
>>>>      >>>> [1] https://travis-ci.com/plans
>>>>      >>>> [2]
>>>>      >>>>
>>>>      >>
>>>>
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>      >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>>>      <chesnay@apache.org <ma...@apache.org>
>>>>      >>>> <mailto:chesnay@apache.org <ma...@apache.org>>>
>> wrote:
>>>>      >>>>
>>>>      >>>>      So yes, the Jenkins job keeps pulling the state from
>>>>      Travis until it
>>>>      >>>>      finishes.
>>>>      >>>>
>>>>      >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>>>      >>>>      workers just to idle for several hours.
>>>>      >>>>
>>>>      >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>      >>>>      > Here's what zeppelin community did, we make a python
>>>>      script to
>>>>      >>>>      check the
>>>>      >>>>      > build status of pull request.
>>>>      >>>>      > Here's script:
>>>>      >>>>      >
>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>      >>>>      >
>>>>      >>>>      > And this is the script we used in Jenkins build job.
>>>>      >>>>      >
>>>>      >>>>      > if [ -f "travis_check.py" ]; then
>>>>      >>>>      >    git log -n 1
>>>>      >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>>>>      >>>>      request.*from.*" | sed
>>>>      >>>>      > 's/.*GitHub pull request <a
>>>>      >>>>      > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
>>>>      \2/g')
>>>>      >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>      >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
>>>>      >>>> 's/.*[/]\(.*\)$/\1/g')
>>>>      >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
>>>>      '{print $3}')
>>>>      >>>>      >    #if [ -z $COMMIT ]; then
>>>>      >>>>      >    #  COMMIT=$(curl -s
>>>>      >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>>      >>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
>>>>      tr '\n' ' '
>>>>      >>>>      | sed
>>>>      >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>      grep -v
>>>>      >>>>      "apache:" |
>>>>      >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>      >>>>      >    #fi
>>>>      >>>>      >
>>>>      >>>>      >    # get commit hash from PR
>>>>      >>>>      >    COMMIT=$(curl -s
>>>>      >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>      >>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
>>>>      '\n' ' '
>>>>      >>>> | sed
>>>>      >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>      grep -v
>>>>      >>>>      "apache:" |
>>>>      >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>      >>>>      >    sleep 30 # sleep few moment to wait travis starts
>>>>      the build
>>>>      >>>>      >    RET_CODE=0
>>>>      >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>>>      RET_CODE=$?
>>>>      >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository
>>>>      name when
>>>>      >>>>      travis-ci is
>>>>      >>>>      > not available in the account
>>>>      >>>>      >      RET_CODE=0
>>>>      >>>>      >      AUTHOR=$(curl -s
>>>>      >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>>      >>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>>>>      >>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>      >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>>>      RET_CODE=$?
>>>>      >>>>      >    fi
>>>>      >>>>      >
>>>>      >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find
>>>>      build
>>>>      >>>>      information in
>>>>      >>>>      > the travis
>>>>      >>>>      >      set +x
>>>>      >>>>      >      echo
>>>>      "-----------------------------------------------------"
>>>>      >>>>      >      echo "Looks like travis-ci is not configured for
>>>>      your fork."
>>>>      >>>>      >      echo "Please setup by swich on 'zeppelin'
>>>>      repository at
>>>>      >>>>      > https://travis-ci.org/profile and travis-ci."
>>>>      >>>>      >      echo "And then make sure 'Build branch updates'
>>>>      option is
>>>>      >>>>      enabled in
>>>>      >>>>      > the settings
>>>>      https://travis-ci.org/${AUTHOR}/zeppelin/settings
>>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
>>>>      >>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
>>>>      >>>>      >      echo ""
>>>>      >>>>      >      echo "To trigger CI after setup, you will need
>>>>      ammend your
>>>>      >>>>      last commit
>>>>      >>>>      > with"
>>>>      >>>>      >      echo "git commit --amend"
>>>>      >>>>      >      echo "git push your-remote HEAD --force"
>>>>      >>>>      >      echo ""
>>>>      >>>>      >      echo "See
>>>>      >>>>      >
>>>>      >>>>
>>>>      >>
>>>>
>> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
>>>>      >>>>      > ."
>>>>      >>>>      >    fi
>>>>      >>>>      >
>>>>      >>>>      >    exit $RET_CODE
>>>>      >>>>      > else
>>>>      >>>>      >    set +x
>>>>      >>>>      >    echo "travis_check.py does not exists"
>>>>      >>>>      >    exit 1
>>>>      >>>>      > fi
>>>>      >>>>      >
>>>>      >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>>      >>>>      >
>>>>      >>>>      >> Does this imply that a Jenkins job is active as long
>>>>      as the
>>>>      >>>>      Travis build
>>>>      >>>>      >> runs?
>>>>      >>>>      >>
>>>>      >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>>      >>>>      >>> Hi,
>>>>      >>>>      >>>
>>>>      >>>>      >>> @Dawid, I think the "long test running" as I
>>>>      mentioned in the
>>>>      >>>>      first
>>>>      >>>>      >> email,
>>>>      >>>>      >>> also as you guys said, belongs to "a big effort
>>>>      which is much
>>>>      >>>>      harder to
>>>>      >>>>      >>> accomplish in a short period of time and may deserve
>>>>      its own
>>>>      >>>>      separate
>>>>      >>>>      >>> discussion". Thus I didn't include it in what we can
>>>>      do in a
>>>>      >>>>      foreseeable
>>>>      >>>>      >>> short term.
>>>>      >>>>      >>>
>>>>      >>>>      >>> Besides, I don't think that's the ultimate reason for the
>>>>      >>>>      >>> lack of build resources. Even if the build is shortened to
>>>>      >>>>      >>> something like 2h, the problem I described, no build
>>>>      >>>>      >>> machine working for 6 or more hours in PST daytime, will
>>>>      >>>>      >>> still happen, because no machine from ASF INFRA's pool is
>>>>      >>>>      >>> allocated to Flink. As I have paid close attention to the
>>>>      >>>>      >>> build queue over the past few weekdays, it's a pretty clear
>>>>      >>>>      >>> pattern now.
>>>>      >>>>      >>>
>>>>      >>>>      >>> **The ultimate root cause** for that is - we don't
>>>>      have any
>>>>      >>>>      **dedicated**
>>>>      >>>>      >>> build resources that we can stably rely on. I'm
>>>>      actually ok to
>>>>      >>>>      wait for a
>>>>      >>>>      >>> long time if there are build requests running, it
>>>>      means at
>>>>      >>>>      least we are
>>>>      >>>>      >>> making progress. But I'm not ok with no build
>>>>      resource. A
>>>>      >>>>      better place I
>>>>      >>>>      >>> think we should aim at in short term is to always
>>>>      have at
>>>>      >>>>      least a central
>>>>      >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
>>>>      Flink at
>>>>      >>>>      any time, or
>>>>      >>>>      >>> maybe use users resources.
>>>>      >>>>      >>>
>>>>      >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
>>>>      Zeppelin
>>>>      >>>>      community is
>>>>      >>>>      >>> using a Jenkins job to automatically build on users'
>>>>      travis
>>>>      >>>>      account and
>>>>      >>>>      >>> link the result back to github PR. I guess the
>>>>      Jenkins job
>>>>      >>>>      would fetch
>>>>      >>>>      >>> latest upstream master and build the PR against it.
>>>>      Jeff has
>>>>      >>>> filed
>>>>      >>>>      >> tickets
>>>>      >>>>      >>> to learn and get access to the Jenkins infra. It'll be
>>>>      >>>>      >>> better to fully understand it first before judging this
>>>>      >>>>      >>> approach.
>>>>      >>>>      >>>
>>>>      >>>>      >>> I also heard good things about CircleCI, and ASF
>>>>      INFRA seems
>>>>      >>>>      to have a
>>>>      >>>>      >> pool
>>>>      >>>>      >>> of build capacity there too. Can be an alternative
>>>>      to consider.
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>>
>>>>      >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>>>      >>>>      >> dwysakowicz@apache.org
>>>>      <ma...@apache.org> <mailto:dwysakowicz@apache.org
>>>>      <ma...@apache.org>>>
>>>>      >>>>      >>> wrote:
>>>>      >>>>      >>>
>>>>      >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>>>      most
>>>>      >>>>      important point
>>>>      >>>>      >>>> from Chesnay's previous message in the summary. The
>>>>      ultimate
>>>>      >>>>      reason for
>>>>      >>>>      >>>> all the problems is that the tests take close to 2
>>>>      hours to
>>>>      >>>>      run already.
>>>>      >>>>      >>>> I fully support this claim: "Unless people start
>>>>      caring about
>>>>      >>>>      test times
>>>>      >>>>      >>>> before adding them, this issue cannot be solved"
>>>>      >>>>      >>>>
>>>>      >>>>      >>>> This is also another reason why using user's Travis
>>>>      account
>>>>      >>>>      won't help.
>>>>      >>>>      >>>> Every few weeks we reach the user's time limit for
>>>>      a single
>>>>      >>>>      profile.
>>>>      >>>>      >>>> This makes the user's builds simply fail, until we
>>>>      either
>>>>      >>>>      properly
>>>>      >>>>      >>>> decrease the time the tests take (which I am not
>>>>      sure we ever
>>>>      >>>>      did) or
>>>>      >>>>      >>>> postpone the problem by splitting into more
>>>>      profiles. (Note
>>>>      >>>>      that the ASF
>>>>      >>>>      >>>> Travis account has higher time limits)
>>>>      >>>>      >>>>
>>>>      >>>>      >>>> Best,
>>>>      >>>>      >>>>
>>>>      >>>>      >>>> Dawid
>>>>      >>>>      >>>>
>>>>      >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>      >>>>      >>>>> Do we know if using "the best" available hardware
>>>>      would
>>>>      >>>>      improve the
>>>>      >>>>      >> build
>>>>      >>>>      >>>>> times?
>>>>      >>>>      >>>>> Imagine we would run the build on machines with
>>>>      plenty of
>>>>      >>>>      main memory
>>>>      >>>>      >> to
>>>>      >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>>>      architecture?
>>>>      >>>>      >>>>>
>>>>      >>>>      >>>>> Throwing hardware at the problem could help reduce
>>>>      the time
>>>>      >>>>      of an
>>>>      >>>>      >>>>> individual build, and using our own infrastructure
>>>>      would
>>>>      >>>>      remove our
>>>>      >>>>      >>>>> dependency on Apache's Travis account (with the
>>>>      obvious
>>>>      >>>>      downside of
>>>>      >>>>      >>>> having
>>>>      >>>>      >>>>> to maintain the infrastructure)
>>>>      >>>>      >>>>> We could use an open source travis alternative, to
>>>>      have a
>>>>      >>>>      similar
>>>>      >>>>      >>>>> experience and make the migration easy.
>>>>      >>>>      >>>>>
>>>>      >>>>      >>>>>
>>>>      >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>>      >>>>      <chesnay@apache.org <ma...@apache.org>
>>>>      <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>>      >>>>      >>>> wrote:
>>>>      >>>>      >>>>>>    >From what I gathered, there's no special
>>>>      sauce that the
>>>>      >>>>      Zeppelin
>>>>      >>>>      >>>>>> project uses which actually integrates a users
>>>> Travis
>>>>      >>>>      account into the
>>>>      >>>>      >>>> PR.
>>>>      >>>>      >>>>>> They just disabled Travis for PRs. And that's
>>>>      kind of it.
>>>>      >>>>      >>>>>>
>>>>      >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
>>>>      fair
>>>>      >>>>      amount of
>>>>      >>>>      >>>>>> resources, but there are downsides:
>>>>      >>>>      >>>>>>
>>>>      >>>>      >>>>>> The discoverability of the Travis check takes a
>>>>      nose-dive.
>>>>      >>>>      Either we
>>>>      >>>>      >>>>>> require every contributor to always, on every
>>>>      commit, also
>>>>      >>>>      post a
>>>>      >>>>      >> Travis
>>>>      >>>>      >>>>>> build, or we have the reviewer sift through the
>>>>      >>>>      contributors account
>>>>      >>>>      >> to
>>>>      >>>>      >>>>>> find it.
>>>>      >>>>      >>>>>>
>>>>      >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>>>      also not
>>>>      >>>>      equivalent to
>>>>      >>>>      >>>>>> having a PR build.
>>>>      >>>>      >>>>>>
>>>>      >>>>      >>>>>> A normal branch build takes a branch as is and
>>>>      tests it. A
>>>>      >>>>      PR build
>>>>      >>>>      >>>>>> merges the branch into master, and then runs it.
>>>>      (Fun fact:
>>>>      >>>>      This is
>>>>      >>>>      >> why
>>>>      >>>>      >>>>>> a PR with merge conflicts is not being run on
>>>>      Travis.)
>>>>      >>>>      >>>>>>
>>>>      >>>>      >>>>>> And ultimately, everyone can already make use of
>>>> this
>>>>      >>>>      approach anyway.
>>>>      >>>>      >>>>>>
>>>>      >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>      >>>>      >>>>>>> Hi Jeff,
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>>>>      think it's a
>>>>      >>>>      good idea to
>>>>      >>>>      >>>>>>> leverage user's travis account.
>>>>      >>>>      >>>>>>> In this way, we can have almost unlimited
>>>>      concurrent build
>>>>      >>>>      jobs and
>>>>      >>>>      >>>>>>> developers can restart build by themselves
>>>>      (currently only
>>>>      >>>>      committers
>>>>      >>>>      >>>>>>> can restart PR's build).
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>> But I'm still not very clear how to integrate
>>>> user's
>>>>      >>>>      travis build
>>>>      >>>>      >> into
>>>>      >>>>      >>>>>>> the Flink pull request's build automatically.
>>>>      Can you
>>>>      >>>>      explain more in
>>>>      >>>>      >>>>>>> detail?
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>> Another question: does travis only build
>>>>      branches for user
>>>>      >>>>      account?
>>>>      >>>>      >>>>>>> My concern is that builds for PRs will rebase
>>>> user's
>>>>      >>>>      commits against
>>>>      >>>>      >>>>>>> current master branch.
>>>>      >>>>      >>>>>>> This will help us to find problems before
>>>>      merge.  Builds
>>>>      >>>>      for branches
>>>>      >>>>      >>>>>>> will lose the impact of new commits in master.
>>>>      >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>> Thanks again for sharing the idea.
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>> Regards,
>>>>      >>>>      >>>>>>> Jark
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>>>      <zjffdu@gmail.com <ma...@gmail.com>
>>>>      >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>>>>      >>>>      >>>>>>> <mailto:zjffdu@gmail.com
>>>>      <ma...@gmail.com> <mailto:zjffdu@gmail.com
>>>>      <ma...@gmail.com>>>> wrote:
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>       Hi Folks,
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved
>>>>      >>>> it by
>>>>      >>>>      >> delegating
>>>>      >>>>      >>>>>>>  each
>>>>      >>>>      >>>>>>>  one's PR build to his travis account
>>>>      (Everyone can
>>>>      >>>>      have 5 free
>>>>      >>>>      >>>>>>>  slot for
>>>>      >>>>      >>>>>>>  travis build).
>>>>      >>>>      >>>>>>>  Apache account travis build is only triggered
>>>> when
>>>>      >>>>      PR is merged.
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
>>>>      <ma...@gmail.com>
>>>>      >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>
>>>>      >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>>>>      >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>  > (Forgot to cc George)
>>>>      >>>>      >>>>>>>  >
>>>>      >>>>      >>>>>>>  > Best,
>>>>      >>>>      >>>>>>>  > Kurt
>>>>      >>>>      >>>>>>>  >
>>>>      >>>>      >>>>>>>  >
>>>>      >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>>>      >>>>      <ykt836@gmail.com <ma...@gmail.com>
>>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>>      >>>>      >>>>>>> <mailto:ykt836@gmail.com
>>>>      <ma...@gmail.com> <mailto:ykt836@gmail.com
>>>>      <ma...@gmail.com>>>>
>>>>      >>>>      wrote:
>>>>      >>>>      >>>>>>>  >
>>>>      >>>>      >>>>>>>  > > Hi Bowen,
>>>>      >>>>      >>>>>>>  > >
>>>>      >>>>      >>>>>>>  > > Thanks for bringing this up. We
>>>>      actually have
>>>>      >>>>      discussed
>>>>      >>>>      >> about
>>>>      >>>>      >>>>>>>  this, and I
>>>>      >>>>      >>>>>>>  > > think Till and George have
>>>>      >>>>      >>>>>>>  > > already spent some time investigating
>>>>      it. I have
>>>>      >>>>      cced both of
>>>>      >>>>      >>>>>>>  them, and
>>>>      >>>>      >>>>>>>  > > maybe they can share
>>>>      >>>>      >>>>>>>  > > their findings.
>>>>      >>>>      >>>>>>>  > >
>>>>      >>>>      >>>>>>>  > > Best,
>>>>      >>>>      >>>>>>>  > > Kurt
>>>>      >>>>      >>>>>>>  > >
>>>>      >>>>      >>>>>>>  > >
>>>>      >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>>>      >>>>      <imjark@gmail.com <ma...@gmail.com>
>>>>      <mailto:imjark@gmail.com <ma...@gmail.com>>
>>>>      >>>>      >>>>>>> <mailto:imjark@gmail.com
>>>>      <ma...@gmail.com> <mailto:imjark@gmail.com
>>>>      <ma...@gmail.com>>>>
>>>>      >>>>      wrote:
>>>>      >>>>      >>>>>>>  > >
>>>>      >>>>      >>>>>>>  > >> Hi Bowen,
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> Thanks for bringing this. We also
>>>>      suffered from
>>>>      >>>>      the long
>>>>      >>>>      >>>>>>>  build time.
>>>>      >>>>      >>>>>>>  > >> I agree that we should focus on
>>>>      solving build
>>>>      >>>>      capacity
>>>>      >>>>      >>>>>>>  problem in the
>>>>      >>>>      >>>>>>>  > >> thread.
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> My observation is that only one
>>>>      build is
>>>>      >>>>      running, all
>>>>      >>>>      >> the
>>>>      >>>>      >>>>>>>  others
>>>>      >>>>      >>>>>>>  > >> (other
>>>>      >>>>      >>>>>>>  > >> PRs, master) are pending.
>>>>      >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>>>>      it can
>>>>      >>>> support
>>>>      >>>>      >> concurrent
>>>>      >>>>      >>>>>>>  build
>>>>      >>>>      >>>>>>>  > jobs.
>>>>      >>>>      >>>>>>>  > >> But I don't know which plan we are
>>>>      using, might
>>>>      >>>>      be the free
>>>>      >>>>      >>>>>>>  plan for
>>>>      >>>>      >>>>>>>  > open
>>>>      >>>>      >>>>>>>  > >> source.
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>>>      experience on
>>>>      >>>>      Travis.
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> Regards,
>>>>      >>>>      >>>>>>>  > >> Jark
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>>>      >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>>>>      >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>>>      <ma...@gmail.com>
>>>>      >>>>      <mailto:bowenli86@gmail.com
>>>>      <ma...@gmail.com>>>> wrote:
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >> > Hi Steven,
>>>>      >>>>      >>>>>>>  > >> >
>>>>      >>>>      >>>>>>>  > >> > I think you may not have read what I
>>>>      wrote. The
>>>>      >>>>      discussion is
>>>>      >>>>      >>>> about
>>>>      >>>>      >>>>>>>  > "unstable
>>>>      >>>>      >>>>>>>  > >> > build **capacity**", in another word
>>>>      >>>>      "unstable / lack of
>>>>      >>>>      >>>> build
>>>>      >>>>      >>>>>>>  > >> resources",
>>>>      >>>>      >>>>>>>  > >> > not "unstable build".
>>>>      >>>>      >>>>>>>  > >> >
>>>>      >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
>>>>      Steven Wu
>>>>      >>>>      >>>>>>>  <stevenz3wu@gmail.com
>>>>      <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>>      <ma...@gmail.com>>
>>>>      >>>>      <mailto:stevenz3wu@gmail.com
>>>>      <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>>      <ma...@gmail.com>>>>
>>>>      >>>>      >>>>>>>  > wrote:
>>>>      >>>>      >>>>>>>  > >> >
>>>>      >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
>>>>      >>>>      definitely a pain
>>>>      >>>>      >>>>>> point.
>>>>      >>>>      >>>>>>>  > >> > >
>>>>      >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>>>      >>>>      >> flink-connector-kafka
>>>>      >>>>      >>>>>>>       is not
>>>>      >>>>      >>>>>>>  > >> related
>>>>      >>>>      >>>>>>>  > >> > to
>>>>      >>>>      >>>>>>>  > >> > > my change, but there is no easy way to
>>>>      re-run the
>>>>      >>>>      build on
>>>>      >>>>      >>>>>>>  travis UI.
>>>>      >>>>      >>>>>>>  > Google
>>>>      >>>>      >>>>>>>  > >> > > search showed a trick of
>>>>      close-and-open the
>>>>      >>>>      PR will
>>>>      >>>>      >>>>>>>  trigger rebuild.
>>>>      >>>>      >>>>>>>  > >> but
>>>>      >>>>      >>>>>>>  > >> > > that could add noises to the PR
>>>>      activities.
>>>>      >>>>      >>>>>>>  > >> > >
>>>>      >>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>      >>>>      >>>>>>>  > >> > >
>>>>      >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>>>>      often failed
>>>>      >>>>      with
>>>>      >>>>      >>>>>>>  exceeding time
>>>>      >>>>      >>>>>>>  > limit
>>>>      >>>>      >>>>>>>  > >> > after
>>>>      >>>>      >>>>>>>  > >> > > 4+ hours.
>>>>      >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>>>>      limit for
>>>>      >>>>      jobs, and
>>>>      >>>>      >> has
>>>>      >>>>      >>>>>>>  been
>>>>      >>>>      >>>>>>>  > >> > terminated.
>>>>      >>>>      >>>>>>>  > >> > >
>>>>      >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>>>      Bowen Li
>>>>      >>>>      >>>>>>>  <bowenli86@gmail.com
>>>>      <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>>      <ma...@gmail.com>>
>>>>      >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>>      >>>>      >>>>>>>  > wrote:
>>>>      >>>>      >>>>>>>  > >> > >
>>>>      >>>>      >>>>>>>  > >> > > >
>>>>      >>>> https://travis-ci.org/apache/flink/builds/549681530
>>>>      >>>>      >>>>>>>  This build
>>>>      >>>>      >>>>>>>  > >> > request
>>>>      >>>>      >>>>>>>  > >> > > > has
>>>>      >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>>>      queue**
>>>>      >>>>      since I first
>>>>      >>>>      >> saw
>>>>      >>>>      >>>>>>>       it at PST
>>>>      >>>>      >>>>>>>  > >> > 10:30am
>>>>      >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>>>      there before
>>>>      >>>>      10:30am).
>>>>      >>>>      >>>>>>>  It's PST
>>>>      >>>>      >>>>>>>  > 4:12pm
>>>>      >>>>      >>>>>>>  > >> now
>>>>      >>>>      >>>>>>>  > >> > > and
>>>>      >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>>>      >>>>      >>>>>>>  > >> > > >
>>>>      >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>>>>      Bowen Li
>>>>      >>>>      >>>>>>>  <bowenli86@gmail.com
>>>>      <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>>      <ma...@gmail.com>>
>>>>      >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>>      >>>>      >>>>>>>  > >> wrote:
>>>>      >>>>      >>>>>>>  > >> > > >
>>>>      >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>>>>      >>>>      resulting from lack
>>>>      >>>>      >>>>>>>       of stable
>>>>      >>>>      >>>>>>>  > >> build
>>>>      >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>>>>      PRs [1].
>>>>      >>>>      >> Specifically, I
>>>>      >>>>      >>>>>>>  noticed
>>>>      >>>>      >>>>>>>  > >> often
>>>>      >>>>      >>>>>>>  > >> > > that
>>>>      >>>>      >>>>>>>  > >> > > > no
>>>>      >>>>      >>>>>>>  > >> > > > > build in the queue is making any
>>>>      >>>>      progress for
>>>>      >>>>      >> hours,
>>>>      >>>>      >>>> and
>>>>      >>>>      >>>>>>>  > suddenly
>>>>      >>>>      >>>>>>>  > >> 5
>>>>      >>>>      >>>>>>>  > >> > or
>>>>      >>>>      >>>>>>>  > >> > > 6
>>>>      >>>>      >>>>>>>  > >> > > > > builds kick off all together
>>>>      after the
>>>>      >>>>      long pause.
>>>>      >>>>      >>>>>>>       I'm at PST
>>>>      >>>>      >>>>>>>  > >> > (UTC-08)
>>>>      >>>>      >>>>>>>  > >> > > > time
>>>>      >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>>>>      be as
>>>>      >>>>      long as 6 hours
>>>>      >>>>      >>>>>>>  from PST 9am
>>>>      >>>>      >>>>>>>  > >> to
>>>>      >>>>      >>>>>>>  > >> > 3pm
>>>>      >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>>>>      drain the
>>>>      >>>>      queue
>>>>      >>>>      >>>>>>>  afterwards).
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > I think this has greatly
>>>>      impacted our
>>>>      >>>>      productivity.
>>>>      >>>>      >>>> I've
>>>>      >>>>      >>>>>>>  > >> experienced
>>>>      >>>>      >>>>>>>  > >> > > that
>>>>      >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>>>>      morning of
>>>>      >>>>      PST time zone
>>>>      >>>>      >>>>>>>  won't finish
>>>>      >>>>      >>>>>>>  > >> > their
>>>>      >>>>      >>>>>>>  > >> > > > > build until late night of the
>>>>      same day.
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>>>>      the same
>>>>      >>>>      problem or
>>>>      >>>>      >>>>>>>  have similar
>>>>      >>>>      >>>>>>>  > >> > > > observation
>>>>      >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>>>>      has things
>>>>      >>>>      to do with
>>>>      >>>>      >> time
>>>>      >>>>      >>>>>>>  zone)
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>>>>      TravisCI is
>>>>      >>>>      Flink currently
>>>>      >>>>      >>>>>>>  using? Is it
>>>>      >>>>      >>>>>>>  > >> the
>>>>      >>>>      >>>>>>>  > >> > > free
>>>>      >>>>      >>>>>>>  > >> > > > > plan for open source
>>>>      projects? What
>>>>      >>>> are the
>>>>      >>>>      >>>>>>>  guaranteed build
>>>>      >>>>      >>>>>>>  > >> capacity
>>>>      >>>>      >>>>>>>  > >> > > of
>>>>      >>>>      >>>>>>>  > >> > > > > the current plan?
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>>>>      (either
>>>>      >>>>      free or paid)
>>>>      >>>>      >>>>>> can't
>>>>      >>>>      >>>>>>>  > provide
>>>>      >>>>      >>>>>>>  > >> > > stable
>>>>      >>>>      >>>>>>>  > >> > > > > build capacity, can we
>>>>      upgrade to a
>>>>      >>>>      higher priced
>>>>      >>>>      >>>>>>>  plan with
>>>>      >>>>      >>>>>>>  > larger
>>>>      >>>>      >>>>>>>  > >> > and
>>>>      >>>>      >>>>>>>  > >> > > > more
>>>>      >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>>>>      contribute to
>>>>      >>>> the
>>>>      >>>>      >>>>>>>  productivity problem
>>>>      >>>>      >>>>>>>  > is
>>>>      >>>>      >>>>>>>  > >> > that
>>>>      >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>>>>      full build
>>>>      >>>>      for every PR
>>>>      >>>>      >>>> and a
>>>>      >>>>      >>>>>>>  > >> successful
>>>>      >>>>      >>>>>>>  > >> > > full
>>>>      >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>>>>      definitely have
>>>>      >>>>      more options to
>>>>      >>>>      >>>>>>>  solve it,
>>>>      >>>>      >>>>>>>  > for
>>>>      >>>>      >>>>>>>  > >> > > > instance,
>>>>      >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>>>>      and reuse
>>>>      >>>>      artifacts
>>>>      >>>>      >> from
>>>>      >>>>      >>>> the
>>>>      >>>>      >>>>>>>  > previous
>>>>      >>>>      >>>>>>>  > >> > > build.
>>>>      >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>>>>      effort
>>>>      >>>>      which is much
>>>>      >>>>      >>>>>>>  harder to
>>>>      >>>>      >>>>>>>  > >> > accomplish
>>>>      >>>>      >>>>>>>  > >> > > > in
>>>>      >>>>      >>>>>>>  > >> > > > > a short period of time and
>>>>      may deserve
>>>>      >>>>      its own
>>>>      >>>>      >>>> separate
>>>>      >>>>      >>>>>>>  > >> discussion.
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > > [1]
>>>>      >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>      >>>>      >>>>>>>  > >> > > >
>>>>      >>>>      >>>>>>>  > >> > >
>>>>      >>>>      >>>>>>>  > >> >
>>>>      >>>>      >>>>>>>  > >>
>>>>      >>>>      >>>>>>>  > >
>>>>      >>>>      >>>>>>>  >
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>       --
>>>>      >>>>      >>>>>>>  Best Regards
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>>>>>>  Jeff Zhang
>>>>      >>>>      >>>>>>>
>>>>      >>>>      >>
>>>>      >>>>
>>>>      >>>
>>>>      >>
>>>>
>>>
>>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Congxian Qiu <qc...@gmail.com>.
Since we use the Flink bot to trigger the CI tests, could we add a command for
the Flink bot to retrigger the CI? (Sometimes we encounter flaky tests.)

Best,
Congxian
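
A retrigger command like the one asked about above could be recognized in a PR
comment roughly as follows. This is only a sketch: the command syntax
"@flinkbot run <target>" and the function name are assumptions made for
illustration, not something the thread or the actual bot defines.

```python
import re
from typing import Optional

# Hypothetical command syntax: "@flinkbot run <target>" on a line of its own.
# The thread does not specify one; this pattern is an assumption.
COMMAND_PATTERN = re.compile(
    r"^\s*@flinkbot\s+run\s+(?P<target>[a-z]+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_retrigger_command(comment_body: str) -> Optional[str]:
    """Return the CI target named in a PR comment, or None if no command is present."""
    match = COMMAND_PATTERN.search(comment_body)
    if match is None:
        return None
    return match.group("target").lower()
```

A bot that already watches PR updates could call this on each new comment and
re-run the build for that PR when a target is returned, which would cover the
flaky-test case without closing and reopening the PR.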


Chesnay Schepler <ch...@apache.org> wrote on Mon, Jul 8, 2019 at 5:01 AM:

> The vote has passed unanimously in favor of migrating to a separate
> Travis account.
>
> I will now set things up such that PullRequests are no longer run on
> the ASF servers.
> This is a major step in reducing our usage of ASF resources.
> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
> workers, which is the same as the ASF gives us). Over the course of the
> next week we'll set up the Ververica subscription to increase this limit.
>
>  From now on, a bot will mirror all new and updated PullRequests to a
> mirror repository (https://github.com/flink-ci/flink-ci) and write an
> update into the PR once the build is complete.
> I have run the bots for the past 3 days in parallel to our existing
> Travis and it was working without major issues.
>
> The biggest change that contributors will see is that there's no longer
> an icon next to each commit. We may revisit this in the future.
>
> I'll set up a repo with the source of the bot later.
>
> On 04/07/2019 10:46, Chesnay Schepler wrote:
> > I've raised a JIRA
> > <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> > inquire whether it would be possible to switch to a different Travis
> > account, and if so what steps would need to be taken.
> > We need a proper confirmation from INFRA since we are not in full
> > control of the flink repository (for example, we cannot access the
> > settings page).
> >
> > If this is indeed possible, Ververica is willing to sponsor a Travis
> > account for the Flink project.
> > This would provide us with more than enough resources.
> >
> > Since this makes the project more reliant on resources provided by
> > external companies I would like to vote on this.
> >
> > Please vote on this proposal, as follows:
> > [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> > provided that INFRA approves
> > [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> > account
> >
> > The vote will be open for at least 24h, and until we have confirmation
> > from INFRA. The voting period may be shorter than the usual 3 days
> > since our current setup is effectively not working.
> >
> > On 04/07/2019 06:51, Bowen Li wrote:
> >> Re: > Are they using their own Travis CI pool, or did they switch to
> >> an entirely different CI service?
> >>
> >> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >> currently moving away from ASF's Travis to their own in-house metal
> >> machines at [1] with custom CI application at [2]. They've seen
> >> significant improvement w.r.t both much higher performance and
> >> basically no resource waiting time, "night-and-day" difference
> >> quoting Wes.
> >>
> >> Re: > If we can just switch to our own Travis pool, just for our
> >> project, then this might be something we can do fairly quickly?
> >>
> >> I believe so, according to [3] and [4]
> >>
> >>
> >> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> >> [2] https://github.com/ursa-labs/ursabot
> >> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>
> >>
> >>
> >> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
> >> <ma...@apache.org>> wrote:
> >>
> >>     Are they using their own Travis CI pool, or did they switch to an
> >>     entirely different CI service?
> >>
> >>     If we can just switch to our own Travis pool, just for our
> >>     project, then
> >>     this might be something we can do fairly quickly?
> >>
> >>     On 03/07/2019 05:55, Bowen Li wrote:
> >>     > I responded in the INFRA ticket [1] that I believe they are
> >>     using a wrong
> >>     > metric against Flink and the total build time is a completely
> >>     different
> >>     > thing than guaranteed build capacity.
> >>     >
> >>     > My response:
> >>     >
> >>     > "As mentioned above, since I started to pay attention to Flink's
> >>     build
> >>     > queue a few tens of days ago, I'm in Seattle and I saw no build
> >>     was kicking
> >>     > off in PST daytime in weekdays for Flink. Our teammates in China
> >>     and Europe
> >>     > have also reported similar observations. So we need to evaluate
> >>     how the
> >>     > large total build time came from - if 1) your number and 2) our
> >>     > observations from three locations that cover pretty much a full
> >>     day, are
> >>     > all true, I **guess** one reason can be that - highly likely the
> >>     extra
> >>     > build time came from weekends when other Apache projects may be
> >>     idle and
> >>     > Flink just drains hard its congested queue.
> >>     >
> >>     > Please be aware that we're not complaining about the lack of
> >>     resources
> >>     > in general, I'm complaining about the lack of **stable,
> >> dedicated**
> >>     > resources. An example for the latter one is, currently even if
> >>     no build is
> >>     > in Flink's queue and I submit a request to be the queue head in
> >> PST
> >>     > morning, my build won't even start in 6-8+h. That is an absurd
> >>     amount of
> >>     > waiting time.
> >>     >
> >>     > That is to say, if ASF INFRA decides to adopt a quota system and
> >>     grants
> >>     > Flink five DEDICATED servers that run all the time only for
> >>     Flink, that'll
> >>     > be PERFECT and can totally solve our problem now.
> >>     >
> >>     > I feel what's missing in the ASF INFRA's Travis resource pool is
> >>     some level
> >>     > of build capacity SLAs and certainty"
> >>     >
> >>     >
> >>     > Again, I believe there are differences in nature of these two
> >>     problems,
> >>     > long build time v.s. lack of dedicated build resource. That is
> >>     to say,
> >>     > shortening build time may relieve the situation, or it may not.
> >>     I'm slightly
> >>     > negative on disabling IT cases for PRs, because the downside is
> >>     that we are
> >>     > at risk of potential bugs in PRs that UTs don't catch, which
> >>     may cost a
> >>     > lot more to fix if they slow down or even block
> >>     others, but I am
> >>     > open to others' opinions on it.
> >>     >
> >>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
> >>     feasible to
> >>     > solve our problem since INFRA's pool is fully shared and they
> >>     have no
> >>     > control and finer insights over resource allocation to a
> >>     specific Apache
> >>     > project. As mentioned in [1], Apache Arrow is moving away from
> >>     ASF INFRA
> >>     > Travis pool (they are actually surprised Flink hasn't planned to do
> >>     so). I
> >>     > know that Spark is on its own build infra. If we all agree on
> >>     funding our
> >>     > own build infra, I'd be glad to help investigate any potential
> >>     options
> >>     > after releasing 1.9 since I'm super busy with 1.9 now.
> >>     >
> >>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>     >
> >>     >
> >>     >
> >>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> >>     <chesnay@apache.org <ma...@apache.org>> wrote:
> >>     >
> >>     >> As a short-term stopgap, since we can assume this issue to
> >>     become much
> >>     >> worse in the following days/weeks, we could disable IT cases in
> >>     PRs and
> >>     >> only run them on master.
> >>     >>
> >>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>     >>> People really have to stop thinking that just because
> >>     something works
> >>     >>> for us it is also a good solution.
> >>     >>> Also, please remember that our builds run for 2h from start to
> >>     finish,
> >>     >>> and not the 14 _minutes_ it takes for zeppelin.
> >>     >>> We are dealing with an entirely different scale here, both in
> >>     terms of
> >>     >>> build times and number of builds.
> >>     >>>
> >>     >>> In this very thread people have been complaining about long
> >> queue
> >>     >>> times for their builds. Surprise, other Apache projects have
> >> been
> >>     >>> suffering the very same thing due to us not controlling our
> >> build
> >>     >>> times. While switching services (be it Jenkins, CircleCI or
> >>     whatever)
> >>     >>> will possibly work for us (and these options are actually
> >>     attractive,
> >>     >>> like CircleCI's proper support for build artifacts), it will
> >> also
> >>     >>> result in us likely negatively affecting other projects in
> >>     significant
> >>     >>> ways.
> >>     >>>
> >>     >>> Sure, the Jenkins setup has a good user experience for us, at
> >>     the cost
> >>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
> >>     have 25
> >>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> >>     >>> resources, and the European contributors haven't even really
> >>     started yet.
> >>     >>>
> >>     >>> FYI, the latest INFRA response from INFRA-18533:
> >>     >>>
> >>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> >>     build time
> >>     >>> last month. That is equal to EIGHT servers running 24/7 for
> >>     the ENTIRE
> >>     >>> MONTH. EIGHT. nonstop.
> >>     >>> When we discovered this last night, we discussed it some and
> >>     are going
> >>     >>> to tune down Flink to allow only five executors maximum. We
> >> cannot
> >>     >>> allow Flink to consume so much of a Foundation shared resource."
> >>     >>>
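
For a sense of scale, the figures quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 30-day month, the perfectly packed queue, and the function names are assumptions made for illustration. The 5800 hours, the 25 queued PRs, the 2h per build, and the five-executor cap are taken from the messages above.

```python
HOURS_PER_DAY = 24


def equivalent_servers(build_hours, days_in_month=30):
    """Servers that would have to run 24/7 to supply build_hours in one month."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def queue_drain_hours(queued_builds, hours_per_build, executors):
    """Wall-clock hours to clear a queue when `executors` builds run in parallel.

    Assumes every executor is busy the whole time (no idle gaps, no new PRs).
    """
    return queued_builds * hours_per_build / executors


servers = equivalent_servers(5800)
total_build_hours = 25 * 2               # 25 queued PRs at 2h each
drain = queue_drain_hours(25, 2, 5)      # same queue, capped at 5 executors

print(f"{servers:.2f} servers running nonstop")    # 8.06
print(f"{total_build_hours}h of build time queued")  # 50h
print(f"{drain:.0f}h to drain with 5 executors")   # 10h
```

Under these assumptions the five-executor cap alone turns the current queue into roughly ten hours of waiting, before any new PR arrives.
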
> >>     >>> So yes, we either
> >>     >>> a) have to heavily reduce our CI usage or
> >>     >>> b) fund our own, either maintaining it ourselves or donating
> >>     to Apache.
> >>     >>>
> >>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>     >>>> By looking at the git history of the Jenkins script, its core
> >>     part
> >>     >>>> was finished in March 2017 (with only two minor updates in
> >>     2017/2018),
> >>     >>>> so it's been running for over two years now and it feels like
> >>     the Zeppelin
> >>     >>>> community has been quite happy with it. @Jeff Zhang
> >>     >>>> <zjffdu@gmail.com> can you
> >>     share your insights and user
> >>     >>>> experience with the Jenkins+Travis approach?
> >>     >>>>
> >>     >>>> Things like:
> >>     >>>>
> >>     >>>> - has the approach completely solved the resource capacity
> >>     problem
> >>     >>>> for the Zeppelin community? is the Zeppelin community happy with the
> >>     result?
> >>     >>>> - is the whole configuration chain stable (e.g. uptime) enough?
> >>     >>>> - how often do you need to maintain the Jenkins infra? how many
> >>     >>>> people are usually involved in maintenance and bug-fixes?
> >>     >>>>
> >>     >>>> The downside of this approach seems mostly to be on the
> >>     maintenance
> >>     >>>> to me - maintain the script and Jenkins infra.
> >>     >>>>
> >>     >>>> ** Having Our Own Travis-CI.com Account **
> >>     >>>>
> >>     >>>> Another alternative I've been thinking of is to have our own
> >>     >>>> travis-ci.com account with paid dedicated
> >>     >>>> resources. Note travis-ci.org is the free
> >>     >>>> version and travis-ci.com is the commercial
> >>     >>>> version. We currently use a shared resource pool managed by
> >>     the ASF INFRA
> >>     >>>> team on travis-ci.org, but we have no control
> >>     >>>> over it - we can't see how it's configured, how many
> >>     resources are
> >>     >>>> available, how resources are allocated among Apache projects,
> >>     etc.
> >>     >>>> The nice things about having an account on travis-ci.com are:
> >>     >>>>
> >>     >>>> - relatively low cost with much better resource guarantee
> >>     than what
> >>     >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> >>     >>>> $489/month with 10 concurrency
> >>     >>>> - low maintenance work compared to using Jenkins
> >>     >>>> - (potentially) no migration cost according to Travis's doc [2]
> >>     >>>> (pending verification)
> >>     >>>> - full control over the build capacity/configuration
> >> compared to
> >>     >>>> using ASF INFRA's pool
> >>     >>>>
> >>     >>>> I'd be surprised if we as such a vibrant community cannot
> >>     find and
> >>     >>>> fund $249*12=$2988 a year in exchange for a much better
> >> developer
> >>     >>>> experience and much higher productivity.
> >>     >>>>
> >>     >>>> [1] https://travis-ci.com/plans
> >>     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>     >>>>
> >>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> >>     >>>> <chesnay@apache.org> wrote:
> >>     >>>>
> >>     >>>>      So yes, the Jenkins job keeps pulling the state from
> >>     Travis until it
> >>     >>>>      finishes.
> >>     >>>>
> >>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
> >>     workers
> >>     >>>>      just to
> >>     >>>>      idle for several hours.
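
The cost being discussed here is the polling itself: a worker stays occupied for as long as the Travis build runs. A minimal sketch of such a loop with an upper bound on the wait is shown below. Everything in it is an assumption for illustration: fetch_state stands in for whatever call reports the build status, the state names are made up, and this is not taken from Zeppelin's travis_check.py.

```python
import time


def wait_for_build(fetch_state, poll_seconds=60, timeout_seconds=4 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it reports a final state or the timeout expires.

    fetch_state is a hypothetical callable returning a status string; the
    worker running this loop is blocked for the whole duration.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in ("passed", "failed"):   # hypothetical final states
            return state
        if clock() >= deadline:
            return "timed_out"
        sleep(poll_seconds)


# Usage with a fake status source, so the example runs without any network
# access and without actually sleeping.
states = iter(["queued", "started", "passed"])
result = wait_for_build(lambda: next(states), sleep=lambda seconds: None)
print(result)
```

The timeout only bounds how long a worker can be held; it does not remove the idle time, which is the concern raised above.
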
> >>     >>>>
> >>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >>     >>>>      > Here's what the Zeppelin community did: we wrote a Python
> >>     script to
> >>     >>>>      check the
> >>     >>>>      > build status of a pull request.
> >>     >>>>      > Here's the script:
> >>     >>>>      >
> >> https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>     >>>>      >
> >>     >>>>      > And this is the script we used in Jenkins build job.
> >>     >>>>      >
> >>     >>>>      > if [ -f "travis_check.py" ]; then
> >>     >>>>      >    git log -n 1
> >>     >>>>      >    # extract the PR URL and the author's fork URL from the Jenkins build page
> >>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>     >>>>      >    #if [ -z $COMMIT ]; then
> >>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>     >>>>      >    #fi
> >>     >>>>      >
> >>     >>>>      >    # get commit hash from PR
> >>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>     >>>>      >    sleep 30 # wait a moment for travis to start the build
> >>     >>>>      >    RET_CODE=0
> >>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>     >>>>      >      RET_CODE=0
> >>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>     >>>>      >    fi
> >>     >>>>      >
> >>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
> >>     >>>>      >      set +x
> >>     >>>>      >      echo "-----------------------------------------------------"
> >>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >>     >>>>      >      echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>     >>>>      >      echo ""
> >>     >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
> >>     >>>>      >      echo "git commit --amend"
> >>     >>>>      >      echo "git push your-remote HEAD --force"
> >>     >>>>      >      echo ""
> >>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>     >>>>      >    fi
> >>     >>>>      >
> >>     >>>>      >    exit $RET_CODE
> >>     >>>>      > else
> >>     >>>>      >    set +x
> >>     >>>>      >    echo "travis_check.py does not exist"
> >>     >>>>      >    exit 1
> >>     >>>>      > fi
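
The grep/sed pipeline above picks the PR's head commit and the fork owner out of the GitHub API response as text. The same extraction can be sketched with a JSON parser, assuming the pull-request response carries the usual head.sha and head.repo.full_name fields. The function name and the sample payload are made up for illustration, and this is not part of the Zeppelin job.

```python
import json


def head_commit_and_owner(pr_json):
    """Return (head commit SHA, fork owner) from a GitHub pull-request document.

    "head" describes the contributor's branch; "base" describes the target
    branch in the upstream repository and is ignored here.
    """
    pr = json.loads(pr_json)
    head = pr["head"]
    owner = head["repo"]["full_name"].split("/")[0]
    return head["sha"], owner


# Trimmed, made-up payload with only the fields used above.
sample = json.dumps({
    "head": {
        "label": "contributor:my-feature",
        "ref": "my-feature",
        "sha": "abc123",
        "repo": {"full_name": "contributor/zeppelin"},
    },
    "base": {
        "label": "apache:master",
        "ref": "master",
        "sha": "def456",
        "repo": {"full_name": "apache/zeppelin"},
    },
})

commit, author = head_commit_and_owner(sample)
print(commit, author)
```

Reading the head object directly avoids depending on the order in which label, ref and sha happen to appear in the response text.
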
> >>     >>>>      >
> >>     >>>>      > On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <chesnay@apache.org> wrote:
> >>     >>>>      >
> >>     >>>>      >> Does this imply that a Jenkins job is active as long
> >>     as the
> >>     >>>>      Travis build
> >>     >>>>      >> runs?
> >>     >>>>      >>
> >>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >>     >>>>      >>> Hi,
> >>     >>>>      >>>
> >>     >>>>      >>> @Dawid, I think the "long test running" as I
> >>     mentioned in the
> >>     >>>>      first
> >>     >>>>      >> email,
> >>     >>>>      >>> also as you guys said, belongs to "a big effort
> >>     which is much
> >>     >>>>      harder to
> >>     >>>>      >>> accomplish in a short period of time and may deserve
> >>     its own
> >>     >>>>      separate
> >>     >>>>      >>> discussion". Thus I didn't include it in what we can
> >>     do in a
> >>     >>>>      foreseeable
> >>     >>>>      >>> short term.
> >>     >>>>      >>>
> >>     >>>>      >>> Besides, I don't think that's the ultimate reason
> >>     for lack of
> >>     >>>>      build
> >>     >>>>      >>> resources. Even if the build is shortened to
> >>     something like
> >>     >>>>      2h, the
> >>     >>>>      >>> problem I described, where no build machine does any work
> >>     >>>>      >>> for 6 or more hours during PST daytime,
> >>     >>>>      >>> will still happen, because no
> >>     machine from
> >>     >>>>      ASF INFRA's
> >>     >>>>      >>> pool is allocated to Flink. As I have paid close
> >>     attention to
> >>     >>>>      the build
> >>     >>>>      >>> queue in the past few weekdays, it's a pretty clear
> >>     pattern now.
> >>     >>>>      >>>
> >>     >>>>      >>> **The ultimate root cause** for that is - we don't
> >>     have any
> >>     >>>>      **dedicated**
> >>     >>>>      >>> build resources that we can stably rely on. I'm
> >>     actually ok to
> >>     >>>>      wait for a
> >>     >>>>      >>> long time if there are build requests running, it
> >>     means at
> >>     >>>>      least we are
> >>     >>>>      >>> making progress. But I'm not ok with no build
> >>     resource. A
> >>     >>>>      better place I
> >>     >>>>      >>> think we should aim at in short term is to always
> >>     have at
> >>     >>>>      least a central
> >>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
> >>     Flink at
> >>     >>>>      any time, or
> >>     >>>>      >>> maybe use users' resources.
> >>     >>>>      >>>
> >>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> >>     Zeppelin
> >>     >>>>      community is
> >>     >>>>      >>> using a Jenkins job to automatically build on users'
> >>     travis
> >>     >>>>      account and
> >>     >>>>      >>> link the result back to github PR. I guess the
> >>     Jenkins job
> >>     >>>>      would fetch
> >>     >>>>      >>> latest upstream master and build the PR against it.
> >>     Jeff has
> >>     >>>> filed
> >>     >>>>      >> tickets
> >>     >>>>      >>> to learn and get access to the Jenkins infra. It'll
> >>     be better to
> >>     >>>>      fully
> >>     >>>>      >>> understand it first before judging this approach.
> >>     >>>>      >>>
> >>     >>>>      >>> I also heard good things about CircleCI, and ASF
> >>     INFRA seems
> >>     >>>>      to have a
> >>     >>>>      >> pool
> >>     >>>>      >>> of build capacity there too. Can be an alternative
> >>     to consider.
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>>
> >>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
> >>     >>>>      >>> <dwysakowicz@apache.org> wrote:
> >>     >>>>      >>>
> >>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
> >>     most
> >>     >>>>      important point
> >>     >>>>      >>>> from Chesnay's previous message in the summary. The
> >>     ultimate
> >>     >>>>      reason for
> >>     >>>>      >>>> all the problems is that the tests take close to 2
> >>     hours to
> >>     >>>>      run already.
> >>     >>>>      >>>> I fully support this claim: "Unless people start
> >>     caring about
> >>     >>>>      test times
> >>     >>>>      >>>> before adding them, this issue cannot be solved"
> >>     >>>>      >>>>
> >>     >>>>      >>>> This is also another reason why using user's Travis
> >>     account
> >>     >>>>      won't help.
> >>     >>>>      >>>> Every few weeks we reach the user's time limit for
> >>     a single
> >>     >>>>      profile.
> >>     >>>>      >>>> This makes the user's builds simply fail, until we
> >>     either
> >>     >>>>      properly
> >>     >>>>      >>>> decrease the time the tests take (which I am not
> >>     sure we ever
> >>     >>>>      did) or
> >>     >>>>      >>>> postpone the problem by splitting into more
> >>     profiles. (Note
> >>     >>>>      that the ASF
> >>     >>>>      >>>> Travis account has higher time limits)
> >>     >>>>      >>>>
> >>     >>>>      >>>> Best,
> >>     >>>>      >>>>
> >>     >>>>      >>>> Dawid
> >>     >>>>      >>>>
> >>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>     >>>>      >>>>> Do we know if using "the best" available hardware
> >>     would
> >>     >>>>      improve the
> >>     >>>>      >> build
> >>     >>>>      >>>>> times?
> >>     >>>>      >>>>> Imagine we would run the build on machines with
> >>     plenty of
> >>     >>>>      main memory
> >>     >>>>      >> to
> >>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> >>     architecture?
> >>     >>>>      >>>>>
> >>     >>>>      >>>>> Throwing hardware at the problem could help reduce
> >>     the time
> >>     >>>>      of an
> >>     >>>>      >>>>> individual build, and using our own infrastructure
> >>     would
> >>     >>>>      remove our
> >>     >>>>      >>>>> dependency on Apache's Travis account (with the
> >>     obvious
> >>     >>>>      downside of
> >>     >>>>      >>>> having
> >>     >>>>      >>>>> to maintain the infrastructure)
> >>     >>>>      >>>>> We could use an open source travis alternative, to
> >>     have a
> >>     >>>>      similar
> >>     >>>>      >>>>> experience and make the migration easy.
> >>     >>>>      >>>>>
> >>     >>>>      >>>>>
> >>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>     >>>>      >>>>> <chesnay@apache.org> wrote:
> >>     >>>>      >>>>>> From what I gathered, there's no special
> >>     sauce that the
> >>     >>>>      Zeppelin
> >>     >>>>      >>>>>> project uses which actually integrates a user's
> >> Travis
> >>     >>>>      account into the
> >>     >>>>      >>>> PR.
> >>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
> >>     kind of it.
> >>     >>>>      >>>>>>
> >>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
> >>     fair
> >>     >>>>      amount of
> >>     >>>>      >>>>>> resources, but there are downsides:
> >>     >>>>      >>>>>>
> >>     >>>>      >>>>>> The discoverability of the Travis check takes a
> >>     nose-dive.
> >>     >>>>      Either we
> >>     >>>>      >>>>>> require every contributor to always, on every
> >>     commit, also
> >>     >>>>      post a
> >>     >>>>      >> Travis
> >>     >>>>      >>>>>> build, or we have the reviewer sift through the
> >>     >>>>      contributor's account
> >>     >>>>      >> to
> >>     >>>>      >>>>>> find it.
> >>     >>>>      >>>>>>
> >>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> >>     also not
> >>     >>>>      equivalent to
> >>     >>>>      >>>>>> having a PR build.
> >>     >>>>      >>>>>>
> >>     >>>>      >>>>>> A normal branch build takes a branch as is and
> >>     tests it. A
> >>     >>>>      PR build
> >>     >>>>      >>>>>> merges the branch into master, and then runs it.
> >>     (Fun fact:
> >>     >>>>      This is
> >>     >>>>      >> why
> >>     >>>>      >>>>>> a PR with merge conflicts is not being run on
> >>     Travis.)
> >>     >>>>      >>>>>>
> >>     >>>>      >>>>>> And ultimately, everyone can already make use of
> >> this
> >>     >>>>      approach anyway.
> >>     >>>>      >>>>>>
> >>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>     >>>>      >>>>>>> Hi Jeff,
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> >>     think it's a
> >>     >>>>      good idea to
> >>     >>>>      >>>>>>> leverage user's travis account.
> >>     >>>>      >>>>>>> In this way, we can have almost unlimited
> >>     concurrent build
> >>     >>>>      jobs and
> >>     >>>>      >>>>>>> developers can restart build by themselves
> >>     (currently only
> >>     >>>>      committers
> >>     >>>>      >>>>>>> can restart PR's build).
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> >> user's
> >>     >>>>      travis build
> >>     >>>>      >> into
> >>     >>>>      >>>>>>> the Flink pull request's build automatically.
> >>     Can you
> >>     >>>>      explain more in
> >>     >>>>      >>>>>>> detail?
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>> Another question: does travis only build
> >>     branches for user
> >>     >>>>      account?
> >>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> >> user's
> >>     >>>>      commits against
> >>     >>>>      >>>>>>> current master branch.
> >>     >>>>      >>>>>>> This will help us to find problems before
> >>     merge.  Builds
> >>     >>>>      for branches
> >>     >>>>      >>>>>>> will lose the impact of new commits in master.
> >>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>> Thanks again for sharing the idea.
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>> Regards,
> >>     >>>>      >>>>>>> Jark
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >>     >>>>      >>>>>>> <zjffdu@gmail.com> wrote:
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>       Hi Folks,
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>  Zeppelin ran into this kind of issue before. We solved
> >>     >>>>      >>>>>>>  it by delegating each contributor's PR build to their own
> >>     >>>>      >>>>>>>  Travis account (everyone can have 5 free slots for
> >>     >>>>      >>>>>>>  Travis builds).
> >>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered
> >>     >>>>      >>>>>>>  when a PR is merged.
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>  On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>  > (Forgot to cc George)
> >>     >>>>      >>>>>>>  >
> >>     >>>>      >>>>>>>  > Best,
> >>     >>>>      >>>>>>>  > Kurt
> >>     >>>>      >>>>>>>  >
> >>     >>>>      >>>>>>>  >
> >>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>     >>>>      >>>>>>>  > <ykt836@gmail.com> wrote:
> >>     >>>>      >>>>>>>  >
> >>     >>>>      >>>>>>>  > > Hi Bowen,
> >>     >>>>      >>>>>>>  > >
> >>     >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
> >>     >>>>      >>>>>>>  > > discussed this, and I think Till and George have
> >>     >>>>      >>>>>>>  > > already spent some time investigating it. I have
> >>     >>>>      >>>>>>>  > > cc'ed both of them, and maybe they can share
> >>     >>>>      >>>>>>>  > > their findings.
> >>     >>>>      >>>>>>>  > >
> >>     >>>>      >>>>>>>  > > Best,
> >>     >>>>      >>>>>>>  > > Kurt
> >>     >>>>      >>>>>>>  > >
> >>     >>>>      >>>>>>>  > >
> >>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>     >>>>      >>>>>>>  > > <imjark@gmail.com> wrote:
> >>     >>>>      >>>>>>>  > >
> >>     >>>>      >>>>>>>  > >> Hi Bowen,
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> Thanks for bringing this up. We have also suffered from the
> >>     >>>>      >>>>>>>  > >> long build time. I agree that we should focus on solving the
> >>     >>>>      >>>>>>>  > >> build capacity problem in this thread.
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> My observation is that only one build is running at a time,
> >>     >>>>      >>>>>>>  > >> and all the others (other PRs, master) are pending.
> >>     >>>>      >>>>>>>  > >> The pricing plan [1] of Travis shows it can support
> >>     >>>>      >>>>>>>  > >> concurrent build jobs.
> >>     >>>>      >>>>>>>  > >> But I don't know which plan we are using; it might be the
> >>     >>>>      >>>>>>>  > >> free plan for open source.
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> >>     experience on
> >>     >>>>      Travis.
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> Regards,
> >>     >>>>      >>>>>>>  > >> Jark
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
> >>     >>>>      >>>>>>>  > >> <bowenli86@gmail.com> wrote:
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >> > Hi Steven,
> >>     >>>>      >>>>>>>  > >> >
> >>     >>>>      >>>>>>>  > >> > I think you may not have read what I wrote. The
> >>     >>>>      >>>>>>>  > >> > discussion is about "unstable build **capacity**",
> >>     >>>>      >>>>>>>  > >> > in other words "unstable / lack of build resources",
> >>     >>>>      >>>>>>>  > >> > not "unstable build".
> >>     >>>>      >>>>>>>  > >> >
> >>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>     >>>>      >>>>>>>  > >> > <stevenz3wu@gmail.com> wrote:
> >>     >>>>      >>>>>>>  > >> >
> >>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build is definitely a pain point.
> >>     >>>>      >>>>>>>  > >> > >
> >>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in flink-connector-kafka is
> >>     >>>>      >>>>>>>  > >> > > not related to my change. but there is no easy way to re-run
> >>     >>>>      >>>>>>>  > >> > > the build on travis UI. Google search showed a trick: closing
> >>     >>>>      >>>>>>>  > >> > > and reopening the PR will trigger a rebuild. but that could
> >>     >>>>      >>>>>>>  > >> > > add noise to the PR activities.
> >>     >>>>      >>>>>>>  > >> > >
> >>     >>>>      >>>>>>>  > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>     >>>>      >>>>>>>  > >> > >
> >>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo often failed with exceeding
> >>     >>>>      >>>>>>>  > >> > > time limit after 4+ hours.
> >>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time limit for jobs, and has
> >>     >>>>      >>>>>>>  > >> > > been terminated.
> >>     >>>>      >>>>>>>  > >> > >
> >>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>     >>>>      >>>>>>>  > >> > > <bowenli86@gmail.com> wrote:
> >>     >>>>      >>>>>>>  > >> > >
> >>     >>>>      >>>>>>>  > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >>     >>>>      >>>>>>>  > >> > > > This build request has been sitting at **HEAD of the queue**
> >>     >>>>      >>>>>>>  > >> > > > since I first saw it at PST 10:30am (not sure how long it's
> >>     >>>>      >>>>>>>  > >> > > > been there before 10:30am). It's PST 4:12pm now and
> >>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> >>     >>>>      >>>>>>>  > >> > > >
> >>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>     >>>>      >>>>>>>  > >> > > > <bowenli86@gmail.com> wrote:
> >>     >>>>      >>>>>>>  > >> > > >
> >>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain resulting from lack of stable build
> >>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
> >>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any progress for hours, and suddenly 5 or 6
> >>     >>>>      >>>>>>>  > >> > > > > builds kick off all together after the long pause. I'm at PST (UTC-08) time
> >>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
> >>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to drain the queue afterwards).
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > I think this has greatly impacted our productivity. I've experienced that
> >>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early morning of PST time zone won't finish their
> >>     >>>>      >>>>>>>  > >> > > > > build until late night of the same day.
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced the same problem or have similar observation
> >>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it has things to do with time zone)
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it the free
> >>     >>>>      >>>>>>>  > >> > > > > plan for open source projects? What is the guaranteed build capacity of
> >>     >>>>      >>>>>>>  > >> > > > > the current plan?
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan (either free or paid) can't provide stable
> >>     >>>>      >>>>>>>  > >> > > > > build capacity, can we upgrade to a higher priced plan with larger and more
> >>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that contributes to the productivity problem is that
> >>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run full build for every PR and a successful full
> >>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We definitely have more options to solve it, for instance,
> >>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
> >>     and reuse
> >>     >>>>      artifacts
> >>     >>>>      >> from
> >>     >>>>      >>>> the
> >>     >>>>      >>>>>>>  > previous
> >>     >>>>      >>>>>>>  > >> > > build.
> >>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
> >>     effort
> >>     >>>>      which is much
> >>     >>>>      >>>>>>>  harder to
> >>     >>>>      >>>>>>>  > >> > accomplish
> >>     >>>>      >>>>>>>  > >> > > > in
> >>     >>>>      >>>>>>>  > >> > > > > a short period of time and
> >>     may deserve
> >>     >>>>      its own
> >>     >>>>      >>>> separate
> >>     >>>>      >>>>>>>  > >> discussion.
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > > [1]
> >>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > > >
> >>     >>>>      >>>>>>>  > >> > > >
> >>     >>>>      >>>>>>>  > >> > >
> >>     >>>>      >>>>>>>  > >> >
> >>     >>>>      >>>>>>>  > >>
> >>     >>>>      >>>>>>>  > >
> >>     >>>>      >>>>>>>  >
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>       --
> >>     >>>>      >>>>>>>  Best Regards
> >>     >>>>      >>>>>>>
> >>     >>>>      >>>>>>>  Jeff Zhang
> >>     >>>>      >>>>>>>
> >>     >>>>      >>
> >>     >>>>
> >>     >>>
> >>     >>
> >>
> >
> >
>
>

Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
Update: Implemented and deployed.

On 02/08/2019 12:11, Jark Wu wrote:
> Wow. That's great! Thanks Chesnay.
>
> On Fri, 2 Aug 2019 at 17:50, Chesnay Schepler <chesnay@apache.org 
> <ma...@apache.org>> wrote:
>
>     I'm currently modifying the cibot to do this automatically; should be
>     finished by Monday.
>
>     On 02/08/2019 07:41, Jark Wu wrote:
>     > Hi Chesnay,
>     >
>     > Can we assign Flink Committers the permission of flink-ci/flink
>     repo?
>     > Several times, when I pushed some new commits, the old build
>     jobs were still
>     > pending and not canceled.
>     > Before we fix that, we can manually cancel some old jobs to save
>     build
>     > resources.
>     >
>     > Best,
>     > Jark
>     >
>     >
>     > On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler
>     <chesnay@apache.org <ma...@apache.org>> wrote:
>     >
>     >> Your best bet would be to check the first commit in the PR and
>     check the
>     >> parent commit.
>     >>
>     >> To re-run things, you will have to rebase the PR on the latest
>     master.
>     >>
>     >> On 10/07/2019 03:32, Kurt Young wrote:
>     >>> Thanks for all your efforts Chesnay, it indeed improves our
>     >>> development experience a lot. BTW, do you know how to find the master
>     >>> branch information that the CI runs with?
>     >>>
>     >>> For example, like this one:
>     >>> https://travis-ci.com/flink-ci/flink/jobs/214542568
>     >>> It shows a pass for the commits, which were rebased on master when
>     >>> the CI was triggered. But the master commit the CI ran on may or may
>     >>> not be the same as the current master. If it is the same, I can simply
>     >>> rely on the passed result to push commits, but if it is not, I think I
>     >>> should find another way to re-trigger tests based on the newest master.
>     >>>
>     >>> Do you know where I can get such information?
>     >>>
>     >>> Best,
>     >>> Kurt
>     >>>
>     >>>
>     >>> On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler
>     <chesnay@apache.org <ma...@apache.org>>
>     >> wrote:
>     >>>> The kinks have been worked out; the bot is running again and
>     pr builds
>     >>>> are yet again no longer running on ASF resources.
>     >>>>
>     >>>> PRs are mirrored to: https://github.com/flink-ci/flink
>     >>>> Bot source: https://github.com/flink-ci/ci-bot
>     >>>>
>     >>>> On 08/07/2019 17:14, Chesnay Schepler wrote:
>     >>>>> I have temporarily re-enabled running PR builds on the ASF
>     account;
>     >>>>> migrating to the Travis subscription caused some issues in
>     the bot
>     >>>>> that I have to fix first.
>     >>>>>
>     >>>>> On 07/07/2019 23:01, Chesnay Schepler wrote:
>     >>>>>> The vote has passed unanimously in favor of migrating to a
>     separate
>     >>>>>> Travis account.
>     >>>>>>
>     >>>>>> I will now set things up such that pull requests are no longer
>     >>>>>> run on the ASF servers.
>     >>>>>> This is a major step in reducing our usage of ASF resources.
>     >>>>>> For the time being we'll use the free Travis plan for flink-ci
>     >>>>>> (i.e. 5 workers, which is the same as the ASF gives us). Over the
>     >>>>>> course of the next week we'll set up the Ververica subscription to
>     >>>>>> increase this limit.
>     >>>>>> From now on, a bot will mirror all new and updated pull requests
>     >>>>>> to a mirror repository (https://github.com/flink-ci/flink-ci) and
>     >>>>>> write an update into the PR once the build is complete.
>     >>>>>> I have run the bots for the past 3 days in parallel to our existing
>     >>>>>> Travis setup and it was working without major issues.
>     >>>>>>
>     >>>>>> The biggest change that contributors will see is that there's no
>     >>>>>> longer an icon next to each commit. We may revisit this in the
>     >>>>>> future.
>     >>>>>>
>     >>>>>> I'll set up a repo with the source of the bot later.
>     >>>>>>
>     >>>>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
>     >>>>>>> I've raised a JIRA
>     >>>>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>     >>>>>>> inquire whether it would be possible to switch to a different Travis
>     >>>>>>> account, and if so what steps would need to be taken.
>     >>>>>>> We need a proper confirmation from INFRA since we are not
>     in full
>     >>>>>>> control of the flink repository (for example, we cannot
>     access the
>     >>>>>>> settings page).
>     >>>>>>>
>     >>>>>>> If this is indeed possible, Ververica is willing to sponsor a
>     >>>>>>> Travis account for the Flink project.
>     >>>>>>> This would provide us with more than enough resources.
>     >>>>>>>
>     >>>>>>> Since this makes the project more reliant on resources
>     provided by
>     >>>>>>> external companies I would like to vote on this.
>     >>>>>>>
>     >>>>>>> Please vote on this proposal, as follows:
>     >>>>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
>     >>>>>>> account, provided that INFRA approves
>     >>>>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored
>     >>>>>>> Travis account
>     >>>>>>>
>     >>>>>>> The vote will be open for at least 24h, and until we have
>     >>>>>>> confirmation from INFRA. The voting period may be shorter than
>     >>>>>>> the usual 3 days since our current setup is effectively not working.
>     >>>>>>>
>     >>>>>>> On 04/07/2019 06:51, Bowen Li wrote:
>     >>>>>>>> Re: > Are they using their own Travis CI pool, or did they
>     >>>>>>>> switch to an entirely different CI service?
>     >>>>>>>>
>     >>>>>>>> I reached out to Wes and Krisztián from Apache Arrow PMC.
>     They are
>     >>>>>>>> currently moving away from ASF's Travis to their own
>     in-house metal
>     >>>>>>>> machines at [1] with custom CI application at [2].
>     They've seen
>     >>>>>>>> significant improvement w.r.t both much higher
>     performance and
>     >>>>>>>> basically no resource waiting time, "night-and-day"
>     difference
>     >>>>>>>> quoting Wes.
>     >>>>>>>>
>     >>>>>>>> Re: > If we can just switch to our own Travis pool, just
>     for our
>     >>>>>>>> project, then this might be something we can do fairly
>     quickly?
>     >>>>>>>>
>     >>>>>>>> I believe so, according to [3] and [4]
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> [1] https://ci.ursalabs.org/
>     >>>>>>>> [2] https://github.com/ursa-labs/ursabot
>     >>>>>>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>     >>>>>>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler
>     >>>>>>>> <chesnay@apache.org <ma...@apache.org>
>     <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>     >>>>>>>>
>     >>>>>>>>       Are they using their own Travis CI pool, or did they
>     >>>>>>>>       switch to an entirely different CI service?
>     >>>>>>>>
>     >>>>>>>>       If we can just switch to our own Travis pool, just
>     for our
>     >>>>>>>>       project, then
>     >>>>>>>>       this might be something we can do fairly quickly?
>     >>>>>>>>
>     >>>>>>>>       On 03/07/2019 05:55, Bowen Li wrote:
>     >>>>>>>>       > I responded in the INFRA ticket [1] that I
>     believe they are
>     >>>>>>>>       using a wrong
>     >>>>>>>>       > metric against Flink and the total build time is
>     a completely
>     >>>>>>>>       different
>     >>>>>>>>       > thing than guaranteed build capacity.
>     >>>>>>>>       >
>     >>>>>>>>       > My response:
>     >>>>>>>>       >
>     >>>>>>>>       > "As mentioned above, since I started to pay
>     attention to
>     >> Flink's
>     >>>>>>>>       build
>     >>>>>>>>       > queue a few tens of days ago, I'm in Seattle and
>     I saw no
>     >> build
>     >>>>>>>>       was kicking
>     >>>>>>>>       > off in PST daytime in weekdays for Flink. Our
>     teammates in
>     >> China
>     >>>>>>>>       and Europe
>     >>>>>>>>       > have also reported similar observations. So we
>     need to
>     >> evaluate
>     >>>>>>>>       how the
>     >>>>>>>>       > large total build time came from - if 1) your
>     number and 2)
>     >> our
>     >>>>>>>>       > observations from three locations that cover
>     pretty much a
>     >> full
>     >>>>>>>>       day, are
>     >>>>>>>>       > all true, I **guess** one reason can be that -
>     highly likely
>     >> the
>     >>>>>>>>       extra
>     >>>>>>>>       > build time came from weekends when other Apache
>     projects may
>     >> be
>     >>>>>>>>       idle and
>     >>>>>>>>       > Flink just drains hard its congested queue.
>     >>>>>>>>       >
>     >>>>>>>>       > Please be aware of that we're not complaining
>     about the lack
>     >> of
>     >>>>>>>>       resources
>     >>>>>>>>       > in general, I'm complaining about the lack of
>     **stable,
>     >>>>>>>> dedicated**
>     >>>>>>>>       > resources. An example for the latter one is,
>     currently even
>     >> if
>     >>>>>>>>       no build is
>     >>>>>>>>       > in Flink's queue and I submit a request to be the
>     queue head
>     >>>>>>>> in PST
>     >>>>>>>>       > morning, my build won't even start in 6-8+h. That
>     is an
>     >> absurd
>     >>>>>>>>       amount of
>     >>>>>>>>       > waiting time.
>     >>>>>>>>       >
>     >>>>>>>>       > That's saying, if ASF INFRA decides to adopt a
>     quota system
>     >> and
>     >>>>>>>>       grants
>     >>>>>>>>       > Flink five DEDICATED servers that runs all the
>     time only for
>     >>>>>>>>       Flink, that'll
>     >>>>>>>>       > be PERFECT and can totally solve our problem now.
>     >>>>>>>>       >
>     >>>>>>>>       > I feel what's missing in the ASF INFRA's Travis
>     resource
>     >> pool is
>     >>>>>>>>       some level
>     >>>>>>>>       > of build capacity SLAs and certainty"
>     >>>>>>>>       >
>     >>>>>>>>       >
>     >>>>>>>>       > Again, I believe these two problems differ in nature: long
>     >>>>>>>>       > build time vs. lack of dedicated build resources. That is to
>     >>>>>>>>       > say, shortening build time may or may not relieve the
>     >>>>>>>>       > situation. I'm slightly negative on disabling IT cases for
>     >>>>>>>>       > PRs: the downside is that we risk missing bugs in PRs that
>     >>>>>>>>       > UTs don't catch, which may cost a lot more to fix later,
>     >>>>>>>>       > especially if they slow down or even block others. But I am
>     >>>>>>>>       > open to others' opinions on it.
>     >>>>>>>>       >
>     >>>>>>>>       > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
>     >>>>>>>>       > solve our problem, since INFRA's pool is fully shared and they
>     >>>>>>>>       > have no control over, or finer insight into, resource
>     >>>>>>>>       > allocation to a specific Apache project. As mentioned in [1],
>     >>>>>>>>       > Apache Arrow is moving away from the ASF INFRA Travis pool
>     >>>>>>>>       > (they are actually surprised Flink hasn't planned to do so). I
>     >>>>>>>>       > know that Spark is on its own build infra. If we all agree on
>     >>>>>>>>       > funding our own build infra, I'd be glad to help investigate
>     >>>>>>>>       > potential options after releasing 1.9, since I'm super busy
>     >>>>>>>>       > with 1.9 now.
>     >>>>>>>>       >
>     >>>>>>>>       > [1] https://issues.apache.org/jira/browse/INFRA-18533
>     >>>>>>>>       >
>     >>>>>>>>       >
>     >>>>>>>>       >
>     >>>>>>>>       > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>     >>>>>>>>       <chesnay@apache.org <ma...@apache.org>
>     <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>     >>>>>>>>       >
>     >>>>>>>>       >> As a short-term stopgap, since we can assume
>     this issue to
>     >>>>>>>>       become much
>     >>>>>>>>       >> worse in the following days/weeks, we could
>     disable IT
>     >> cases in
>     >>>>>>>>       PRs and
>     >>>>>>>>       >> only run them on master.
>     >>>>>>>>       >>
>     >>>>>>>>       >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>     >>>>>>>>       >>> People really have to stop thinking that just
>     because
>     >>>>>>>>       something works
>     >>>>>>>>       >>> for us it is also a good solution.
>     >>>>>>>>       >>> Also, please remember that our builds run for
>     2h from
>     >> start to
>     >>>>>>>>       finish,
>     >>>>>>>>       >>> and not the 14 _minutes_ it takes for zeppelin.
>     >>>>>>>>       >>> We are dealing with an entirely different scale
>     here, both
>     >> in
>     >>>>>>>>       terms of
>     >>>>>>>>       >>> build times and number of builds.
>     >>>>>>>>       >>>
>     >>>>>>>>       >>> In this very thread people have been
>     complaining about long
>     >>>>>>>> queue
>     >>>>>>>>       >>> times for their builds. Surprise, other Apache
>     projects
>     >>>>>>>> have been
>     >>>>>>>>       >>> suffering the very same thing due to us not
>     controlling our
>     >>>>>>>> build
>     >>>>>>>>       >>> times. While switching services (be it Jenkins,
>     CircleCI or
>     >>>>>>>>       whatever)
>     >>>>>>>>       >>> will possibly work for us (and these options
>     are actually
>     >>>>>>>>       attractive,
>     >>>>>>>>       >>> like CircleCI's proper support for build
>     artifacts), it
>     >>>>>>>> will also
>     >>>>>>>>       >>> result in us likely negatively affecting other
>     projects in
>     >>>>>>>>       significant
>     >>>>>>>>       >>> ways.
>     >>>>>>>>       >>>
>     >>>>>>>>       >>> Sure, the Jenkins setup has a good user
>     experience for us,
>     >> at
>     >>>>>>>>       the cost
>     >>>>>>>>       >>> of blocking Jenkins workers for a _lot_ of
>     time. Right now
>     >> we
>     >>>>>>>>       have 25
>     >>>>>>>>       >>> PR's in our queue; that's possibly 50h we'd
>     consume of
>     >> Jenkins
>     >>>>>>>>       >>> resources, and the European contributors
>     haven't even
>     >> really
>     >>>>>>>>       started yet.
>     >>>>>>>>       >>>
>     >>>>>>>>       >>> FYI, the latest INFRA response from INFRA-18533:
>     >>>>>>>>       >>>
>     >>>>>>>>       >>> "Our rough metrics shows that Flink used over
>     5800 hours of
>     >>>>>>>>       build time
>     >>>>>>>>       >>> last month. That is equal to EIGHT servers
>     running 24/7 for
>     >>>>>>>>       the ENTIRE
>     >>>>>>>>       >>> MONTH. EIGHT. nonstop.
>     >>>>>>>>       >>> When we discovered this last night, we
>     discussed it some
>     >> and
>     >>>>>>>>       are going
>     >>>>>>>>       >>> to tune down Flink to allow only five executors
>     maximum. We
>     >>>>>>>> cannot
>     >>>>>>>>       >>> allow Flink to consume so much of a Foundation
>     shared
>     >>>>>>>> resource."
>     >>>>>>>>       >>>
>     >>>>>>>>       >>> So yes, we either
>     >>>>>>>>       >>> a) have to heavily reduce our CI usage or
>     >>>>>>>>       >>> b) fund our own, either maintaining it ourselves or
>     >> donating
>     >>>>>>>>       to Apache.
>     >>>>>>>>       >>>
>     >>>>>>>>       >>> On 02/07/2019 05:11, Bowen Li wrote:
>     >>>>>>>>       >>>> By looking at the git history of the Jenkins script, its core
>     >>>>>>>>       >>>> part was finished in March 2017 (with only two minor updates in
>     >>>>>>>>       >>>> 2017/2018), so it's been running for over two years now and it
>     >>>>>>>>       >>>> feels like the Zeppelin community has been quite happy with it.
>     >>>>>>>>       >>>> @Jeff Zhang, can you share your insights and user
>     >>>>>>>>       >>>> experience with the Jenkins+Travis approach?
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> Things like:
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> - has the approach completely solved the resource capacity
>     >>>>>>>>       >>>> problem for the Zeppelin community? is the Zeppelin community
>     >>>>>>>>       >>>> happy with the result?
>     >>>>>>>>       >>>> - is the whole configuration chain stable enough (e.g. uptime)?
>     >>>>>>>>       >>>> - how often do you need to maintain the Jenkins infra? how many
>     >>>>>>>>       >>>> people are usually involved in maintenance and bug fixes?
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> The downside of this approach seems to me to be mostly the
>     >>>>>>>>       >>>> maintenance: maintaining the script and the Jenkins infra.
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> ** Having Our Own Travis-CI.com Account **
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> Another alternative I've been thinking of is to have our own
>     >>>>>>>>       >>>> travis-ci.com account with paid dedicated resources. Note
>     >>>>>>>>       >>>> travis-ci.org is the free version and travis-ci.com is the
>     >>>>>>>>       >>>> commercial version. We currently use a shared resource pool
>     >>>>>>>>       >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>     >>>>>>>>       >>>> control over it - we can't see how it's configured, how many
>     >>>>>>>>       >>>> resources are available, how resources are allocated among
>     >>>>>>>>       >>>> Apache projects, etc.
>     >>>>>>>>       >>>> The nice things about having an account on travis-ci.com are:
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> - relatively low cost with a much better resource guarantee
>     >>>>>>>>       >>>> than what we currently have [1]: $249/month with 5 dedicated
>     >>>>>>>>       >>>> concurrent jobs, $489/month with 10 concurrent jobs
>     >>>>>>>>       >>>> - low maintenance work compared to using Jenkins
>     >>>>>>>>       >>>> - (potentially) no migration cost according to Travis's doc [2]
>     >>>>>>>>       >>>> (pending verification)
>     >>>>>>>>       >>>> - full control over the build capacity/configuration compared to
>     >>>>>>>>       >>>> using ASF INFRA's pool
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> I'd be surprised if we as such a vibrant
>     community cannot
>     >>>>>>>>       find and
>     >>>>>>>>       >>>> fund $249*12=$2988 a year in exchange for a
>     much better
>     >>>>>>>> developer
>     >>>>>>>>       >>>> experience and much higher productivity.
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> [1] https://travis-ci.com/plans
>     >>>>>>>>       >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>     >>>>>>>>       <chesnay@apache.org <ma...@apache.org>
>     <mailto:chesnay@apache.org <ma...@apache.org>>
>     >>>>>>>>       >>>> <mailto:chesnay@apache.org
>     <ma...@apache.org> <mailto:chesnay@apache.org
>     <ma...@apache.org>>>>
>     >>>>>>>> wrote:
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> So yes, the Jenkins job keeps pulling the state from Travis
>     >>>>>>>>       >>>> until it finishes.
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> Not sure I'm comfortable with the idea of using Jenkins
>     >>>>>>>>       >>>> workers just to idle for several hours.
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>> On 29/06/2019 14:56, Jeff Zhang wrote:
>     >>>>>>>>       >>>> > Here's what the Zeppelin community did: we made a Python
>     >>>>>>>>       >>>> > script to check the build status of a pull request.
>     >>>>>>>>       >>>> > Here's the script:
>     >>>>>>>>       >>>> > https://github.com/apache/zeppelin/blob/master/travis_check.py
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> > And this is the script we used in the Jenkins build job.
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> > if [ -f "travis_check.py" ]; then
>     >>>>>>>>       >>>> >   git log -n 1
>     >>>>>>>>       >>>> >   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>     >>>>>>>>       >>>> >     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>     >>>>>>>>       >>>> >   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>     >>>>>>>>       >>>> >   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>     >>>>>>>>       >>>> >   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>     >>>>>>>>       >>>> >   #if [ -z $COMMIT ]; then
>     >>>>>>>>       >>>> >   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     >>>>>>>>       >>>> >   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>     >>>>>>>>       >>>> >   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>     >>>>>>>>       >>>> >   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>     >>>>>>>>       >>>> >   #fi
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> >   # get commit hash from PR
>     >>>>>>>>       >>>> >   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     >>>>>>>>       >>>> >     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>     >>>>>>>>       >>>> >     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>     >>>>>>>>       >>>> >     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>     >>>>>>>>       >>>> >   sleep 30 # sleep few moment to wait travis starts the build
>     >>>>>>>>       >>>> >   RET_CODE=0
>     >>>>>>>>       >>>> >   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>     >>>>>>>>       >>>> >   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>     >>>>>>>>       >>>> >     RET_CODE=0
>     >>>>>>>>       >>>> >     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     >>>>>>>>       >>>> >       grep '"full_name":' | grep -v "apache/zeppelin" |
>     >>>>>>>>       >>>> >       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>     >>>>>>>>       >>>> >     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>     >>>>>>>>       >>>> >   fi
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> >   if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>     >>>>>>>>       >>>> >     set +x
>     >>>>>>>>       >>>> >     echo "-----------------------------------------------------"
>     >>>>>>>>       >>>> >     echo "Looks like travis-ci is not configured for your fork."
>     >>>>>>>>       >>>> >     echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>     >>>>>>>>       >>>> >     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>     >>>>>>>>       >>>> >     echo ""
>     >>>>>>>>       >>>> >     echo "To trigger CI after setup, you will need ammend your last commit with"
>     >>>>>>>>       >>>> >     echo "git commit --amend"
>     >>>>>>>>       >>>> >     echo "git push your-remote HEAD --force"
>     >>>>>>>>       >>>> >     echo ""
>     >>>>>>>>       >>>> >     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>     >>>>>>>>       >>>> >   fi
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> >   exit $RET_CODE
>     >>>>>>>>       >>>> > else
>     >>>>>>>>       >>>> >   set +x
>     >>>>>>>>       >>>> >   echo "travis_check.py does not exists"
>     >>>>>>>>       >>>> >   exit 1
>     >>>>>>>>       >>>> > fi
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>     >>>>>>>>       >>>> >
>     >>>>>>>>       >>>> >> Does this imply that a Jenkins job is active as
>     >> long
>     >>>>>>>>       as the
>     >>>>>>>>       >>>> Travis build
>     >>>>>>>>       >>>> >> runs?
>     >>>>>>>>       >>>> >>
>     >>>>>>>>       >>>> >> On 26/06/2019 21:28, Bowen Li wrote:
>     >>>>>>>>       >>>> >>> Hi,
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>> @Dawid, I think the "long test running" as I
>     >>>>>>>>       mentioned in the
>     >>>>>>>>       >>>> first
>     >>>>>>>>       >>>> >> email,
>     >>>>>>>>       >>>> >>> also as you guys said, belongs to "a big
>     effort
>     >>>>>>>>       which is much
>     >>>>>>>>       >>>> harder to
>     >>>>>>>>       >>>> >>> accomplish in a short period of time and may
>     >> deserve
>     >>>>>>>>       its own
>     >>>>>>>>       >>>> separate
>     >>>>>>>>       >>>> >>> discussion". Thus I didn't include it in
>     what we
>     >> can
>     >>>>>>>>       do in a
>     >>>>>>>>       >>>> foreseeable
>     >>>>>>>>       >>>> >>> short term.
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>> Besides, I don't think that's the ultimate
>     reason
>     >>>>>>>>       for lack of
>     >>>>>>>>       >>>> build
>     >>>>>>>>       >>>> >>> resources. Even if the build is shortened to
>     >>>>>>>>       something like
>     >>>>>>>>       >>>> 2h, the
>     >>>>>>>>       >>>> >>> problem I described, where no build machine works for 6 or more
>     >>>>>>>>       >>>> >>> hours during PST daytime, will still happen, because no
>     >>>>>>>>       machine from
>     >>>>>>>>       >>>> ASF INFRA's
>     >>>>>>>>       >>>> >>> pool is allocated to Flink. As I have paid
>     close
>     >>>>>>>>       attention to
>     >>>>>>>>       >>>> the build
>     >>>>>>>>       >>>> >>> queue in the past few weekdays, it's a pretty
>     >> clear
>     >>>>>>>>       pattern now.
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>> **The ultimate root cause** for that is - we
>     >> don't
>     >>>>>>>>       have any
>     >>>>>>>>       >>>> **dedicated**
>     >>>>>>>>       >>>> >>> build resources that we can stably rely
>     on. I'm
>     >>>>>>>>       actually ok to
>     >>>>>>>>       >>>> wait for a
>     >>>>>>>>       >>>> >>> long time if there are build requests
>     running, it
>     >>>>>>>>       means at
>     >>>>>>>>       >>>> least we are
>     >>>>>>>>       >>>> >>> making progress. But I'm not ok with no build
>     >>>>>>>>       resource. A
>     >>>>>>>>       >>>> better place I
>     >>>>>>>>       >>>> >>> think we should aim at in short term is to
>     always
>     >>>>>>>>       have at
>     >>>>>>>>       >>>> least a central
>     >>>>>>>>       >>>> >>> pool (can be 3 or 5) of machines dedicated to
>     >> build
>     >>>>>>>>       Flink at
>     >>>>>>>>       >>>> any time, or
>     >>>>>>>>       >>>> >>> maybe use users' resources.
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>> @Chesnay @Robert I synced with Jeff
>     offline that
>     >>>>>>>>       Zeppelin
>     >>>>>>>>       >>>> community is
>     >>>>>>>>       >>>> >>> using a Jenkins job to automatically build on
>     >> users'
>     >>>>>>>>       travis
>     >>>>>>>>       >>>> account and
>     >>>>>>>>       >>>> >>> link the result back to github PR. I guess the
>     >>>>>>>>       Jenkins job
>     >>>>>>>>       >>>> would fetch
>     >>>>>>>>       >>>> >>> latest upstream master and build the PR
>     against
>     >> it.
>     >>>>>>>>       Jeff has
>     >>>>>>>>       >>>> filed
>     >>>>>>>>       >>>> >> tickets
>     >>>>>>>>       >>>> >>> to learn and get access to the Jenkins infra.
>     >> It'll
>     >>>>>>>>       be better to
>     >>>>>>>>       >>>> fully
>     >>>>>>>>       >>>> >>> understand it first before judging this
>     approach.
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>> I also heard good things about CircleCI,
>     and ASF
>     >>>>>>>>       INFRA seems
>     >>>>>>>>       >>>> to have a
>     >>>>>>>>       >>>> >> pool
>     >>>>>>>>       >>>> >>> of build capacity there too. Can be an
>     >> alternative
>     >>>>>>>>       to consider.
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakowicz@apache.org>
>     >>>>>>>>       >>>> >>> wrote:
>     >>>>>>>>       >>>> >>>
>     >>>>>>>>       >>>> >>>> Sorry to jump in late, but I think Bowen
>     missed
>     >> the
>     >>>>>>>>       most
>     >>>>>>>>       >>>> important point
>     >>>>>>>>       >>>> >>>> from Chesnay's previous message in the
>     summary.
>     >> The
>     >>>>>>>>       ultimate
>     >>>>>>>>       >>>> reason for
>     >>>>>>>>       >>>> >>>> all the problems is that the tests take close
>     >> to 2
>     >>>>>>>>       hours to
>     >>>>>>>>       >>>> run already.
>     >>>>>>>>       >>>> >>>> I fully support this claim: "Unless
>     people start
>     >>>>>>>>       caring about
>     >>>>>>>>       >>>> test times
>     >>>>>>>>       >>>> >>>> before adding them, this issue cannot be
>     solved"
>     >>>>>>>>       >>>> >>>>
>     >>>>>>>>       >>>> >>>> This is also another reason why using user's
>     >> Travis
>     >>>>>>>>       account
>     >>>>>>>>       >>>> won't help.
>     >>>>>>>>       >>>> >>>> Every few weeks we reach the user's time
>     limit
>     >> for
>     >>>>>>>>       a single
>     >>>>>>>>       >>>> profile.
>     >>>>>>>>       >>>> >>>> This makes the user's builds simply fail,
>     until
>     >> we
>     >>>>>>>>       either
>     >>>>>>>>       >>>> properly
>     >>>>>>>>       >>>> >>>> decrease the time the tests take (which I
>     am not
>     >>>>>>>>       sure we ever
>     >>>>>>>>       >>>> did) or
>     >>>>>>>>       >>>> >>>> postpone the problem by splitting into more
>     >>>>>>>>       profiles. (Note
>     >>>>>>>>       >>>> that the ASF
>     >>>>>>>>       >>>> >>>> Travis account has higher time limits)
>     >>>>>>>>       >>>> >>>>
>     >>>>>>>>       >>>> >>>> Best,
>     >>>>>>>>       >>>> >>>>
>     >>>>>>>>       >>>> >>>> Dawid
>     >>>>>>>>       >>>> >>>>
>     >>>>>>>>       >>>> >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>     >>>>>>>>       >>>> >>>>> Do we know if using "the best" available
>     >> hardware
>     >>>>>>>>       would
>     >>>>>>>>       >>>> improve the
>     >>>>>>>>       >>>> >> build
>     >>>>>>>>       >>>> >>>>> times?
>     >>>>>>>>       >>>> >>>>> Imagine we would run the build on
>     machines with
>     >>>>>>>>       plenty of
>     >>>>>>>>       >>>> main memory
>     >>>>>>>>       >>>> >> to
>     >>>>>>>>       >>>> >>>>> mount everything to ramdisk + the latest CPU
>     >>>>>>>>       architecture?
>     >>>>>>>>       >>>> >>>>>
>     >>>>>>>>       >>>> >>>>> Throwing hardware at the problem could help
>     >> reduce
>     >>>>>>>>       the time
>     >>>>>>>>       >>>> of an
>     >>>>>>>>       >>>> >>>>> individual build, and using our own
>     >> infrastructure
>     >>>>>>>>       would
>     >>>>>>>>       >>>> remove our
>     >>>>>>>>       >>>> >>>>> dependency on Apache's Travis account
>     (with the
>     >>>>>>>>       obvious
>     >>>>>>>>       >>>> downside of
>     >>>>>>>>       >>>> >>>> having
>     >>>>>>>>       >>>> >>>>> to maintain the infrastructure)
>     >>>>>>>>       >>>> >>>>> We could use an open source travis
>     >> alternative, to
>     >>>>>>>>       have a
>     >>>>>>>>       >>>> similar
>     >>>>>>>>       >>>> >>>>> experience and make the migration easy.
>     >>>>>>>>       >>>> >>>>>
>     >>>>>>>>       >>>> >>>>>
>     >>>>>>>>       >>>> >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org>
>     >>>>>>>>       >>>> >>>>> wrote:
>     >>>>>>>>       >>>> >>>>>> From what I gathered, there's no special
>     >>>>>>>>       sauce that the
>     >>>>>>>>       >>>> Zeppelin
>     >>>>>>>>       >>>> >>>>>> project uses which actually integrates
>     a users
>     >>>>>>>> Travis
>     >>>>>>>>       >>>> account into the
>     >>>>>>>>       >>>> >>>> PR.
>     >>>>>>>>       >>>> >>>>>> They just disabled Travis for PRs. And
>     that's
>     >>>>>>>>       kind of it.
>     >>>>>>>>       >>>> >>>>>>
>     >>>>>>>>       >>>> >>>>>> Naturally we can do this (duh) and save the
>     >> ASF a
>     >>>>>>>>       fair
>     >>>>>>>>       >>>> amount of
>     >>>>>>>>       >>>> >>>>>> resources, but there are downsides:
>     >>>>>>>>       >>>> >>>>>>
>     >>>>>>>>       >>>> >>>>>> The discoverability of the Travis check
>     takes
>     >> a
>     >>>>>>>>       nose-dive.
>     >>>>>>>>       >>>> Either we
>     >>>>>>>>       >>>> >>>>>> require every contributor to always, on
>     every
>     >>>>>>>>       commit, also
>     >>>>>>>>       >>>> post a
>     >>>>>>>>       >>>> >> Travis
>     >>>>>>>>       >>>> >>>>>> build, or we have the reviewer sift through
>     >> the
>     >>>>>>>>       >>>> contributor's account
>     >>>>>>>>       >>>> >> to
>     >>>>>>>>       >>>> >>>>>> find it.
>     >>>>>>>>       >>>> >>>>>>
>     >>>>>>>>       >>>> >>>>>> This is rather cumbersome.
>     Additionally, it's
>     >>>>>>>>       also not
>     >>>>>>>>       >>>> equivalent to
>     >>>>>>>>       >>>> >>>>>> having a PR build.
>     >>>>>>>>       >>>> >>>>>>
>     >>>>>>>>       >>>> >>>>>> A normal branch build takes a branch as
>     is and
>     >>>>>>>>       tests it. A
>     >>>>>>>>       >>>> PR build
>     >>>>>>>>       >>>> >>>>>> merges the branch into master, and then
>     runs
>     >> it.
>     >>>>>>>>       (Fun fact:
>     >>>>>>>>       >>>> This is
>     >>>>>>>>       >>>> >> why
>     >>>>>>>>       >>>> >>>>>> a PR with merge conflicts is not
>     being run
>     >> on
>     >>>>>>>>       Travis.)
>     >>>>>>>>       >>>> >>>>>>
>     >>>>>>>>       >>>> >>>>>> And ultimately, everyone can already
>     make use
>     >>>>>>>> of this
>     >>>>>>>>       >>>> approach anyway.
>     >>>>>>>>       >>>> >>>>>>
>     >>>>>>>>       >>>> >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>     >>>>>>>>       >>>> >>>>>>> Hi Jeff,
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>> Thanks for sharing the Zeppelin
>     approach. I
>     >>>>>>>>       think it's a
>     >>>>>>>>       >>>> good idea to
>     >>>>>>>>       >>>> >>>>>>> leverage user's travis account.
>     >>>>>>>>       >>>> >>>>>>> In this way, we can have almost unlimited
>     >>>>>>>>       concurrent build
>     >>>>>>>>       >>>> jobs and
>     >>>>>>>>       >>>> >>>>>>> developers can restart build by themselves
>     >>>>>>>>       (currently only
>     >>>>>>>>       >>>> committers
>     >>>>>>>>       >>>> >>>>>>> can restart PR's build).
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>> But I'm still not very clear how to
>     integrate
>     >>>>>>>> user's
>     >>>>>>>>       >>>> travis build
>     >>>>>>>>       >>>> >> into
>     >>>>>>>>       >>>> >>>>>>> the Flink pull request's build
>     automatically.
>     >>>>>>>>       Can you
>     >>>>>>>>       >>>> explain more in
>     >>>>>>>>       >>>> >>>>>>> detail?
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>> Another question: does travis only build
>     >>>>>>>>       branches for user
>     >>>>>>>>       >>>> account?
>     >>>>>>>>       >>>> >>>>>>> My concern is that builds for PRs will
>     rebase
>     >>>>>>>> user's
>     >>>>>>>>       >>>> commits against
>     >>>>>>>>       >>>> >>>>>>> current master branch.
>     >>>>>>>>       >>>> >>>>>>> This will help us to find problems before
>     >>>>>>>>       merge.  Builds
>     >>>>>>>>       >>>> for branches
>     >>>>>>>>       >>>> >>>>>>> will lose the impact of new commits in
>     >> master.
>     >>>>>>>>       >>>> >>>>>>> How does Zeppelin solve this problem?
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>> Thanks again for sharing the idea.
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>> Regards,
>     >>>>>>>>       >>>> >>>>>>> Jark
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>  Hi Folks,
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>  Zeppelin met this kind of issue before; we solved it by delegating
>     >>>>>>>>       >>>> >>>>>>>  each contributor's PR build to their own Travis account (everyone
>     >>>>>>>>       >>>> >>>>>>>  can have 5 free slots for Travis builds).
>     >>>>>>>>       >>>> >>>>>>>  The Apache account's Travis build is only triggered when a PR is
>     >>>>>>>>       >>>> >>>>>>>  merged.
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>  > (Forgot to cc George)
>     >>>>>>>>       >>>> >>>>>>>  >
>     >>>>>>>>       >>>> >>>>>>>  > Best,
>     >>>>>>>>       >>>> >>>>>>>  > Kurt
>     >>>>>>>>       >>>> >>>>>>>  >
>     >>>>>>>>       >>>> >>>>>>>  >
>     >>>>>>>>       >>>> >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>  >
>     >>>>>>>>       >>>> >>>>>>>  > > Hi Bowen,
>     >>>>>>>>       >>>> >>>>>>>  > >
>     >>>>>>>>       >>>> >>>>>>>  > > Thanks for bringing this up. We
>     >>>>>>>>       actually have
>     >>>>>>>>       >>>> discussed
>     >>>>>>>>       >>>> >> about
>     >>>>>>>>       >>>> >>>>>>>  this, and I
>     >>>>>>>>       >>>> >>>>>>>  > > think Till and George have
>     >>>>>>>>       >>>> >>>>>>>  > > already spent some time investigating
>     >>>>>>>>       it. I have
>     >>>>>>>>       >>>> cced both of
>     >>>>>>>>       >>>> >>>>>>>  them, and
>     >>>>>>>>       >>>> >>>>>>>  > > maybe they can share
>     >>>>>>>>       >>>> >>>>>>>  > > their findings.
>     >>>>>>>>       >>>> >>>>>>>  > >
>     >>>>>>>>       >>>> >>>>>>>  > > Best,
>     >>>>>>>>       >>>> >>>>>>>  > > Kurt
>     >>>>>>>>       >>>> >>>>>>>  > >
>     >>>>>>>>       >>>> >>>>>>>  > >
>     >>>>>>>>       >>>> >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>  > >
>     >>>>>>>>       >>>> >>>>>>>  > >> Hi Bowen,
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> Thanks for bringing this up. We also
>     >>>>>>>>       suffered from
>     >>>>>>>>       >>>> the long
>     >>>>>>>>       >>>> >>>>>>>  build time.
>     >>>>>>>>       >>>> >>>>>>>  > >> I agree that we should focus on
>     >>>>>>>>       solving build
>     >>>>>>>>       >>>> capacity
>     >>>>>>>>       >>>> >>>>>>>  problem in the
>     >>>>>>>>       >>>> >>>>>>>  > >> thread.
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> My observation is that only one build is running; all the
>     >>>>>>>>       >>>> >>>>>>>  > >> others (other PRs, master) are pending.
>     >>>>>>>>       >>>> >>>>>>>  > >> The pricing plan[1] of travis shows
>     >>>>>>>>       it can
>     >>>>>>>>       >>>> support
>     >>>>>>>>       >>>> >> concurrent
>     >>>>>>>>       >>>> >>>>>>>  build
>     >>>>>>>>       >>>> >>>>>>>  > jobs.
>     >>>>>>>>       >>>> >>>>>>>  > >> But I don't know which plan we are
>     >>>>>>>>       using, might
>     >>>>>>>>       >>>> be the free
>     >>>>>>>>       >>>> >>>>>>>  plan for
>     >>>>>>>>       >>>> >>>>>>>  > open
>     >>>>>>>>       >>>> >>>>>>>  > >> source.
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> I cc-ed Chesnay who may have some
>     >>>>>>>>       experience on
>     >>>>>>>>       >>>> Travis.
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> Regards,
>     >>>>>>>>       >>>> >>>>>>>  > >> Jark
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> [1]: https://travis-ci.com/plans
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >> > Hi Steven,
>     >>>>>>>>       >>>> >>>>>>>  > >> >
>     >>>>>>>>       >>>> >>>>>>>  > >> > I think you may not have read what I
>     >>>>>>>>       wrote. The
>     >>>>>>>>       >>>> discussion is
>     >>>>>>>>       >>>> >>>> about
>     >>>>>>>>       >>>> >>>>>>>  > "unstable
>     >>>>>>>>       >>>> >>>>>>>  > >> > build **capacity**", in
>     other words
>     >>>>>>>>       >>>> "unstable / lack of
>     >>>>>>>>       >>>> >>>> build
>     >>>>>>>>       >>>> >>>>>>>  > >> resources",
>     >>>>>>>>       >>>> >>>>>>>  > >> > not "unstable build".
>     >>>>>>>>       >>>> >>>>>>>  > >> >
>     >>>>>>>>       >>>> >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>  > >> >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > long and sometimes unstable
>     build
>     >> is
>     >>>>>>>>       >>>> definitely a pain
>     >>>>>>>>       >>>> >>>>>> point.
>     >>>>>>>>       >>>> >>>>>>>  > >> > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > I suspect the build failure
>     here in
>     >>>>>>>>       >>>> >> flink-connector-kafka
>     >>>>>>>>       >>>> >>>>>>>  is not
>     >>>>>>>>       >>>> >>>>>>>  > >> related
>     >>>>>>>>       >>>> >>>>>>>  > >> > to
>     >>>>>>>>       >>>> >>>>>>>  > >> > > my change. but there is no easy
>     >>>>>>>>       way to re-run the
>     >>>>>>>>       >>>> build on
>     >>>>>>>>       >>>> >>>>>>>  travis UI.
>     >>>>>>>>       >>>> >>>>>>>  > Google
>     >>>>>>>>       >>>> >>>>>>>  > >> > > search showed a trick of
>     >>>>>>>>       close-and-open the
>     >>>>>>>>       >>>> PR will
>     >>>>>>>>       >>>> >>>>>>>  trigger rebuild.
>     >>>>>>>>       >>>> >>>>>>>  > >> but
>     >>>>>>>>       >>>> >>>>>>>  > >> > > that could add noise to the PR
>     >>>>>>>>       activities.
>     >>>>>>>>       >>>> >>>>>>>  > >> > >
>     >>>>>>>>       >>>> https://travis-ci.org/apache/flink/jobs/545555519
>     >>>>>>>>       >>>> >>>>>>>  > >> > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > travis-ci for my personal repo
>     >>>>>>>>       often failed
>     >>>>>>>>       >>>> with
>     >>>>>>>>       >>>> >>>>>>>  exceeding time
>     >>>>>>>>       >>>> >>>>>>>  > limit
>     >>>>>>>>       >>>> >>>>>>>  > >> > after
>     >>>>>>>>       >>>> >>>>>>>  > >> > > 4+ hours.
>     >>>>>>>>       >>>> >>>>>>>  > >> > > The job exceeded the maximum
>     time
>     >>>>>>>>       limit for
>     >>>>>>>>       >>>> jobs, and
>     >>>>>>>>       >>>> >> has
>     >>>>>>>>       >>>> >>>>>>>  been
>     >>>>>>>>       >>>> >>>>>>>  > >> > terminated.
>     >>>>>>>>       >>>> >>>>>>>  > >> > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>  > >> > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > >
>     >>>>>>>>       >>>>
>     https://travis-ci.org/apache/flink/builds/549681530
>     >>>>>>>>       >>>> >>>>>>>  This build
>     >>>>>>>>       >>>> >>>>>>>  > >> > request
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > has
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > been sitting at **HEAD of the
>     >>>>>>>>       queue**
>     >>>>>>>>       >>>> since I first
>     >>>>>>>>       >>>> >> saw
>     >>>>>>>>       >>>> >>>>>>>  it at PST
>     >>>>>>>>       >>>> >>>>>>>  > >> > 10:30am
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > (not sure how long it's been
>     >>>>>>>>       there before
>     >>>>>>>>       >>>> 10:30am).
>     >>>>>>>>       >>>> >>>>>>>  It's PST
>     >>>>>>>>       >>>> >>>>>>>  > 4:12pm
>     >>>>>>>>       >>>> >>>>>>>  > >> now
>     >>>>>>>>       >>>> >>>>>>>  > >> > > and
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > it hasn't started yet.
>     >>>>>>>>       >>>> >>>>>>>  > >> > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
>     >>>>>>>>       >>>> >>>>>>>  > >> > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > Hi devs,
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > I've been experiencing
>     the pain
>     >>>>>>>>       >>>> resulting from lack
>     >>>>>>>>       >>>> >>>>>>>  of stable
>     >>>>>>>>       >>>> >>>>>>>  > >> build
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > capacity on Travis for Flink
>     >>>>>>>>       PRs [1].
>     >>>>>>>>       >>>> >> Specifically, I
>     >>>>>>>>       >>>> >>>>>>>  noticed
>     >>>>>>>>       >>>> >>>>>>>  > >> often
>     >>>>>>>>       >>>> >>>>>>>  > >> > > that
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > no
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > build in the queue is making
>     >> any
>     >>>>>>>>       >>>> progress for
>     >>>>>>>>       >>>> >> hours,
>     >>>>>>>>       >>>> >>>> and
>     >>>>>>>>       >>>> >>>>>>>  > suddenly
>     >>>>>>>>       >>>> >>>>>>>  > >> 5
>     >>>>>>>>       >>>> >>>>>>>  > >> > or
>     >>>>>>>>       >>>> >>>>>>>  > >> > > 6
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > builds kick off all together
>     >>>>>>>>       after the
>     >>>>>>>>       >>>> long pause.
>     >>>>>>>>       >>>> >>>>>>>  I'm at PST
>     >>>>>>>>       >>>> >>>>>>>  > >> > (UTC-08)
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > time
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > zone, and I've seen
>     pause can
>     >>>>>>>>       be as
>     >>>>>>>>       >>>> long as 6 hours
>     >>>>>>>>       >>>> >>>>>>>  from PST 9am
>     >>>>>>>>       >>>> >>>>>>>  > >> to
>     >>>>>>>>       >>>> >>>>>>>  > >> > 3pm
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > (let alone the time
>     needed to
>     >>>>>>>>       drain the
>     >>>>>>>>       >>>> queue
>     >>>>>>>>       >>>> >>>>>>>  afterwards).
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > I think this has greatly
>     >>>>>>>>       impacted our
>     >>>>>>>>       >>>> productivity.
>     >>>>>>>>       >>>> >>>> I've
>     >>>>>>>>       >>>> >>>>>>>  > >> experienced
>     >>>>>>>>       >>>> >>>>>>>  > >> > > that
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > PRs submitted in the early
>     >>>>>>>>       morning of
>     >>>>>>>>       >>>> PST time zone
>     >>>>>>>>       >>>> >>>>>>>  won't finish
>     >>>>>>>>       >>>> >>>>>>>  > >> > their
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > build until late night
>     of the
>     >>>>>>>>       same day.
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > So my questions are:
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > - Has anyone else
>     experienced
>     >>>>>>>>       the same
>     >>>>>>>>       >>>> problem or
>     >>>>>>>>       >>>> >>>>>>>  have similar
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > observation
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>     >>>>>>>>       has things
>     >>>>>>>>       >>>> to do with
>     >>>>>>>>       >>>> >> time
>     >>>>>>>>       >>>> >>>>>>>  zone)
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > - What pricing plan of
>     >>>>>>>>       TravisCI is
>     >>>>>>>>       >>>> Flink currently
>     >>>>>>>>       >>>> >>>>>>>  using? Is it
>     >>>>>>>>       >>>> >>>>>>>  > >> the
>     >>>>>>>>       >>>> >>>>>>>  > >> > > free
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > plan for open source
>     >>>>>>>>       projects? What
>     >>>>>>>>       >>>> are the
>     >>>>>>>>       >>>> >>>>>>>  guaranteed build
>     >>>>>>>>       >>>> >>>>>>>  > >> capacity
>     >>>>>>>>       >>>> >>>>>>>  > >> > > of
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > the current plan?
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > - If the current pricing
>     plan
>     >>>>>>>>       (either
>     >>>>>>>>       >>>> free or paid)
>     >>>>>>>>       >>>> >>>>>> can't
>     >>>>>>>>       >>>> >>>>>>>  > provide
>     >>>>>>>>       >>>> >>>>>>>  > >> > > stable
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > build capacity, can we
>     >>>>>>>>       upgrade to a
>     >>>>>>>>       >>>> higher priced
>     >>>>>>>>       >>>> >>>>>>>  plan with
>     >>>>>>>>       >>>> >>>>>>>  > larger
>     >>>>>>>>       >>>> >>>>>>>  > >> > and
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > more
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > stable build capacity?
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > BTW, another factor that
>     >>>>>>>>       contributes to
>     >>>>>>>>       >>>> the
>     >>>>>>>>       >>>> >>>>>>>  productivity problem
>     >>>>>>>>       >>>> >>>>>>>  > is
>     >>>>>>>>       >>>> >>>>>>>  > >> > that
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > our build is slow - we run
>     >>>>>>>>       full build
>     >>>>>>>>       >>>> for every PR
>     >>>>>>>>       >>>> >>>> and a
>     >>>>>>>>       >>>> >>>>>>>  > >> successful
>     >>>>>>>>       >>>> >>>>>>>  > >> > > full
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > build takes ~5h. We
>     >>>>>>>>       definitely have
>     >>>>>>>>       >>>> more options to
>     >>>>>>>>       >>>> >>>>>>>  solve it,
>     >>>>>>>>       >>>> >>>>>>>  > for
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > instance,
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > modularize the build graphs
>     >>>>>>>>       and reuse
>     >>>>>>>>       >>>> artifacts
>     >>>>>>>>       >>>> >> from
>     >>>>>>>>       >>>> >>>> the
>     >>>>>>>>       >>>> >>>>>>>  > previous
>     >>>>>>>>       >>>> >>>>>>>  > >> > > build.
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > But I think that can be
>     a big
>     >>>>>>>>       effort
>     >>>>>>>>       >>>> which is much
>     >>>>>>>>       >>>> >>>>>>>  harder to
>     >>>>>>>>       >>>> >>>>>>>  > >> > accomplish
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > in
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > a short period of time and
>     >>>>>>>>       may deserve
>     >>>>>>>>       >>>> its own
>     >>>>>>>>       >>>> >>>> separate
>     >>>>>>>>       >>>> >>>>>>>  > >> discussion.
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > > [1]
>     >>>>>>>>       >>>> >>
>     https://travis-ci.org/apache/flink/pull_requests
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > > >
>     >>>>>>>>       >>>> >>>>>>>  > >> > >
>     >>>>>>>>       >>>> >>>>>>>  > >> >
>     >>>>>>>>       >>>> >>>>>>>  > >>
>     >>>>>>>>       >>>> >>>>>>>  > >
>     >>>>>>>>       >>>> >>>>>>>  >
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>  --
>     >>>>>>>>       >>>> >>>>>>>  Best Regards
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>>>>>>  Jeff Zhang
>     >>>>>>>>       >>>> >>>>>>>
>     >>>>>>>>       >>>> >>
>     >>>>>>>>       >>>>
>     >>>>>>>>       >>>
>     >>>>>>>>       >>
>     >>>>>>>>
>     >>
>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Jark Wu <im...@gmail.com>.
Wow. That's great! Thanks Chesnay.

On Fri, 2 Aug 2019 at 17:50, Chesnay Schepler <ch...@apache.org> wrote:

> I'm currently modifying the cibot to do this automatically; should be
> finished by Monday.
>
> On 02/08/2019 07:41, Jark Wu wrote:
> > Hi Chesnay,
> >
> > Can we assign Flink Committers the permission of flink-ci/flink repo?
> > Several times, when I pushed new commits, the old build jobs were still
> > pending and had not been canceled.
> > Until that is fixed, we could manually cancel old jobs to save build
> > resources.
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler <ch...@apache.org>
> wrote:
> >
> >> Your best bet would be to check the first commit in the PR and check the
> >> parent commit.
> >>
> >> To re-run things, you will have to rebase the PR on the latest master.
> >>
> >> On 10/07/2019 03:32, Kurt Young wrote:
> >>> Thanks for all your efforts Chesnay, it indeed improves our
> >>> development experience a lot. BTW, do you know how to find which
> >>> master commit the CI ran against?
> >>>
> >>> For example, like this one:
> >>> https://travis-ci.com/flink-ci/flink/jobs/214542568
> >>> It shows a pass for the commits, which were rebased on master when the
> >>> CI was triggered. But the master that the CI ran on may be the same as
> >>> or different from the current master. If it's the same, I can simply
> >>> rely on the passed result to push commits, but if it's not, I think I
> >>> should find another way to re-trigger the tests based on the newest
> >>> master.
> >>>
> >>> Do you know where I can get such information?
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler <ch...@apache.org>
> >> wrote:
> >>>> The kinks have been worked out; the bot is running again and PR builds
> >>>> are once again no longer running on ASF resources.
> >>>>
> >>>> PRs are mirrored to: https://github.com/flink-ci/flink
> >>>> Bot source: https://github.com/flink-ci/ci-bot
> >>>>
> >>>> On 08/07/2019 17:14, Chesnay Schepler wrote:
> >>>>> I have temporarily re-enabled running PR builds on the ASF account;
> >>>>> migrating to the Travis subscription caused some issues in the bot
> >>>>> that I have to fix first.
> >>>>>
> >>>>> On 07/07/2019 23:01, Chesnay Schepler wrote:
> >>>>>> The vote has passed unanimously in favor of migrating to a separate
> >>>>>> Travis account.
> >>>>>>
> >>>>>> I will now set things up such that pull requests are no longer run
> >>>>>> on the ASF servers.
> >>>>>> This is a major step in reducing our usage of ASF resources.
> >>>>>> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
> >>>>>> workers, which is the same as what the ASF gives us). Over the course
> >>>>>> of the next week we'll set up the Ververica subscription to increase
> >>>>>> this limit.
> >>>>>> From now on, a bot will mirror all new and updated pull requests to a
> >>>>>> mirror repository (https://github.com/flink-ci/flink-ci) and write an
> >>>>>> update into the PR once the build is complete.
> >>>>>> I have run the bot for the past 3 days in parallel to our existing
> >>>>>> Travis setup and it was working without major issues.
> >>>>>>
> >>>>>> The biggest change that contributors will see is that there's no
> >>>>>> longer an icon next to each commit. We may revisit this in the future.
> >>>>>>
> >>>>>> I'll set up a repo with the source of the bot later.
> >>>>>>
> >>>>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
> >>>>>>> I've raised a JIRA
> >>>>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> >>>>>>> inquire whether it would be possible to switch to a different Travis
> >>>>>>> account, and if so what steps would need to be taken.
> >>>>>>> We need a proper confirmation from INFRA since we are not in full
> >>>>>>> control of the flink repository (for example, we cannot access the
> >>>>>>> settings page).
> >>>>>>>
> >>>>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
> >>>>>>> account for the Flink project.
> >>>>>>> This would provide us with more than enough resources.
> >>>>>>>
> >>>>>>> Since this makes the project more reliant on resources provided by
> >>>>>>> external companies I would like to vote on this.
> >>>>>>>
> >>>>>>> Please vote on this proposal, as follows:
> >>>>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> >>>>>>> account, provided that INFRA approves
> >>>>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored
> >>>>>>> Travis account
> >>>>>>>
> >>>>>>> The vote will be open for at least 24h, and until we have
> >>>>>>> confirmation from INFRA. The voting period may be shorter than the
> >>>>>>> usual 3 days since our current setup is effectively not working.
> >>>>>>>
> >>>>>>> On 04/07/2019 06:51, Bowen Li wrote:
> >>>>>>>> Re: > Are they using their own Travis CI pool, or did they switch to
> >>>>>>>> an entirely different CI service?
> >>>>>>>>
> >>>>>>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >>>>>>>> currently moving away from ASF's Travis to their own in-house
> metal
> >>>>>>>> machines at [1] with custom CI application at [2]. They've seen
> >>>>>>>> significant improvement w.r.t both much higher performance and
> >>>>>>>> basically no resource waiting time, "night-and-day" difference
> >>>>>>>> quoting Wes.
> >>>>>>>>
> >>>>>>>> Re: > If we can just switch to our own Travis pool, just for our
> >>>>>>>> project, then this might be something we can do fairly quickly?
> >>>>>>>>
> >>>>>>>> I believe so, according to [3] and [4]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> >>>>>>>> [2] https://github.com/ursa-labs/ursabot
> >>>>>>>> [3]
> >>>>>>>>
> >>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>>>> [4]
> >>>>>>>>
> >> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler
> >>>>>>>> <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>>>>>
> >>>>>>>>       Are they using their own Travis CI pool, or did they switch to an
> >>>>>>>>       entirely different CI service?
> >>>>>>>>
> >>>>>>>>       If we can just switch to our own Travis pool, just for our
> >>>>>>>>       project, then
> >>>>>>>>       this might be something we can do fairly quickly?
> >>>>>>>>
> >>>>>>>>       On 03/07/2019 05:55, Bowen Li wrote:
> >>>>>>>>       > I responded in the INFRA ticket [1] that I believe they
> are
> >>>>>>>>       using a wrong
> >>>>>>>>       > metric against Flink and the total build time is a
> completely
> >>>>>>>>       different
> >>>>>>>>       > thing than guaranteed build capacity.
> >>>>>>>>       >
> >>>>>>>>       > My response:
> >>>>>>>>       >
> >>>>>>>>       > "As mentioned above, since I started to pay attention to
> >> Flink's
> >>>>>>>>       build
> >>>>>>>>       > queue a few tens of days ago, I'm in Seattle and I saw no
> >> build
> >>>>>>>>       was kicking
> >>>>>>>>       > off in PST daytime in weekdays for Flink. Our teammates in
> >> China
> >>>>>>>>       and Europe
> >>>>>>>>       > have also reported similar observations. So we need to
> >> evaluate
> >>>>>>>>       how the
> >>>>>>>>       > large total build time came from - if 1) your number and
> 2)
> >> our
> >>>>>>>>       > observations from three locations that cover pretty much a
> >> full
> >>>>>>>>       day, are
> >>>>>>>>       > all true, I **guess** one reason can be that - highly
> likely
> >> the
> >>>>>>>>       extra
> >>>>>>>>       > build time came from weekends when other Apache projects
> may
> >> be
> >>>>>>>>       idle and
> >>>>>>>>       > Flink just drains hard its congested queue.
> >>>>>>>>       >
> >>>>>>>>       > Please be aware of that we're not complaining about the
> lack
> >> of
> >>>>>>>>       resources
> >>>>>>>>       > in general, I'm complaining about the lack of **stable,
> >>>>>>>> dedicated**
> >>>>>>>>       > resources. An example for the latter one is, currently
> even
> >> if
> >>>>>>>>       no build is
> >>>>>>>>       > in Flink's queue and I submit a request to be the queue
> head
> >>>>>>>> in PST
> >>>>>>>>       > morning, my build won't even start in 6-8+h. That is an
> >> absurd
> >>>>>>>>       amount of
> >>>>>>>>       > waiting time.
> >>>>>>>>       >
> >>>>>>>>       > That's saying, if ASF INFRA decides to adopt a quota
> system
> >> and
> >>>>>>>>       grants
> >>>>>>>>       > Flink five DEDICATED servers that runs all the time only
> for
> >>>>>>>>       Flink, that'll
> >>>>>>>>       > be PERFECT and can totally solve our problem now.
> >>>>>>>>       >
> >>>>>>>>       > I feel what's missing in the ASF INFRA's Travis resource
> >> pool is
> >>>>>>>>       some level
> >>>>>>>>       > of build capacity SLAs and certainty"
> >>>>>>>>       >
> >>>>>>>>       >
> >>>>>>>>       > Again, I believe these two problems differ in nature: long
> >>>>>>>>       > build time vs. lack of dedicated build resources. That is,
> >>>>>>>>       > shortening build time may or may not relieve the situation.
> >>>>>>>>       > I'm slightly negative on disabling IT cases for PRs, because
> >>>>>>>>       > the downside is that we risk bugs in PRs that the UTs don't
> >>>>>>>>       > catch, which may cost a lot more to fix later and may slow
> >>>>>>>>       > down or even block others, but I am open to others' opinions
> >>>>>>>>       > on it.
> >>>>>>>>       >
> >>>>>>>>       > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
> >>>>>>>>       > solve our problem, since INFRA's pool is fully shared and they
> >>>>>>>>       > have no control over, or fine-grained insight into, resource
> >>>>>>>>       > allocation to a specific Apache project. As mentioned in [1],
> >>>>>>>>       > Apache Arrow is moving away from the ASF INFRA Travis pool
> >>>>>>>>       > (they are actually surprised Flink hasn't planned to do so). I
> >>>>>>>>       > know that Spark is on its own build infra. If we all agree on
> >>>>>>>>       > funding our own build infra, I'd be glad to help investigate
> >>>>>>>>       > potential options after releasing 1.9, since I'm super busy
> >>>>>>>>       > with 1.9 now.
> >>>>>>>>       >
> >>>>>>>>       > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>>>>>>       >
> >>>>>>>>       >
> >>>>>>>>       >
> >>>>>>>>       > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> >>>>>>>>       <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>>>>>       >
> >>>>>>>>       >> As a short-term stopgap, since we can assume this issue
> to
> >>>>>>>>       become much
> >>>>>>>>       >> worse in the following days/weeks, we could disable IT
> >> cases in
> >>>>>>>>       PRs and
> >>>>>>>>       >> only run them on master.
> >>>>>>>>       >>
> >>>>>>>>       >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>>>>>>>       >>> People really have to stop thinking that just because
> >>>>>>>>       something works
> >>>>>>>>       >>> for us it is also a good solution.
> >>>>>>>>       >>> Also, please remember that our builds run for 2h from
> >> start to
> >>>>>>>>       finish,
> >>>>>>>>       >>> and not the 14 _minutes_ it takes for zeppelin.
> >>>>>>>>       >>> We are dealing with an entirely different scale here,
> both
> >> in
> >>>>>>>>       terms of
> >>>>>>>>       >>> build times and number of builds.
> >>>>>>>>       >>>
> >>>>>>>>       >>> In this very thread people have been complaining about
> long
> >>>>>>>> queue
> >>>>>>>>       >>> times for their builds. Surprise, other Apache projects
> >>>>>>>> have been
> >>>>>>>>       >>> suffering the very same thing due to us not controlling
> our
> >>>>>>>> build
> >>>>>>>>       >>> times. While switching services (be it Jenkins,
> CircleCI or
> >>>>>>>>       whatever)
> >>>>>>>>       >>> will possibly work for us (and these options are
> actually
> >>>>>>>>       attractive,
> >>>>>>>>       >>> like CircleCI's proper support for build artifacts), it
> >>>>>>>> will also
> >>>>>>>>       >>> result in us likely negatively affecting other projects
> in
> >>>>>>>>       significant
> >>>>>>>>       >>> ways.
> >>>>>>>>       >>>
> >>>>>>>>       >>> Sure, the Jenkins setup has a good user experience for
> us,
> >> at
> >>>>>>>>       the cost
> >>>>>>>>       >>> of blocking Jenkins workers for a _lot_ of time. Right
> now
> >> we
> >>>>>>>>       have 25
> >>>>>>>>       >>> PR's in our queue; that's possibly 50h we'd consume of
> >> Jenkins
> >>>>>>>>       >>> resources, and the European contributors haven't even
> >> really
> >>>>>>>>       started yet.
> >>>>>>>>       >>>
> >>>>>>>>       >>> FYI, the latest INFRA response from INFRA-18533:
> >>>>>>>>       >>>
> >>>>>>>>       >>> "Our rough metrics shows that Flink used over 5800
> hours of
> >>>>>>>>       build time
> >>>>>>>>       >>> last month. That is equal to EIGHT servers running 24/7
> for
> >>>>>>>>       the ENTIRE
> >>>>>>>>       >>> MONTH. EIGHT. nonstop.
> >>>>>>>>       >>> When we discovered this last night, we discussed it some
> >> and
> >>>>>>>>       are going
> >>>>>>>>       >>> to tune down Flink to allow only five executors
> maximum. We
> >>>>>>>> cannot
> >>>>>>>>       >>> allow Flink to consume so much of a Foundation shared
> >>>>>>>> resource."
> >>>>>>>>       >>>
> >>>>>>>>       >>> So yes, we either
> >>>>>>>>       >>> a) have to heavily reduce our CI usage or
> >>>>>>>>       >>> b) fund our own, either maintaining it ourselves or
> >> donating
> >>>>>>>>       to Apache.
> >>>>>>>>       >>>
> >>>>>>>>       >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>>>>>>       >>>> By looking at the git history of the Jenkins script, its core
> >>>>>>>>       >>>> part was finished in March 2017 (with only two minor updates
> >>>>>>>>       >>>> in 2017/2018), so it's been running for over two years now
> >>>>>>>>       >>>> and it feels like the Zeppelin community has been quite happy
> >>>>>>>>       >>>> with it. @Jeff Zhang <mailto:zjffdu@gmail.com> can you share
> >>>>>>>>       >>>> your insights and user experience with the Jenkins+Travis
> >>>>>>>>       >>>> approach?
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> Things like:
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> - has the approach completely solved the resource capacity
> >>>>>>>>       >>>>   problem for the Zeppelin community? Is the Zeppelin
> >>>>>>>>       >>>>   community happy with the result?
> >>>>>>>>       >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> >>>>>>>>       >>>> - how often do you need to maintain the Jenkins infra? How
> >>>>>>>>       >>>>   many people are usually involved in maintenance and bug
> >>>>>>>>       >>>>   fixes?
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> To me, the downside of this approach seems to be mostly
> >>>>>>>>       >>>> maintenance: maintaining the script and the Jenkins infra.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> ** Having Our Own Travis-CI.com Account **
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> Another alternative I've been thinking of is to have our own
> >>>>>>>>       >>>> travis-ci.com account with paid dedicated resources. Note that
> >>>>>>>>       >>>> travis-ci.org is the free version and travis-ci.com is the
> >>>>>>>>       >>>> commercial version. We currently use a shared resource pool
> >>>>>>>>       >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
> >>>>>>>>       >>>> control over it - we can't see how it's configured, how many
> >>>>>>>>       >>>> resources are available, how resources are allocated among
> >>>>>>>>       >>>> Apache projects, etc. The nice things about having an account
> >>>>>>>>       >>>> on travis-ci.com are:
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> - relatively low cost with a much better resource guarantee
> >>>>>>>>       >>>>   than what we currently have [1]: $249/month with 5 dedicated
> >>>>>>>>       >>>>   concurrent jobs, $489/month with 10 concurrent jobs
> >>>>>>>>       >>>> - low maintenance work compared to using Jenkins
> >>>>>>>>       >>>> - (potentially) no migration cost according to Travis's doc [2]
> >>>>>>>>       >>>>   (pending verification)
> >>>>>>>>       >>>> - full control over the build capacity/configuration compared
> >>>>>>>>       >>>>   to using ASF INFRA's pool
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> I'd be surprised if we as such a vibrant community
> cannot
> >>>>>>>>       find and
> >>>>>>>>       >>>> fund $249*12=$2988 a year in exchange for a much better
> >>>>>>>> developer
> >>>>>>>>       >>>> experience and much higher productivity.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> [1] https://travis-ci.com/plans
> >>>>>>>>       >>>> [2]
> >>>>>>>>       >>>>
> >>>>>>>>       >>
> >>>>>>>>
> >>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>>>>       >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> >>>>>>>>       >>>> <chesnay@apache.org> wrote:
> >>>>>>>>       >>>>
> >>>>>>>>       >>>>      So yes, the Jenkins job keeps polling the state from
> >>>>>>>>       >>>>      Travis until it finishes.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>>      Not sure I'm comfortable with the idea of using Jenkins
> >>>>>>>>       >>>>      workers just to idle for several hours.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>>>>>       >>>>      > Here's what the Zeppelin community did: we wrote a
> >>>>>>>>       >>>>      > Python script to check the build status of a pull
> >>>>>>>>       >>>>      > request.
> >>>>>>>>       >>>>      > Here's the script:
> >>>>>>>>       >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      > And this is the script we use in the Jenkins build job.
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      > if [ -f "travis_check.py" ]; then
> >>>>>>>>       >>>>      >    git log -n 1
> >>>>>>>>       >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>>>>>>>       >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>       >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>       >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>>>>>>>       >>>>      >    #if [ -z $COMMIT ]; then
> >>>>>>>>       >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>       >>>>      >    #fi
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      >    # get commit hash from PR
> >>>>>>>>       >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>       >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
> >>>>>>>>       >>>>      >    RET_CODE=0
> >>>>>>>>       >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>>>>>       >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>>>>>>>       >>>>      >      RET_CODE=0
> >>>>>>>>       >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>>>>>       >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>>>>>       >>>>      >    fi
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
> >>>>>>>>       >>>>      >      set +x
> >>>>>>>>       >>>>      >      echo "-----------------------------------------------------"
> >>>>>>>>       >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >>>>>>>>       >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>>>>>>>       >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>>>>>>>       >>>>      >      echo ""
> >>>>>>>>       >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
> >>>>>>>>       >>>>      >      echo "git commit --amend"
> >>>>>>>>       >>>>      >      echo "git push your-remote HEAD --force"
> >>>>>>>>       >>>>      >      echo ""
> >>>>>>>>       >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>>>>>>>       >>>>      >    fi
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      >    exit $RET_CODE
> >>>>>>>>       >>>>      > else
> >>>>>>>>       >>>>      >    set +x
> >>>>>>>>       >>>>      >    echo "travis_check.py does not exists"
> >>>>>>>>       >>>>      >    exit 1
> >>>>>>>>       >>>>      > fi
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat,
> >>>>>>>>       >>>>      > Jun 29, 2019 at 3:17 PM:
> >>>>>>>>       >>>>      >
> >>>>>>>>       >>>>      >> Does this imply that a Jenkins job is active as
> >> long
> >>>>>>>>       as the
> >>>>>>>>       >>>>      Travis build
> >>>>>>>>       >>>>      >> runs?
> >>>>>>>>       >>>>      >>
> >>>>>>>>       >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>>>>>       >>>>      >>> Hi,
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>> @Dawid, I think the "long test running" as I
> >>>>>>>>       mentioned in the
> >>>>>>>>       >>>>      first
> >>>>>>>>       >>>>      >> email,
> >>>>>>>>       >>>>      >>> also as you guys said, belongs to "a big
> effort
> >>>>>>>>       which is much
> >>>>>>>>       >>>>      harder to
> >>>>>>>>       >>>>      >>> accomplish in a short period of time and may
> >> deserve
> >>>>>>>>       its own
> >>>>>>>>       >>>>      separate
> >>>>>>>>       >>>>      >>> discussion". Thus I didn't include it in what
> we
> >> can
> >>>>>>>>       do in a
> >>>>>>>>       >>>>      foreseeable
> >>>>>>>>       >>>>      >>> short term.
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>> Besides, I don't think that's the ultimate reason for
> >>>>>>>>       >>>>      >>> the lack of build resources. Even if the build is
> >>>>>>>>       >>>>      >>> shortened to something like 2h, the problem I
> >>>>>>>>       >>>>      >>> described, of no build machine working for 6 or more
> >>>>>>>>       >>>>      >>> hours in PST daytime, will still happen, because no
> >>>>>>>>       >>>>      >>> machine from ASF INFRA's pool is allocated to Flink.
> >>>>>>>>       >>>>      >>> As I have paid close attention to the build queue in
> >>>>>>>>       >>>>      >>> the past few weekdays, it's a pretty clear pattern now.
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>> **The ultimate root cause** for that is - we don't
> >>>>>>>>       >>>>      >>> have any **dedicated** build resources that we can
> >>>>>>>>       >>>>      >>> stably rely on. I'm actually ok to wait for a long
> >>>>>>>>       >>>>      >>> time if there are build requests running; it means at
> >>>>>>>>       >>>>      >>> least we are making progress. But I'm not ok with
> >>>>>>>>       >>>>      >>> having no build resources. A better place I think we
> >>>>>>>>       >>>>      >>> should aim at in the short term is to always have at
> >>>>>>>>       >>>>      >>> least a central pool (can be 3 or 5) of machines
> >>>>>>>>       >>>>      >>> dedicated to building Flink at any time, or maybe use
> >>>>>>>>       >>>>      >>> users' resources.
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the
> >>>>>>>>       >>>>      >>> Zeppelin community is using a Jenkins job to
> >>>>>>>>       >>>>      >>> automatically build on users' Travis accounts and link
> >>>>>>>>       >>>>      >>> the result back to the GitHub PR. I guess the Jenkins
> >>>>>>>>       >>>>      >>> job would fetch the latest upstream master and build
> >>>>>>>>       >>>>      >>> the PR against it. Jeff has filed tickets to learn
> >>>>>>>>       >>>>      >>> about and get access to the Jenkins infra. It'd be
> >>>>>>>>       >>>>      >>> better to fully understand it first before judging
> >>>>>>>>       >>>>      >>> this approach.
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>> I also heard good things about CircleCI, and ASF INFRA
> >>>>>>>>       >>>>      >>> seems to have a pool of build capacity there too. It
> >>>>>>>>       >>>>      >>> could be an alternative to consider.
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
> >>>>>>>>       >>>>      >>> <dwysakowicz@apache.org> wrote:
> >>>>>>>>       >>>>      >>>
> >>>>>>>>       >>>>      >>>> Sorry to jump in late, but I think Bowen
> missed
> >> the
> >>>>>>>>       most
> >>>>>>>>       >>>>      important point
> >>>>>>>>       >>>>      >>>> from Chesnay's previous message in the
> summary.
> >> The
> >>>>>>>>       ultimate
> >>>>>>>>       >>>>      reason for
> >>>>>>>>       >>>>      >>>> all the problems is that the tests take close
> >> to 2
> >>>>>>>>       hours to
> >>>>>>>>       >>>>      run already.
> >>>>>>>>       >>>>      >>>> I fully support this claim: "Unless people
> start
> >>>>>>>>       caring about
> >>>>>>>>       >>>>      test times
> >>>>>>>>       >>>>      >>>> before adding them, this issue cannot be
> solved"
> >>>>>>>>       >>>>      >>>>
> >>>>>>>>       >>>>      >>>> This is also another reason why using user's
> >> Travis
> >>>>>>>>       account
> >>>>>>>>       >>>>      won't help.
> >>>>>>>>       >>>>      >>>> Every few weeks we reach the user's time
> limit
> >> for
> >>>>>>>>       a single
> >>>>>>>>       >>>>      profile.
> >>>>>>>>       >>>>      >>>> This makes the user's builds simply fail,
> until
> >> we
> >>>>>>>>       either
> >>>>>>>>       >>>>      properly
> >>>>>>>>       >>>>      >>>> decrease the time the tests take (which I am
> not
> >>>>>>>>       sure we ever
> >>>>>>>>       >>>>      did) or
> >>>>>>>>       >>>>      >>>> postpone the problem by splitting into more
> >>>>>>>>       profiles. (Note
> >>>>>>>>       >>>>      that the ASF
> >>>>>>>>       >>>>      >>>> Travis account has higher time limits)
> >>>>>>>>       >>>>      >>>>
> >>>>>>>>       >>>>      >>>> Best,
> >>>>>>>>       >>>>      >>>>
> >>>>>>>>       >>>>      >>>> Dawid
> >>>>>>>>       >>>>      >>>>
> >>>>>>>>       >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>>>>>       >>>>      >>>>> Do we know if using "the best" available
> >> hardware
> >>>>>>>>       would
> >>>>>>>>       >>>>      improve the
> >>>>>>>>       >>>>      >> build
> >>>>>>>>       >>>>      >>>>> times?
> >>>>>>>>       >>>>      >>>>> Imagine we would run the build on machines
> with
> >>>>>>>>       plenty of
> >>>>>>>>       >>>>      main memory
> >>>>>>>>       >>>>      >> to
> >>>>>>>>       >>>>      >>>>> mount everything to ramdisk + the latest CPU
> >>>>>>>>       architecture?
> >>>>>>>>       >>>>      >>>>>
> >>>>>>>>       >>>>      >>>>> Throwing hardware at the problem could help
> >> reduce
> >>>>>>>>       the time
> >>>>>>>>       >>>>      of an
> >>>>>>>>       >>>>      >>>>> individual build, and using our own
> >> infrastructure
> >>>>>>>>       would
> >>>>>>>>       >>>>      remove our
> >>>>>>>>       >>>>      >>>>> dependency on Apache's Travis account (with
> the
> >>>>>>>>       obvious
> >>>>>>>>       >>>>      downside of
> >>>>>>>>       >>>>      >>>> having
> >>>>>>>>       >>>>      >>>>> to maintain the infrastructure)
> >>>>>>>>       >>>>      >>>>> We could use an open source travis
> >> alternative, to
> >>>>>>>>       have a
> >>>>>>>>       >>>>      similar
> >>>>>>>>       >>>>      >>>>> experience and make the migration easy.
> >>>>>>>>       >>>>      >>>>>
> >>>>>>>>       >>>>      >>>>>
> >>>>>>>>       >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay
> >> Schepler
> >>>>>>>>       >>>>      <chesnay@apache.org <ma...@apache.org>
> >>>>>>>>       <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>>>>>>       >>>>      >>>> wrote:
> >>>>>>>>       >>>>      >>>>>> From what I gathered, there's no special
> >>>>>>>>       sauce that the
> >>>>>>>>       >>>>      Zeppelin
> >>>>>>>>       >>>>      >>>>>> project uses which actually integrates a
> user's
> >>>>>>>> Travis
> >>>>>>>>       >>>>      account into the
> >>>>>>>>       >>>>      >>>> PR.
> >>>>>>>>       >>>>      >>>>>> They just disabled Travis for PRs. And
> that's
> >>>>>>>>       kind of it.
> >>>>>>>>       >>>>      >>>>>>
> >>>>>>>>       >>>>      >>>>>> Naturally we can do this (duh) and save the
> >> ASF a
> >>>>>>>>       fair
> >>>>>>>>       >>>>      amount of
> >>>>>>>>       >>>>      >>>>>> resources, but there are downsides:
> >>>>>>>>       >>>>      >>>>>>
> >>>>>>>>       >>>>      >>>>>> The discoverability of the Travis check
> takes
> >> a
> >>>>>>>>       nose-dive.
> >>>>>>>>       >>>>      Either we
> >>>>>>>>       >>>>      >>>>>> require every contributor to always, on
> every
> >>>>>>>>       commit, also
> >>>>>>>>       >>>>      post a
> >>>>>>>>       >>>>      >> Travis
> >>>>>>>>       >>>>      >>>>>> build, or we have the reviewer sift through
> >> the
> >>>>>>>>       >>>>      contributor's account
> >>>>>>>>       >>>>      >> to
> >>>>>>>>       >>>>      >>>>>> find it.
> >>>>>>>>       >>>>      >>>>>>
> >>>>>>>>       >>>>      >>>>>> This is rather cumbersome. Additionally,
> it's
> >>>>>>>>       also not
> >>>>>>>>       >>>>      equivalent to
> >>>>>>>>       >>>>      >>>>>> having a PR build.
> >>>>>>>>       >>>>      >>>>>>
> >>>>>>>>       >>>>      >>>>>> A normal branch build takes a branch as is
> and
> >>>>>>>>       tests it. A
> >>>>>>>>       >>>>      PR build
> >>>>>>>>       >>>>      >>>>>> merges the branch into master, and then
> runs
> >> it.
> >>>>>>>>       (Fun fact:
> >>>>>>>>       >>>>      This is
> >>>>>>>>       >>>>      >> why
> >>>>>>>>       >>>>      >>>>>> a PR with merge conflicts is not being
> run
> >> on
> >>>>>>>>       Travis.)
> >>>>>>>>       >>>>      >>>>>>
> >>>>>>>>       >>>>      >>>>>> And ultimately, everyone can already make
> use
> >>>>>>>> of this
> >>>>>>>>       >>>>      approach anyway.
> >>>>>>>>       >>>>      >>>>>>
> >>>>>>>>       >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>>>>>       >>>>      >>>>>>> Hi Jeff,
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>> Thanks for sharing the Zeppelin approach.
> I
> >>>>>>>>       think it's a
> >>>>>>>>       >>>>      good idea to
> >>>>>>>>       >>>>      >>>>>>> leverage user's travis account.
> >>>>>>>>       >>>>      >>>>>>> In this way, we can have almost unlimited
> >>>>>>>>       concurrent build
> >>>>>>>>       >>>>      jobs and
> >>>>>>>>       >>>>      >>>>>>> developers can restart build by themselves
> >>>>>>>>       (currently only
> >>>>>>>>       >>>>      committers
> >>>>>>>>       >>>>      >>>>>>> can restart PR's build).
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>> But I'm still not very clear how to
> integrate
> >>>>>>>> user's
> >>>>>>>>       >>>>      travis build
> >>>>>>>>       >>>>      >> into
> >>>>>>>>       >>>>      >>>>>>> the Flink pull request's build
> automatically.
> >>>>>>>>       Can you
> >>>>>>>>       >>>>      explain more in
> >>>>>>>>       >>>>      >>>>>>> detail?
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>> Another question: does travis only build
> >>>>>>>>       branches for user
> >>>>>>>>       >>>>      account?
> >>>>>>>>       >>>>      >>>>>>> My concern is that builds for PRs will
> rebase
> >>>>>>>> user's
> >>>>>>>>       >>>>      commits against
> >>>>>>>>       >>>>      >>>>>>> current master branch.
> >>>>>>>>       >>>>      >>>>>>> This will help us to find problems before
> >>>>>>>>       merge.  Builds
> >>>>>>>>       >>>>      for branches
> >>>>>>>>       >>>>      >>>>>>> will lose the impact of new commits in
> >> master.
> >>>>>>>>       >>>>      >>>>>>> How does Zeppelin solve this problem?
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>> Thanks again for sharing the idea.
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>> Regards,
> >>>>>>>>       >>>>      >>>>>>> Jark
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >>>>>>>>       <zjffdu@gmail.com <ma...@gmail.com>
> >>>>>>>>       >>>>      <mailto:zjffdu@gmail.com <mailto:zjffdu@gmail.com
> >>
> >>>>>>>>       >>>>      >>>>>>> <mailto:zjffdu@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:zjffdu@gmail.com
> >>>>>>>>       <ma...@gmail.com>>>> wrote:
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>  Hi Folks,
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>  Zeppelin met this kind of issue before,
> we
> >>>>>>>> solved
> >>>>>>>>       >>>> it by
> >>>>>>>>       >>>>      >> delegating
> >>>>>>>>       >>>>      >>>>>>>  each
> >>>>>>>>       >>>>      >>>>>>>  one's PR build to his travis account
> >>>>>>>>       (Everyone can
> >>>>>>>>       >>>>      have 5 free
> >>>>>>>>       >>>>      >>>>>>>  slots for
> >>>>>>>>       >>>>      >>>>>>>  travis build).
> >>>>>>>>       >>>>      >>>>>>>  Apache account travis build is only
> >>>>>>>> triggered when
> >>>>>>>>       >>>>      PR is merged.
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
> >>>>>>>>       <ma...@gmail.com>
> >>>>>>>>       >>>>      <mailto:ykt836@gmail.com <mailto:ykt836@gmail.com
> >>
> >>>>>>>>       <mailto:ykt836@gmail.com <ma...@gmail.com>
> >>>>>>>>       >>>>      <mailto:ykt836@gmail.com <mailto:ykt836@gmail.com
> >>>>>>>>       >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>  > (Forgot to cc George)
> >>>>>>>>       >>>>      >>>>>>>  >
> >>>>>>>>       >>>>      >>>>>>>  > Best,
> >>>>>>>>       >>>>      >>>>>>>  > Kurt
> >>>>>>>>       >>>>      >>>>>>>  >
> >>>>>>>>       >>>>      >>>>>>>  >
> >>>>>>>>       >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt
> >> Young
> >>>>>>>>       >>>>      <ykt836@gmail.com <ma...@gmail.com>
> >>>>>>>>       <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>>>>>>       >>>>      >>>>>>> <mailto:ykt836@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:ykt836@gmail.com
> >>>>>>>>       <ma...@gmail.com>>>>
> >>>>>>>>       >>>>      wrote:
> >>>>>>>>       >>>>      >>>>>>>  >
> >>>>>>>>       >>>>      >>>>>>>  > > Hi Bowen,
> >>>>>>>>       >>>>      >>>>>>>  > >
> >>>>>>>>       >>>>      >>>>>>>  > > Thanks for bringing this up. We
> >>>>>>>>       actually have
> >>>>>>>>       >>>>      discussed
> >>>>>>>>       >>>>      >> about
> >>>>>>>>       >>>>      >>>>>>>  this, and I
> >>>>>>>>       >>>>      >>>>>>>  > > think Till and George have
> >>>>>>>>       >>>>      >>>>>>>  > > already spent some time investigating
> >>>>>>>>       it. I have
> >>>>>>>>       >>>>      cced both of
> >>>>>>>>       >>>>      >>>>>>>  them, and
> >>>>>>>>       >>>>      >>>>>>>  > > maybe they can share
> >>>>>>>>       >>>>      >>>>>>>  > > their findings.
> >>>>>>>>       >>>>      >>>>>>>  > >
> >>>>>>>>       >>>>      >>>>>>>  > > Best,
> >>>>>>>>       >>>>      >>>>>>>  > > Kurt
> >>>>>>>>       >>>>      >>>>>>>  > >
> >>>>>>>>       >>>>      >>>>>>>  > >
> >>>>>>>>       >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM
> Jark Wu
> >>>>>>>>       >>>>      <imjark@gmail.com <ma...@gmail.com>
> >>>>>>>>       <mailto:imjark@gmail.com <ma...@gmail.com>>
> >>>>>>>>       >>>>      >>>>>>> <mailto:imjark@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:imjark@gmail.com
> >>>>>>>>       <ma...@gmail.com>>>>
> >>>>>>>>       >>>>      wrote:
> >>>>>>>>       >>>>      >>>>>>>  > >
> >>>>>>>>       >>>>      >>>>>>>  > >> Hi Bowen,
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> >>>>>>>>       suffered from
> >>>>>>>>       >>>>      the long
> >>>>>>>>       >>>>      >>>>>>>  build time.
> >>>>>>>>       >>>>      >>>>>>>  > >> I agree that we should focus on
> >>>>>>>>       solving build
> >>>>>>>>       >>>>      capacity
> >>>>>>>>       >>>>      >>>>>>>  problem in the
> >>>>>>>>       >>>>      >>>>>>>  > >> thread.
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> My observation is that only one
> >>>>>>>>       build is
> >>>>>>>>       >>>>      running, all
> >>>>>>>>       >>>>      >> the
> >>>>>>>>       >>>>      >>>>>>>  others
> >>>>>>>>       >>>>      >>>>>>>  > >> (other
> >>>>>>>>       >>>>      >>>>>>>  > >> PRs, master) are pending.
> >>>>>>>>       >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
> >>>>>>>>       it can
> >>>>>>>>       >>>> support
> >>>>>>>>       >>>>      >> concurrent
> >>>>>>>>       >>>>      >>>>>>>  build
> >>>>>>>>       >>>>      >>>>>>>  > jobs.
> >>>>>>>>       >>>>      >>>>>>>  > >> But I don't know which plan we are
> >>>>>>>>       using, might
> >>>>>>>>       >>>>      be the free
> >>>>>>>>       >>>>      >>>>>>>  plan for
> >>>>>>>>       >>>>      >>>>>>>  > open
> >>>>>>>>       >>>>      >>>>>>>  > >> source.
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> >>>>>>>>       experience on
> >>>>>>>>       >>>>      Travis.
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> Regards,
> >>>>>>>>       >>>>      >>>>>>>  > >> Jark
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen
> Li
> >> <
> >>>>>>>>       >>>>      >> bowenli86@gmail.com <mailto:
> bowenli86@gmail.com>
> >>>>>>>>       <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>>>>       >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> >>>>>>>>       <ma...@gmail.com>
> >>>>>>>>       >>>>      <mailto:bowenli86@gmail.com
> >>>>>>>>       <ma...@gmail.com>>>> wrote:
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >> > Hi Steven,
> >>>>>>>>       >>>>      >>>>>>>  > >> >
> >>>>>>>>       >>>>      >>>>>>>  > >> > I think you may not have read what I
> >>>>>>>>       wrote. The
> >>>>>>>>       >>>>      discussion is
> >>>>>>>>       >>>>      >>>> about
> >>>>>>>>       >>>>      >>>>>>>  > "unstable
> >>>>>>>>       >>>>      >>>>>>>  > >> > build **capacity**", in another
> word
> >>>>>>>>       >>>>      "unstable / lack of
> >>>>>>>>       >>>>      >>>> build
> >>>>>>>>       >>>>      >>>>>>>  > >> resources",
> >>>>>>>>       >>>>      >>>>>>>  > >> > not "unstable build".
> >>>>>>>>       >>>>      >>>>>>>  > >> >
> >>>>>>>>       >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
> >>>>>>>>       Steven Wu
> >>>>>>>>       >>>>      >>>>>>>  <stevenz3wu@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>>>>>>>       <ma...@gmail.com>>
> >>>>>>>>       >>>>      <mailto:stevenz3wu@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>>>>>>>       <ma...@gmail.com>>>>
> >>>>>>>>       >>>>      >>>>>>>  > wrote:
> >>>>>>>>       >>>>      >>>>>>>  > >> >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > long and sometimes unstable
> build
> >> is
> >>>>>>>>       >>>>      definitely a pain
> >>>>>>>>       >>>>      >>>>>> point.
> >>>>>>>>       >>>>      >>>>>>>  > >> > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > I suspect the build failure
> here in
> >>>>>>>>       >>>>      >> flink-connector-kafka
> >>>>>>>>       >>>>      >>>>>>>  is not
> >>>>>>>>       >>>>      >>>>>>>  > >> related
> >>>>>>>>       >>>>      >>>>>>>  > >> > to
> >>>>>>>>       >>>>      >>>>>>>  > >> > > my change. but there is no easy
> >>>>>>>>       re-run the
> >>>>>>>>       >>>>      build on
> >>>>>>>>       >>>>      >>>>>>>  travis UI.
> >>>>>>>>       >>>>      >>>>>>>  > Google
> >>>>>>>>       >>>>      >>>>>>>  > >> > > search showed a trick of
> >>>>>>>>       close-and-open the
> >>>>>>>>       >>>>      PR will
> >>>>>>>>       >>>>      >>>>>>>  trigger rebuild.
> >>>>>>>>       >>>>      >>>>>>>  > >> but
> >>>>>>>>       >>>>      >>>>>>>  > >> > > that could add noises to the PR
> >>>>>>>>       activities.
> >>>>>>>>       >>>>      >>>>>>>  > >> > >
> >>>>>>>>       >>>> https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>>>>       >>>>      >>>>>>>  > >> > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> >>>>>>>>       often failed
> >>>>>>>>       >>>>      with
> >>>>>>>>       >>>>      >>>>>>>  exceeding time
> >>>>>>>>       >>>>      >>>>>>>  > limit
> >>>>>>>>       >>>>      >>>>>>>  > >> > after
> >>>>>>>>       >>>>      >>>>>>>  > >> > > 4+ hours.
> >>>>>>>>       >>>>      >>>>>>>  > >> > > The job exceeded the maximum
> time
> >>>>>>>>       limit for
> >>>>>>>>       >>>>      jobs, and
> >>>>>>>>       >>>>      >> has
> >>>>>>>>       >>>>      >>>>>>>  been
> >>>>>>>>       >>>>      >>>>>>>  > >> > terminated.
> >>>>>>>>       >>>>      >>>>>>>  > >> > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
> >>>>>>>>       Bowen Li
> >>>>>>>>       >>>>      >>>>>>>  <bowenli86@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>>>>>>>       <ma...@gmail.com>>
> >>>>>>>>       >>>>      <mailto:bowenli86@gmail.com <mailto:
> >> bowenli86@gmail.com
> >>>>>>>>       <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>       >>>>      >>>>>>>  > wrote:
> >>>>>>>>       >>>>      >>>>>>>  > >> > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > >
> >>>>>>>>       >>>> https://travis-ci.org/apache/flink/builds/549681530
> >>>>>>>>       >>>>      >>>>>>>  This build
> >>>>>>>>       >>>>      >>>>>>>  > >> > request
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > has
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> >>>>>>>>       queue**
> >>>>>>>>       >>>>      since I first
> >>>>>>>>       >>>>      >> saw
> >>>>>>>>       >>>>      >>>>>>>  it at PST
> >>>>>>>>       >>>>      >>>>>>>  > >> > 10:30am
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> >>>>>>>>       there before
> >>>>>>>>       >>>>      10:30am).
> >>>>>>>>       >>>>      >>>>>>>  It's PST
> >>>>>>>>       >>>>      >>>>>>>  > 4:12pm
> >>>>>>>>       >>>>      >>>>>>>  > >> now
> >>>>>>>>       >>>>      >>>>>>>  > >> > > and
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> >>>>>>>>       >>>>      >>>>>>>  > >> > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48
> PM
> >>>>>>>>       Bowen Li
> >>>>>>>>       >>>>      >>>>>>>  <bowenli86@gmail.com
> >>>>>>>>       <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>>>>>>>       <ma...@gmail.com>>
> >>>>>>>>       >>>>      <mailto:bowenli86@gmail.com <mailto:
> >> bowenli86@gmail.com
> >>>>>>>>       <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>       >>>>      >>>>>>>  > >> wrote:
> >>>>>>>>       >>>>      >>>>>>>  > >> > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > Hi devs,
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > I've been experiencing the
> pain
> >>>>>>>>       >>>>      resulting from lack
> >>>>>>>>       >>>>      >>>>>>>  of stable
> >>>>>>>>       >>>>      >>>>>>>  > >> build
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
> >>>>>>>>       PRs [1].
> >>>>>>>>       >>>>      >> Specifically, I
> >>>>>>>>       >>>>      >>>>>>>  noticed
> >>>>>>>>       >>>>      >>>>>>>  > >> often
> >>>>>>>>       >>>>      >>>>>>>  > >> > > that
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > no
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > build in the queue is making
> >> any
> >>>>>>>>       >>>>      progress for
> >>>>>>>>       >>>>      >> hours,
> >>>>>>>>       >>>>      >>>> and
> >>>>>>>>       >>>>      >>>>>>>  > suddenly
> >>>>>>>>       >>>>      >>>>>>>  > >> 5
> >>>>>>>>       >>>>      >>>>>>>  > >> > or
> >>>>>>>>       >>>>      >>>>>>>  > >> > > 6
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > builds kick off all together
> >>>>>>>>       after the
> >>>>>>>>       >>>>      long pause.
> >>>>>>>>       >>>>      >>>>>>>  I'm at PST
> >>>>>>>>       >>>>      >>>>>>>  > >> > (UTC-08)
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > time
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause
> can
> >>>>>>>>       be as
> >>>>>>>>       >>>>      long as 6 hours
> >>>>>>>>       >>>>      >>>>>>>  from PST 9am
> >>>>>>>>       >>>>      >>>>>>>  > >> to
> >>>>>>>>       >>>>      >>>>>>>  > >> > 3pm
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > (let alone the time needed
> to
> >>>>>>>>       drain the
> >>>>>>>>       >>>>      queue
> >>>>>>>>       >>>>      >>>>>>>  afterwards).
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > I think this has greatly
> >>>>>>>>       impacted our
> >>>>>>>>       >>>>      productivity.
> >>>>>>>>       >>>>      >>>> I've
> >>>>>>>>       >>>>      >>>>>>>  > >> experienced
> >>>>>>>>       >>>>      >>>>>>>  > >> > > that
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
> >>>>>>>>       morning of
> >>>>>>>>       >>>>      PST time zone
> >>>>>>>>       >>>>      >>>>>>>  won't finish
> >>>>>>>>       >>>>      >>>>>>>  > >> > their
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > build until late night of
> the
> >>>>>>>>       same day.
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > So my questions are:
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > - Has anyone else
> experienced
> >>>>>>>>       the same
> >>>>>>>>       >>>>      problem or
> >>>>>>>>       >>>>      >>>>>>>  have similar
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > observation
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
> >>>>>>>>       has things
> >>>>>>>>       >>>>      to do with
> >>>>>>>>       >>>>      >> time
> >>>>>>>>       >>>>      >>>>>>>  zone)
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > - What pricing plan of
> >>>>>>>>       TravisCI is
> >>>>>>>>       >>>>      Flink currently
> >>>>>>>>       >>>>      >>>>>>>  using? Is it
> >>>>>>>>       >>>>      >>>>>>>  > >> the
> >>>>>>>>       >>>>      >>>>>>>  > >> > > free
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > plan for open source
> >>>>>>>>       projects? What
> >>>>>>>>       >>>> are the
> >>>>>>>>       >>>>      >>>>>>>  guaranteed build
> >>>>>>>>       >>>>      >>>>>>>  > >> capacity
> >>>>>>>>       >>>>      >>>>>>>  > >> > > of
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > the current plan?
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > - If the current pricing
> plan
> >>>>>>>>       (either
> >>>>>>>>       >>>>      free or paid)
> >>>>>>>>       >>>>      >>>>>> can't
> >>>>>>>>       >>>>      >>>>>>>  > provide
> >>>>>>>>       >>>>      >>>>>>>  > >> > > stable
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > build capacity, can we
> >>>>>>>>       upgrade to a
> >>>>>>>>       >>>>      higher priced
> >>>>>>>>       >>>>      >>>>>>>  plan with
> >>>>>>>>       >>>>      >>>>>>>  > larger
> >>>>>>>>       >>>>      >>>>>>>  > >> > and
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > more
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > stable build capacity?
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > BTW, another factor that
> >>>>>>>>       contribute to
> >>>>>>>>       >>>> the
> >>>>>>>>       >>>>      >>>>>>>  productivity problem
> >>>>>>>>       >>>>      >>>>>>>  > is
> >>>>>>>>       >>>>      >>>>>>>  > >> > that
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > our build is slow - we run
> >>>>>>>>       full build
> >>>>>>>>       >>>>      for every PR
> >>>>>>>>       >>>>      >>>> and a
> >>>>>>>>       >>>>      >>>>>>>  > >> successful
> >>>>>>>>       >>>>      >>>>>>>  > >> > > full
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
> >>>>>>>>       definitely have
> >>>>>>>>       >>>>      more options to
> >>>>>>>>       >>>>      >>>>>>>  solve it,
> >>>>>>>>       >>>>      >>>>>>>  > for
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > instance,
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > modularize the build graphs
> >>>>>>>>       and reuse
> >>>>>>>>       >>>>      artifacts
> >>>>>>>>       >>>>      >> from
> >>>>>>>>       >>>>      >>>> the
> >>>>>>>>       >>>>      >>>>>>>  > previous
> >>>>>>>>       >>>>      >>>>>>>  > >> > > build.
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > But I think that can be a
> big
> >>>>>>>>       effort
> >>>>>>>>       >>>>      which is much
> >>>>>>>>       >>>>      >>>>>>>  harder to
> >>>>>>>>       >>>>      >>>>>>>  > >> > accomplish
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > in
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > a short period of time and
> >>>>>>>>       may deserve
> >>>>>>>>       >>>>      its own
> >>>>>>>>       >>>>      >>>> separate
> >>>>>>>>       >>>>      >>>>>>>  > >> discussion.
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > > [1]
> >>>>>>>>       >>>>      >>
> https://travis-ci.org/apache/flink/pull_requests
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > > >
> >>>>>>>>       >>>>      >>>>>>>  > >> > >
> >>>>>>>>       >>>>      >>>>>>>  > >> >
> >>>>>>>>       >>>>      >>>>>>>  > >>
> >>>>>>>>       >>>>      >>>>>>>  > >
> >>>>>>>>       >>>>      >>>>>>>  >
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>  --
> >>>>>>>>       >>>>      >>>>>>>  Best Regards
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>>>>>>  Jeff Zhang
> >>>>>>>>       >>>>      >>>>>>>
> >>>>>>>>       >>>>      >>
> >>>>>>>>       >>>>
> >>>>>>>>       >>>
> >>>>>>>>       >>
> >>>>>>>>
> >>
>
>

Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
I'm currently modifying the cibot to do this automatically; it should be
finished by Monday.

On 02/08/2019 07:41, Jark Wu wrote:
> Hi Chesnay,
>
> Can we grant Flink committers permissions on the flink-ci/flink repo?
> Several times, when I pushed new commits, the old build jobs were still
> pending and had not been canceled.
> Until that is fixed, we could manually cancel old jobs to save build
> resources.
>
> Best,
> Jark
>
>
> On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler <ch...@apache.org> wrote:
>
>> Your best bet would be to check the first commit in the PR and check the
>> parent commit.
>>
>> To re-run things, you will have to rebase the PR on the latest master.
>>
>> On 10/07/2019 03:32, Kurt Young wrote:
>>> Thanks for all your efforts, Chesnay; it has indeed improved our
>>> development experience a lot. BTW, do you know how to find which master
>>> commit the CI ran against?
>>>
>>> For example, like this one:
>>> https://travis-ci.com/flink-ci/flink/jobs/214542568
>>> It shows the commits as passing, rebased on master as of when the CI was
>>> triggered. But the master that the CI ran on may be the same as or
>>> different from the current master. If it's the same, I can simply rely on
>>> the passing result and push the commits; if it's not, I think I should find
>>> another way to re-trigger the tests against the newest master.
>>>
>>> Do you know where I can get such information?
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler <ch...@apache.org>
>> wrote:
>>>> The kinks have been worked out; the bot is running again and PR builds
>>>> are once again no longer running on ASF resources.
>>>>
>>>> PRs are mirrored to: https://github.com/flink-ci/flink
>>>> Bot source: https://github.com/flink-ci/ci-bot
>>>>
>>>> On 08/07/2019 17:14, Chesnay Schepler wrote:
>>>>> I have temporarily re-enabled running PR builds on the ASF account;
>>>>> migrating to the Travis subscription caused some issues in the bot
>>>>> that I have to fix first.
>>>>>
>>>>> On 07/07/2019 23:01, Chesnay Schepler wrote:
>>>>>> The vote has passed unanimously in favor of migrating to a separate
>>>>>> Travis account.
>>>>>>
>>>>>> I will now set things up such that PullRequests are no longer run on
>>>>>> the ASF servers.
>>>>>> This is a major step in reducing our usage of ASF resources.
>>>>>> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
>>>>>> workers, which is the same as the ASF gives us). Over the course of the
>>>>>> next week we'll set up the Ververica subscription to increase this limit.
>>>>>> From now on, a bot will mirror all new and updated PullRequests to a
>>>>>> mirror repository (https://github.com/flink-ci/flink-ci) and write an
>>>>>> update into the PR once the build is complete.
>>>>>> I have run the bot for the past 3 days in parallel to our existing
>>>>>> Travis setup, and it worked without major issues.
>>>>>>
>>>>>> The biggest change that contributors will see is that there's no
>>>>>> longer an icon next to each commit. We may revisit this in the future.
>>>>>>
>>>>>> I'll set up a repo with the source of the bot later.
>>>>>>
>>>>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
>>>>>>> I've raised a JIRA
>>>>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>>>>>>> inquire whether it would be possible to switch to a different Travis
>>>>>>> account, and if so what steps would need to be taken.
>>>>>>> We need a proper confirmation from INFRA since we are not in full
>>>>>>> control of the flink repository (for example, we cannot access the
>>>>>>> settings page).
>>>>>>>
>>>>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
>>>>>>> account for the Flink project.
>>>>>>> This would provide us with more than enough resources.
>>>>>>>
>>>>>>> Since this makes the project more reliant on resources provided by
>>>>>>> external companies I would like to vote on this.
>>>>>>>
>>>>>>> Please vote on this proposal, as follows:
>>>>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
>>>>>>> account, provided that INFRA approves
>>>>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored
>>>>>>> Travis account
>>>>>>>
>>>>>>> The vote will be open for at least 24h, and until we have
>>>>>>> confirmation from INFRA. The voting period may be shorter than the
>>>>>>> usual 3 days since our current setup is effectively not working.
>>>>>>>
>>>>>>> On 04/07/2019 06:51, Bowen Li wrote:
>>>>>>>> Re: > Are they using their own Travis CI pool, or did they switch to
>>>>>>>> an entirely different CI service?
>>>>>>>>
>>>>>>>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
>>>>>>>> currently moving away from ASF's Travis to their own in-house metal
>>>>>>>> machines at [1] with a custom CI application at [2]. They've seen
>>>>>>>> significant improvement in both performance and resource waiting time
>>>>>>>> (now basically none); a "night-and-day" difference, to quote Wes.
>>>>>>>>
>>>>>>>> Re: > If we can just switch to our own Travis pool, just for our
>>>>>>>> project, then this might be something we can do fairly quickly?
>>>>>>>>
>>>>>>>> I believe so, according to [3] and [4]
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>>>>>>>> [2] https://github.com/ursa-labs/ursabot
>>>>>>>> [3]
>>>>>>>>
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>>> [4]
>>>>>>>>
>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler
>>>>>>>> <chesnay@apache.org <ma...@apache.org>> wrote:
>>>>>>>>
>>>>>>>>       Are they using their own Travis CI pool, or did they switch to
>> an
>>>>>>>>       entirely different CI service?
>>>>>>>>
>>>>>>>>       If we can just switch to our own Travis pool, just for our
>>>>>>>>       project, then
>>>>>>>>       this might be something we can do fairly quickly?
>>>>>>>>
>>>>>>>>       On 03/07/2019 05:55, Bowen Li wrote:
>>>>>>>>       > I responded in the INFRA ticket [1] that I believe they are
>>>>>>>>       using a wrong
>>>>>>>>       > metric against Flink and the total build time is a completely
>>>>>>>>       different
>>>>>>>>       > thing than guaranteed build capacity.
>>>>>>>>       >
>>>>>>>>       > My response:
>>>>>>>>       >
>>>>>>>>       > "As mentioned above, I'm in Seattle, and since I started
>>>>>>>>       > paying attention to Flink's build queue a few weeks ago, I have
>>>>>>>>       > seen no build kick off during PST daytime on weekdays for Flink.
>>>>>>>>       > Our teammates in China and Europe have also reported similar
>>>>>>>>       > observations. So we need to evaluate where the large total build
>>>>>>>>       > time came from. If 1) your number and 2) our observations from
>>>>>>>>       > three locations that cover pretty much a full day are all true,
>>>>>>>>       > my **guess** is that the extra build time most likely came from
>>>>>>>>       > weekends, when other Apache projects may be idle and Flink just
>>>>>>>>       > drains its congested queue.
>>>>>>>>       >
>>>>>>>>       > Please be aware that we're not complaining about the lack of
>>>>>>>>       > resources in general; I'm complaining about the lack of
>>>>>>>>       > **stable, dedicated** resources. An example of the latter:
>>>>>>>>       > currently, even if no build is in Flink's queue and I submit a
>>>>>>>>       > request that becomes the queue head in the PST morning, my
>>>>>>>>       > build won't even start for 6-8+ hours. That is an absurd
>>>>>>>>       > amount of waiting time.
>>>>>>>>       >
>>>>>>>>       > That is to say, if ASF INFRA decides to adopt a quota system
>>>>>>>>       > and grants Flink five DEDICATED servers that run all the time
>>>>>>>>       > only for Flink, that'll be PERFECT and can totally solve our
>>>>>>>>       > problem now.
>>>>>>>>       >
>>>>>>>>       > I feel what's missing in the ASF INFRA's Travis resource
>> pool is
>>>>>>>>       some level
>>>>>>>>       > of build capacity SLAs and certainty"
>>>>>>>>       >
>>>>>>>>       >
>>>>>>>>       > Again, I believe these two problems differ in nature: long
>>>>>>>>       > build time vs. lack of dedicated build resources. That is to
>>>>>>>>       > say, shortening the build time may or may not relieve the
>>>>>>>>       > situation. I'm slightly negative on disabling IT cases for PRs,
>>>>>>>>       > because the downside is that we risk missing bugs in a PR that
>>>>>>>>       > the UTs don't catch, which may cost a lot more to fix later if
>>>>>>>>       > they slow down or even block others. But I am open to others'
>>>>>>>>       > opinions on it.
>>>>>>>>       >
>>>>>>>>       > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
>>>>>>>>       > be a feasible way to solve our problem, since INFRA's pool is
>>>>>>>>       > fully shared and they have no control over, or finer insight
>>>>>>>>       > into, resource allocation to a specific Apache project. As
>>>>>>>>       > mentioned in [1], Apache Arrow is moving away from the ASF
>>>>>>>>       > INFRA Travis pool (they are actually surprised Flink hasn't
>>>>>>>>       > planned to do so). I know that Spark is on its own build infra.
>>>>>>>>       > If we all agree on funding our own build infra, I'd be glad to
>>>>>>>>       > help investigate potential options after releasing 1.9, since
>>>>>>>>       > I'm super busy with 1.9 now.
>>>>>>>>       >
>>>>>>>>       > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>>>>>       >
>>>>>>>>       >
>>>>>>>>       >
>>>>>>>>       > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>>>>>       >
>>>>>>>>       >> As a short-term stopgap, since we can assume this issue to
>>>>>>>>       become much
>>>>>>>>       >> worse in the following days/weeks, we could disable IT
>> cases in
>>>>>>>>       PRs and
>>>>>>>>       >> only run them on master.
>>>>>>>>       >>
>>>>>>>>       >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>>>>>       >>> People really have to stop thinking that just because
>>>>>>>>       something works
>>>>>>>>       >>> for us it is also a good solution.
>>>>>>>>       >>> Also, please remember that our builds run for 2h from
>> start to
>>>>>>>>       finish,
>>>>>>>>       >>> and not the 14 _minutes_ it takes for zeppelin.
>>>>>>>>       >>> We are dealing with an entirely different scale here, both
>> in
>>>>>>>>       terms of
>>>>>>>>       >>> build times and number of builds.
>>>>>>>>       >>>
>>>>>>>>       >>> In this very thread people have been complaining about long
>>>>>>>> queue
>>>>>>>>       >>> times for their builds. Surprise, other Apache projects
>>>>>>>> have been
>>>>>>>>       >>> suffering the very same thing due to us not controlling our
>>>>>>>> build
>>>>>>>>       >>> times. While switching services (be it Jenkins, CircleCI or
>>>>>>>>       whatever)
>>>>>>>>       >>> will possibly work for us (and these options are actually
>>>>>>>>       attractive,
>>>>>>>>       >>> like CircleCI's proper support for build artifacts), it
>>>>>>>> will also
>>>>>>>>       >>> result in us likely negatively affecting other projects in
>>>>>>>>       significant
>>>>>>>>       >>> ways.
>>>>>>>>       >>>
>>>>>>>>       >>> Sure, the Jenkins setup has a good user experience for us,
>> at
>>>>>>>>       the cost
>>>>>>>>       >>> of blocking Jenkins workers for a _lot_ of time. Right now
>> we
>>>>>>>>       have 25
>>>>>>>>       >>> PR's in our queue; that's possibly 50h we'd consume of
>> Jenkins
>>>>>>>>       >>> resources, and the European contributors haven't even
>> really
>>>>>>>>       started yet.
>>>>>>>>       >>>
>>>>>>>>       >>> FYI, the latest INFRA response from INFRA-18533:
>>>>>>>>       >>>
>>>>>>>>       >>> "Our rough metrics shows that Flink used over 5800 hours of
>>>>>>>>       build time
>>>>>>>>       >>> last month. That is equal to EIGHT servers running 24/7 for
>>>>>>>>       the ENTIRE
>>>>>>>>       >>> MONTH. EIGHT. nonstop.
>>>>>>>>       >>> When we discovered this last night, we discussed it some
>> and
>>>>>>>>       are going
>>>>>>>>       >>> to tune down Flink to allow only five executors maximum. We
>>>>>>>> cannot
>>>>>>>>       >>> allow Flink to consume so much of a Foundation shared
>>>>>>>> resource."
>>>>>>>>       >>>
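
The figures in the INFRA response above can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only: the 30-day month and the idea that queued builds spread evenly over the executors are assumptions, not something INFRA stated. The 5800 hours, the cap of five executors, and the 25 queued PRs at roughly 2h each are taken from this thread.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def equivalent_servers(build_hours: float, days: int = DAYS_PER_MONTH) -> float:
    """Number of servers that would have to run 24/7 to deliver build_hours."""
    return build_hours / (HOURS_PER_DAY * days)


def monthly_capacity(executors: int, days: int = DAYS_PER_MONTH) -> int:
    """Upper bound on build hours a fixed number of executors can deliver."""
    return executors * HOURS_PER_DAY * days


def drain_time(queued_builds: int, hours_per_build: float, executors: int) -> float:
    """Idealized wall-clock hours to drain a queue (even spreading, no idle gaps)."""
    return queued_builds * hours_per_build / executors


if __name__ == "__main__":
    print(f"5800 build hours ~ {equivalent_servers(5800):.2f} servers running 24/7")
    print(f"5 executors cap usage at {monthly_capacity(5)} build hours per month")
    print(f"25 PRs x 2h on 5 executors drain in ~{drain_time(25, 2, 5):.0f}h")
```

Under these assumptions, a cap of five executors allows at most 3600 build hours in a 30-day month, noticeably less than the 5800 hours INFRA reported, which is consistent with the expectation in this thread that queue times will get worse unless CI usage drops.
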
>>>>>>>>       >>> So yes, we either
>>>>>>>>       >>> a) have to heavily reduce our CI usage or
>>>>>>>>       >>> b) fund our own, either maintaining it ourselves or
>> donating
>>>>>>>>       to Apache.
>>>>>>>>       >>>
>>>>>>>>       >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>>>>>       >>>> By looking at the git history of the Jenkins script, its core
>>>>>>>>       >>>> part was finished in March 2017 (with only two minor updates in
>>>>>>>>       >>>> 2017/2018), so it's been running for over two years now, and it
>>>>>>>>       >>>> feels like the Zeppelin community has been quite happy with it.
>>>>>>>>       >>>> @Jeff Zhang <mailto:zjffdu@gmail.com>, can you share your
>>>>>>>>       >>>> insights and user experience with the Jenkins+Travis approach?
>>>>>>>>       >>>>
>>>>>>>>       >>>> Things like:
>>>>>>>>       >>>>
>>>>>>>>       >>>> - has the approach completely solved the resource capacity
>>>>>>>>       >>>>   problem for the Zeppelin community? Is the Zeppelin community
>>>>>>>>       >>>>   happy with the result?
>>>>>>>>       >>>> - is the whole configuration chain stable enough (e.g. uptime)?
>>>>>>>>       >>>> - how often do you need to maintain the Jenkins infra? How many
>>>>>>>>       >>>>   people are usually involved in maintenance and bug fixes?
>>>>>>>>       >>>>
>>>>>>>>       >>>> The downside of this approach seems to me to be mostly the
>>>>>>>>       >>>> maintenance: maintaining the script and the Jenkins infra.
>>>>>>>>       >>>>
>>>>>>>>       >>>> ** Having Our Own Travis-CI.com Account **
>>>>>>>>       >>>>
>>>>>>>>       >>>> Another alternative I've been thinking of is to have our own
>>>>>>>>       >>>> travis-ci.com <http://travis-ci.com> account with paid
>>>>>>>>       >>>> dedicated resources. Note that travis-ci.org
>>>>>>>>       >>>> <http://travis-ci.org> is the free version and travis-ci.com is
>>>>>>>>       >>>> the commercial version. We currently use a shared resource pool
>>>>>>>>       >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>>>>>>>       >>>> control over it - we can't see how it's configured, how many
>>>>>>>>       >>>> resources are available, how resources are allocated among
>>>>>>>>       >>>> Apache projects, etc. The nice things about having an account
>>>>>>>>       >>>> on travis-ci.com are:
>>>>>>>>       >>>>
>>>>>>>>       >>>> - relatively low cost with a much better resource guarantee
>>>>>>>>       >>>>   than what we currently have [1]: $249/month with 5 dedicated
>>>>>>>>       >>>>   concurrent jobs, $489/month with 10 concurrent jobs
>>>>>>>>       >>>> - low maintenance work compared to using Jenkins
>>>>>>>>       >>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>>>>>       >>>>   (pending verification)
>>>>>>>>       >>>> - full control over the build capacity/configuration compared
>>>>>>>>       >>>>   to using ASF INFRA's pool
>>>>>>>>       >>>>
>>>>>>>>       >>>> I'd be surprised if we as such a vibrant community cannot
>>>>>>>>       find and
>>>>>>>>       >>>> fund $249*12=$2988 a year in exchange for a much better
>>>>>>>> developer
>>>>>>>>       >>>> experience and much higher productivity.
>>>>>>>>       >>>>
>>>>>>>>       >>>> [1] https://travis-ci.com/plans
>>>>>>>>       >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>>>       >>>>
>>>>>>>>       >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>>>>>       >>>>
>>>>>>>>       >>>>      So yes, the Jenkins job keeps polling the state from Travis
>>>>>>>>       >>>>      until it finishes.
>>>>>>>>       >>>>
>>>>>>>>       >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>>>>>>>       >>>>      workers just to idle for several hours.
>>>>>>>>       >>>>
>>>>>>>>       >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>>>>>       >>>>      > Here's what the Zeppelin community did: we made a Python
>>>>>>>>       >>>>      > script to check the build status of a pull request.
>>>>>>>>       >>>>      > Here's the script:
>>>>>>>>       >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>>>>>       >>>>      >
>>>>>>>>       >>>>      > And this is the script we use in the Jenkins build job.
>>>>>>>>       >>>>      >
>>>>>>>>       >>>>      > if [ -f "travis_check.py" ]; then
>>>>>>>>       >>>>      >    git log -n 1
>>>>>>>>       >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>>>>>>>>       >>>>      >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>>>>>       >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>       >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>       >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>>>>>       >>>>      >    #if [ -z $COMMIT ]; then
>>>>>>>>       >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>       >>>>      >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>       >>>>      >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>       >>>>      >    #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>       >>>>      >    #fi
>>>>>>>>       >>>>      >
>>>>>>>>       >>>>      >    # get commit hash from PR
>>>>>>>>       >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>       >>>>      >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>       >>>>      >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>       >>>>      >      grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>       >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>>>>>>>>       >>>>      >    RET_CODE=0
>>>>>>>>       >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>       >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>>>>>>>       >>>>      >      RET_CODE=0
>>>>>>>>       >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>       >>>>      >        grep '"full_name":' | grep -v "apache/zeppelin" |
>>>>>>>>       >>>>      >        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>>>>>       >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>       >>>>      >    fi
>>>>>>>>       >>>>      >
>>>>>>>>       >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>>>>>>>>       >>>>      >      set +x
>>>>>>>>       >>>>      >      echo "-----------------------------------------------------"
>>>>>>>>       >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>>>>>>>>       >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>>>>>       >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>>>>>       >>>>      >      echo ""
>>>>>>>>       >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
>>>>>>>>       >>>>      >      echo "git commit --amend"
>>>>>>>>       >>>>      >      echo "git push your-remote HEAD --force"
>>>>>>>>       >>>>      >      echo ""
>>>>>>>>       >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>>>>>>>       >>>>      >    fi
>>>>>>>>       >>>>      >
>>>>>>>>       >>>>      >    exit $RET_CODE
>>>>>>>>       >>>>      > else
>>>>>>>>       >>>>      >    set +x
>>>>>>>>       >>>>      >    echo "travis_check.py does not exists"
>>>>>>>>       >>>>      > exit 1
>>>>>>>>       >>>>      > fi
>>>>>>>>       >>>>      >
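
The Jenkins-side idea discussed here, waiting until the contributor's Travis build reaches a final state, can be sketched roughly as follows. This is a minimal Python sketch and not the actual travis_check.py: the function name `wait_for_build`, the state strings, the default intervals, and the injected `fetch_state` callable are all hypothetical, and no real Travis API call is made. Only the meaning of exit code 2 ("no build information found") is taken from the shell script above.

```python
import time
from typing import Callable, Optional

# Hypothetical state names; the real Travis API may use different values.
FINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(
    fetch_state: Callable[[], Optional[str]],
    poll_interval: float = 60.0,
    timeout: float = 5 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch_state until the build reaches a final state.

    Returns 0 if the build passed, 1 if it ended in another final state
    or the timeout expired, and 2 if no build was ever found
    (fetch_state kept returning None).
    """
    waited = 0.0
    seen_build = False
    while waited <= timeout:
        state = fetch_state()
        if state is not None:
            seen_build = True
            if state in FINAL_STATES:
                return 0 if state == "passed" else 1
        # The caller (e.g. a Jenkins worker) is blocked here between polls.
        sleep(poll_interval)
        waited += poll_interval
    return 1 if seen_build else 2


if __name__ == "__main__":
    # Simulated sequence of states instead of a real API call.
    states = iter([None, "created", "started", "passed"])
    code = wait_for_build(lambda: next(states), poll_interval=0, sleep=lambda s: None)
    print(code)
```

The blocking `sleep` call is the cost raised in this thread: whatever runs this loop stays occupied for the full duration of the Travis build.
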
>>>>>>>>       >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>>>>>>       >>>>      >
>>>>>>>>       >>>>      >> Does this imply that a Jenkins job is active as
>> long
>>>>>>>>       as the
>>>>>>>>       >>>>      Travis build
>>>>>>>>       >>>>      >> runs?
>>>>>>>>       >>>>      >>
>>>>>>>>       >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>>>>>>       >>>>      >>> Hi,
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>> @Dawid, I think the "long test running" as I
>>>>>>>>       mentioned in the
>>>>>>>>       >>>>      first
>>>>>>>>       >>>>      >> email,
>>>>>>>>       >>>>      >>> also as you guys said, belongs to "a big effort
>>>>>>>>       which is much
>>>>>>>>       >>>>      harder to
>>>>>>>>       >>>>      >>> accomplish in a short period of time and may
>> deserve
>>>>>>>>       its own
>>>>>>>>       >>>>      separate
>>>>>>>>       >>>>      >>> discussion". Thus I didn't include it in what we
>> can
>>>>>>>>       do in a
>>>>>>>>       >>>>      foreseeable
>>>>>>>>       >>>>      >>> short term.
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>> Besides, I don't think that's the ultimate reason for the
>>>>>>>>       >>>>      >>> lack of build resources. Even if the build is shortened to
>>>>>>>>       >>>>      >>> something like 2h, the problem I described, of no build
>>>>>>>>       >>>>      >>> machine working for 6 or more hours in PST daytime, will
>>>>>>>>       >>>>      >>> still happen, because no machine from ASF INFRA's pool is
>>>>>>>>       >>>>      >>> allocated to Flink. I have paid close attention to the
>>>>>>>>       >>>>      >>> build queue over the past few weekdays, and it's a pretty
>>>>>>>>       >>>>      >>> clear pattern now.
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>> **The ultimate root cause** for that is that we don't have
>>>>>>>>       >>>>      >>> any **dedicated** build resources that we can stably rely
>>>>>>>>       >>>>      >>> on. I'm actually OK with waiting a long time if there are
>>>>>>>>       >>>>      >>> build requests running; it means at least we are making
>>>>>>>>       >>>>      >>> progress. But I'm not OK with having no build resources. A
>>>>>>>>       >>>>      >>> better place I think we should aim for in the short term is
>>>>>>>>       >>>>      >>> to always have at least a central pool (can be 3 or 5) of
>>>>>>>>       >>>>      >>> machines dedicated to building Flink at any time, or maybe
>>>>>>>>       >>>>      >>> use users' resources.
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>> @Chesnay @Robert I synced with Jeff offline and learned
>>>>>>>>       >>>>      >>> that the Zeppelin community is using a Jenkins job to
>>>>>>>>       >>>>      >>> automatically build on users' Travis accounts and link the
>>>>>>>>       >>>>      >>> result back to the GitHub PR. I guess the Jenkins job would
>>>>>>>>       >>>>      >>> fetch the latest upstream master and build the PR against
>>>>>>>>       >>>>      >>> it. Jeff has filed tickets to learn about and get access to
>>>>>>>>       >>>>      >>> the Jenkins infra. It'd be better to fully understand it
>>>>>>>>       >>>>      >>> first before judging this approach.
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>> I also heard good things about CircleCI, and ASF
>>>>>>>>       INFRA seems
>>>>>>>>       >>>>      to have a
>>>>>>>>       >>>>      >> pool
>>>>>>>>       >>>>      >>> of build capacity there too. Can be an
>> alternative
>>>>>>>>       to consider.
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>>>>>>>       >>>>      >>>
>>>>>>>>       >>>>      >>>> Sorry to jump in late, but I think Bowen missed
>> the
>>>>>>>>       most
>>>>>>>>       >>>>      important point
>>>>>>>>       >>>>      >>>> from Chesnay's previous message in the summary.
>> The
>>>>>>>>       ultimate
>>>>>>>>       >>>>      reason for
>>>>>>>>       >>>>      >>>> all the problems is that the tests take close
>> to 2
>>>>>>>>       hours to
>>>>>>>>       >>>>      run already.
>>>>>>>>       >>>>      >>>> I fully support this claim: "Unless people start
>>>>>>>>       caring about
>>>>>>>>       >>>>      test times
>>>>>>>>       >>>>      >>>> before adding them, this issue cannot be solved"
>>>>>>>>       >>>>      >>>>
>>>>>>>>       >>>>      >>>> This is also another reason why using user's
>> Travis
>>>>>>>>       account
>>>>>>>>       >>>>      won't help.
>>>>>>>>       >>>>      >>>> Every few weeks we reach the user's time limit
>> for
>>>>>>>>       a single
>>>>>>>>       >>>>      profile.
>>>>>>>>       >>>>      >>>> This makes the user's builds simply fail, until
>> we
>>>>>>>>       either
>>>>>>>>       >>>>      properly
>>>>>>>>       >>>>      >>>> decrease the time the tests take (which I am not
>>>>>>>>       sure we ever
>>>>>>>>       >>>>      did) or
>>>>>>>>       >>>>      >>>> postpone the problem by splitting into more
>>>>>>>>       profiles. (Note
>>>>>>>>       >>>>      that the ASF
>>>>>>>>       >>>>      >>>> Travis account has higher time limits)
>>>>>>>>       >>>>      >>>>
>>>>>>>>       >>>>      >>>> Best,
>>>>>>>>       >>>>      >>>>
>>>>>>>>       >>>>      >>>> Dawid
>>>>>>>>       >>>>      >>>>
>>>>>>>>       >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>>>>>       >>>>      >>>>> Do we know if using "the best" available
>> hardware
>>>>>>>>       would
>>>>>>>>       >>>>      improve the
>>>>>>>>       >>>>      >> build
>>>>>>>>       >>>>      >>>>> times?
>>>>>>>>       >>>>      >>>>> Imagine we would run the build on machines with
>>>>>>>>       plenty of
>>>>>>>>       >>>>      main memory
>>>>>>>>       >>>>      >> to
>>>>>>>>       >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>>>>>>>       architecture?
>>>>>>>>       >>>>      >>>>>
>>>>>>>>       >>>>      >>>>> Throwing hardware at the problem could help
>> reduce
>>>>>>>>       the time
>>>>>>>>       >>>>      of an
>>>>>>>>       >>>>      >>>>> individual build, and using our own
>> infrastructure
>>>>>>>>       would
>>>>>>>>       >>>>      remove our
>>>>>>>>       >>>>      >>>>> dependency on Apache's Travis account (with the
>>>>>>>>       obvious
>>>>>>>>       >>>>      downside of
>>>>>>>>       >>>>      >>>> having
>>>>>>>>       >>>>      >>>>> to maintain the infrastructure)
>>>>>>>>       >>>>      >>>>> We could use an open source travis
>> alternative, to
>>>>>>>>       have a
>>>>>>>>       >>>>      similar
>>>>>>>>       >>>>      >>>>> experience and make the migration easy.
>>>>>>>>       >>>>      >>>>>
>>>>>>>>       >>>>      >>>>>
>>>>>>>>       >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>>>>>       >>>>      >>>>>> From what I gathered, there's no special sauce that the
>>>>>>>>       >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
>>>>>>>>       >>>>      >>>>>> Travis account into the PR. They just disabled Travis for
>>>>>>>>       >>>>      >>>>>> PRs. And that's kind of it.
>>>>>>>>       >>>>      >>>>>>
>>>>>>>>       >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
>>>>>>>>       >>>>      >>>>>> amount of resources, but there are downsides:
>>>>>>>>       >>>>      >>>>>>
>>>>>>>>       >>>>      >>>>>> The discoverability of the Travis check takes a
>>>>>>>>       >>>>      >>>>>> nose-dive. Either we require every contributor to always,
>>>>>>>>       >>>>      >>>>>> on every commit, also post a Travis build, or we have the
>>>>>>>>       >>>>      >>>>>> reviewer sift through the contributor's account to find
>>>>>>>>       >>>>      >>>>>> it.
>>>>>>>>       >>>>      >>>>>>
>>>>>>>>       >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>>>>>>>       also not
>>>>>>>>       >>>>      equivalent to
>>>>>>>>       >>>>      >>>>>> having a PR build.
>>>>>>>>       >>>>      >>>>>>
>>>>>>>>       >>>>      >>>>>> A normal branch build takes a branch as is and tests it.
>>>>>>>>       >>>>      >>>>>> A PR build merges the branch into master and then runs
>>>>>>>>       >>>>      >>>>>> the tests. (Fun fact: this is why a PR with merge
>>>>>>>>       >>>>      >>>>>> conflicts is not being run on Travis.)
>>>>>>>>       >>>>      >>>>>>
>>>>>>>>       >>>>      >>>>>> And ultimately, everyone can already make use
>>>>>>>> of this
>>>>>>>>       >>>>      approach anyway.
>>>>>>>>       >>>>      >>>>>>
>>>>>>>>       >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>>>       >>>>      >>>>>>> Hi Jeff,
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think it's
>>>>>>>>       >>>>      >>>>>>> a good idea to leverage users' Travis accounts.
>>>>>>>>       >>>>      >>>>>>> In this way, we can have almost unlimited concurrent
>>>>>>>>       >>>>      >>>>>>> build jobs, and developers can restart builds by
>>>>>>>>       >>>>      >>>>>>> themselves (currently only committers can restart a
>>>>>>>>       >>>>      >>>>>>> PR's build).
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>> But I'm still not very clear on how to integrate a
>>>>>>>>       >>>>      >>>>>>> user's Travis build into the Flink pull request's build
>>>>>>>>       >>>>      >>>>>>> automatically. Can you explain in more detail?
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>> Another question: does Travis only build branches for
>>>>>>>>       >>>>      >>>>>>> user accounts?
>>>>>>>>       >>>>      >>>>>>> Builds for PRs rebase the user's commits against the
>>>>>>>>       >>>>      >>>>>>> current master branch, which helps us find problems
>>>>>>>>       >>>>      >>>>>>> before merging. My concern is that builds for branches
>>>>>>>>       >>>>      >>>>>>> will lose the impact of new commits in master.
>>>>>>>>       >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>> Thanks again for sharing the idea.
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>> Regards,
>>>>>>>>       >>>>      >>>>>>> Jark
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>  Hi Folks,
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved it
>>>>>>>>       >>>>      >>>>>>>  by delegating each contributor's PR build to their own
>>>>>>>>       >>>>      >>>>>>>  Travis account (everyone can have 5 free slots for
>>>>>>>>       >>>>      >>>>>>>  Travis builds).
>>>>>>>>       >>>>      >>>>>>>  The Apache account's Travis build is only triggered
>>>>>>>>       >>>>      >>>>>>>  when a PR is merged.
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>  > (Forgot to cc George)
>>>>>>>>       >>>>      >>>>>>>  >
>>>>>>>>       >>>>      >>>>>>>  > Best,
>>>>>>>>       >>>>      >>>>>>>  > Kurt
>>>>>>>>       >>>>      >>>>>>>  >
>>>>>>>>       >>>>      >>>>>>>  >
>>>>>>>>       >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
>>>>>>>>       >>>>      >>>>>>>  >
>>>>>>>>       >>>>      >>>>>>>  > > Hi Bowen,
>>>>>>>>       >>>>      >>>>>>>  > >
>>>>>>>>       >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
>>>>>>>>       >>>>      >>>>>>>  > > discussed this, and I think Till and George have
>>>>>>>>       >>>>      >>>>>>>  > > already spent some time investigating it. I have
>>>>>>>>       >>>>      >>>>>>>  > > cc'ed both of them, and maybe they can share their
>>>>>>>>       >>>>      >>>>>>>  > > findings.
>>>>>>>>       >>>>      >>>>>>>  > >
>>>>>>>>       >>>>      >>>>>>>  > > Best,
>>>>>>>>       >>>>      >>>>>>>  > > Kurt
>>>>>>>>       >>>>      >>>>>>>  > >
>>>>>>>>       >>>>      >>>>>>>  > >
>>>>>>>>       >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
>>>>>>>>       >>>>      >>>>>>>  > >
>>>>>>>>       >>>>      >>>>>>>  > >> Hi Bowen,
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> Thanks for bringing this. We also
>>>>>>>>       suffered from
>>>>>>>>       >>>>      the long
>>>>>>>>       >>>>      >>>>>>>  build time.
>>>>>>>>       >>>>      >>>>>>>  > >> I agree that we should focus on
>>>>>>>>       solving build
>>>>>>>>       >>>>      capacity
>>>>>>>>       >>>>      >>>>>>>  problem in the
>>>>>>>>       >>>>      >>>>>>>  > >> thread.
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> My observation is there is only one
>>>>>>>>       build is
>>>>>>>>       >>>>      running, all
>>>>>>>>       >>>>      >> the
>>>>>>>>       >>>>      >>>>>>>  others
>>>>>>>>       >>>>      >>>>>>>  > >> (other
>>>>>>>>       >>>>      >>>>>>>  > >> PRs, master) are pending.
>>>>>>>>       >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>>>>>>>>       it can
>>>>>>>>       >>>> support
>>>>>>>>       >>>>      >> concurrent
>>>>>>>>       >>>>      >>>>>>>  build
>>>>>>>>       >>>>      >>>>>>>  > jobs.
>>>>>>>>       >>>>      >>>>>>>  > >> But I don't know which plan we are
>>>>>>>>       using, might
>>>>>>>>       >>>>      be the free
>>>>>>>>       >>>>      >>>>>>>  plan for
>>>>>>>>       >>>>      >>>>>>>  > open
>>>>>>>>       >>>>      >>>>>>>  > >> source.
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>>>>>>>       experience on
>>>>>>>>       >>>>      Travis.
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> Regards,
>>>>>>>>       >>>>      >>>>>>>  > >> Jark
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
>> <
>>>>>>>>       >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>>>>>>>       <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>>>>>>>>       >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>>>>>>>       <ma...@gmail.com>
>>>>>>>>       >>>>      <mailto:bowenli86@gmail.com
>>>>>>>>       <ma...@gmail.com>>>> wrote:
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >> > Hi Steven,
>>>>>>>>       >>>>      >>>>>>>  > >> >
>>>>>>>>       >>>>      >>>>>>>  > >> > I think you may not read what I
>>>>>>>>       wrote. The
>>>>>>>>       >>>>      discussion is
>>>>>>>>       >>>>      >>>> about
>>>>>>>>       >>>>      >>>>>>>  > "unstable
>>>>>>>>       >>>>      >>>>>>>  > >> > build **capacity**", in another word
>>>>>>>>       >>>>      "unstable / lack of
>>>>>>>>       >>>>      >>>> build
>>>>>>>>       >>>>      >>>>>>>  > >> resources",
>>>>>>>>       >>>>      >>>>>>>  > >> > not "unstable build".
>>>>>>>>       >>>>      >>>>>>>  > >> >
>>>>>>>>       >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
>>>>>>>>       Steven Wu
>>>>>>>>       >>>>      >>>>>>>  <stevenz3wu@gmail.com
>>>>>>>>       <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>>>>>>       <ma...@gmail.com>>
>>>>>>>>       >>>>      <mailto:stevenz3wu@gmail.com
>>>>>>>>       <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>>>>>>       <ma...@gmail.com>>>>
>>>>>>>>       >>>>      >>>>>>>  > wrote:
>>>>>>>>       >>>>      >>>>>>>  > >> >
>>>>>>>>       >>>>      >>>>>>>  > >> > > long and sometimes unstable build
>> is
>>>>>>>>       >>>>      definitely a pain
>>>>>>>>       >>>>      >>>>>> point.
>>>>>>>>       >>>>      >>>>>>>  > >> > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>>>>>>>       >>>>      >> flink-connector-kafka
>>>>>>>>       >>>>      >>>>>>>  is not
>>>>>>>>       >>>>      >>>>>>>  > >> related
>>>>>>>>       >>>>      >>>>>>>  > >> > to
>>>>>>>>       >>>>      >>>>>>>  > >> > > my change. but there is no easy
>>>>>>>>       re-run the
>>>>>>>>       >>>>      build on
>>>>>>>>       >>>>      >>>>>>>  travis UI.
>>>>>>>>       >>>>      >>>>>>>  > Google
>>>>>>>>       >>>>      >>>>>>>  > >> > > search showed a trick of
>>>>>>>>       close-and-open the
>>>>>>>>       >>>>      PR will
>>>>>>>>       >>>>      >>>>>>>  trigger rebuild.
>>>>>>>>       >>>>      >>>>>>>  > >> but
>>>>>>>>       >>>>      >>>>>>>  > >> > > that could add noises to the PR
>>>>>>>>       activities.
>>>>>>>>       >>>>      >>>>>>>  > >> > >
>>>>>>>>       >>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>>       >>>>      >>>>>>>  > >> > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>>>>>>>>       often failed
>>>>>>>>       >>>>      with
>>>>>>>>       >>>>      >>>>>>>  exceeding time
>>>>>>>>       >>>>      >>>>>>>  > limit
>>>>>>>>       >>>>      >>>>>>>  > >> > after
>>>>>>>>       >>>>      >>>>>>>  > >> > > 4+ hours.
>>>>>>>>       >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>>>>>>>>       limit for
>>>>>>>>       >>>>      jobs, and
>>>>>>>>       >>>>      >> has
>>>>>>>>       >>>>      >>>>>>>  been
>>>>>>>>       >>>>      >>>>>>>  > >> > terminated.
>>>>>>>>       >>>>      >>>>>>>  > >> > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>>>>>>>       Bowen Li
>>>>>>>>       >>>>      >>>>>>>  <bowenli86@gmail.com
>>>>>>>>       <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>>>>>>       <ma...@gmail.com>>
>>>>>>>>       >>>>      <mailto:bowenli86@gmail.com <mailto:
>> bowenli86@gmail.com
>>>>>>>>       <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>>>>>>       >>>>      >>>>>>>  > wrote:
>>>>>>>>       >>>>      >>>>>>>  > >> > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > >
>>>>>>>>       >>>> https://travis-ci.org/apache/flink/builds/549681530
>>>>>>>>       >>>>      >>>>>>>  This build
>>>>>>>>       >>>>      >>>>>>>  > >> > request
>>>>>>>>       >>>>      >>>>>>>  > >> > > > has
>>>>>>>>       >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>>>>>>>       queue**
>>>>>>>>       >>>>      since I first
>>>>>>>>       >>>>      >> saw
>>>>>>>>       >>>>      >>>>>>>  it at PST
>>>>>>>>       >>>>      >>>>>>>  > >> > 10:30am
>>>>>>>>       >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>>>>>>>       there before
>>>>>>>>       >>>>      10:30am).
>>>>>>>>       >>>>      >>>>>>>  It's PST
>>>>>>>>       >>>>      >>>>>>>  > 4:12pm
>>>>>>>>       >>>>      >>>>>>>  > >> now
>>>>>>>>       >>>>      >>>>>>>  > >> > > and
>>>>>>>>       >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>>>>>>>       >>>>      >>>>>>>  > >> > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>>>>>>>>       Bowen Li
>>>>>>>>       >>>>      >>>>>>>  <bowenli86@gmail.com
>>>>>>>>       <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>>>>>>       <ma...@gmail.com>>
>>>>>>>>       >>>>      <mailto:bowenli86@gmail.com <mailto:
>> bowenli86@gmail.com
>>>>>>>>       <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>>>>>>       >>>>      >>>>>>>  > >> wrote:
>>>>>>>>       >>>>      >>>>>>>  > >> > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>>>>>>>>       >>>>      resulting from lack
>>>>>>>>       >>>>      >>>>>>>  of stable
>>>>>>>>       >>>>      >>>>>>>  > >> build
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>>>>>>>>       PRs [1].
>>>>>>>>       >>>>      >> Specifically, I
>>>>>>>>       >>>>      >>>>>>>  noticed
>>>>>>>>       >>>>      >>>>>>>  > >> often
>>>>>>>>       >>>>      >>>>>>>  > >> > > that
>>>>>>>>       >>>>      >>>>>>>  > >> > > > no
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > build in the queue is making
>> any
>>>>>>>>       >>>>      progress for
>>>>>>>>       >>>>      >> hours,
>>>>>>>>       >>>>      >>>> and
>>>>>>>>       >>>>      >>>>>>>  > suddenly
>>>>>>>>       >>>>      >>>>>>>  > >> 5
>>>>>>>>       >>>>      >>>>>>>  > >> > or
>>>>>>>>       >>>>      >>>>>>>  > >> > > 6
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > builds kick off all together
>>>>>>>>       after the
>>>>>>>>       >>>>      long pause.
>>>>>>>>       >>>>      >>>>>>>  I'm at PST
>>>>>>>>       >>>>      >>>>>>>  > >> > (UTC-08)
>>>>>>>>       >>>>      >>>>>>>  > >> > > > time
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>>>>>>>>       be as
>>>>>>>>       >>>>      long as 6 hours
>>>>>>>>       >>>>      >>>>>>>  from PST 9am
>>>>>>>>       >>>>      >>>>>>>  > >> to
>>>>>>>>       >>>>      >>>>>>>  > >> > 3pm
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>>>>>>>>       drain the
>>>>>>>>       >>>>      queue
>>>>>>>>       >>>>      >>>>>>>  afterwards).
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > I think this has greatly
>>>>>>>>       impacted our
>>>>>>>>       >>>>      productivity.
>>>>>>>>       >>>>      >>>> I've
>>>>>>>>       >>>>      >>>>>>>  > >> experienced
>>>>>>>>       >>>>      >>>>>>>  > >> > > that
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>>>>>>>>       morning of
>>>>>>>>       >>>>      PST time zone
>>>>>>>>       >>>>      >>>>>>>  won't finish
>>>>>>>>       >>>>      >>>>>>>  > >> > their
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > build until late night of the
>>>>>>>>       same day.
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>>>>>>>>       the same
>>>>>>>>       >>>>      problem or
>>>>>>>>       >>>>      >>>>>>>  have similar
>>>>>>>>       >>>>      >>>>>>>  > >> > > > observation
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>>>>>>>>       has things
>>>>>>>>       >>>>      to do with
>>>>>>>>       >>>>      >> time
>>>>>>>>       >>>>      >>>>>>>  zone)
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>>>>>>>>       TravisCI is
>>>>>>>>       >>>>      Flink currently
>>>>>>>>       >>>>      >>>>>>>  using? Is it
>>>>>>>>       >>>>      >>>>>>>  > >> the
>>>>>>>>       >>>>      >>>>>>>  > >> > > free
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > plan for open source
>>>>>>>>       projects? What
>>>>>>>>       >>>> are the
>>>>>>>>       >>>>      >>>>>>>  guaranteed build
>>>>>>>>       >>>>      >>>>>>>  > >> capacity
>>>>>>>>       >>>>      >>>>>>>  > >> > > of
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > the current plan?
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>>>>>>>>       (either
>>>>>>>>       >>>>      free or paid)
>>>>>>>>       >>>>      >>>>>> can't
>>>>>>>>       >>>>      >>>>>>>  > provide
>>>>>>>>       >>>>      >>>>>>>  > >> > > stable
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > build capacity, can we
>>>>>>>>       upgrade to a
>>>>>>>>       >>>>      higher priced
>>>>>>>>       >>>>      >>>>>>>  plan with
>>>>>>>>       >>>>      >>>>>>>  > larger
>>>>>>>>       >>>>      >>>>>>>  > >> > and
>>>>>>>>       >>>>      >>>>>>>  > >> > > > more
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>>>>>>>>       contribute to
>>>>>>>>       >>>> the
>>>>>>>>       >>>>      >>>>>>>  productivity problem
>>>>>>>>       >>>>      >>>>>>>  > is
>>>>>>>>       >>>>      >>>>>>>  > >> > that
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>>>>>>>>       full build
>>>>>>>>       >>>>      for every PR
>>>>>>>>       >>>>      >>>> and a
>>>>>>>>       >>>>      >>>>>>>  > >> successful
>>>>>>>>       >>>>      >>>>>>>  > >> > > full
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>>>>>>>>       definitely have
>>>>>>>>       >>>>      more options to
>>>>>>>>       >>>>      >>>>>>>  solve it,
>>>>>>>>       >>>>      >>>>>>>  > for
>>>>>>>>       >>>>      >>>>>>>  > >> > > > instance,
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>>>>>>>>       and reuse
>>>>>>>>       >>>>      artifacts
>>>>>>>>       >>>>      >> from
>>>>>>>>       >>>>      >>>> the
>>>>>>>>       >>>>      >>>>>>>  > previous
>>>>>>>>       >>>>      >>>>>>>  > >> > > build.
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>>>>>>>>       effort
>>>>>>>>       >>>>      which is much
>>>>>>>>       >>>>      >>>>>>>  harder to
>>>>>>>>       >>>>      >>>>>>>  > >> > accomplish
>>>>>>>>       >>>>      >>>>>>>  > >> > > > in
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > a short period of time and
>>>>>>>>       may deserve
>>>>>>>>       >>>>      its own
>>>>>>>>       >>>>      >>>> separate
>>>>>>>>       >>>>      >>>>>>>  > >> discussion.
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > > [1]
>>>>>>>>       >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > > >
>>>>>>>>       >>>>      >>>>>>>  > >> > >
>>>>>>>>       >>>>      >>>>>>>  > >> >
>>>>>>>>       >>>>      >>>>>>>  > >>
>>>>>>>>       >>>>      >>>>>>>  > >
>>>>>>>>       >>>>      >>>>>>>  >
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>  --
>>>>>>>>       >>>>      >>>>>>>  Best Regards
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>>>>>>  Jeff Zhang
>>>>>>>>       >>>>      >>>>>>>
>>>>>>>>       >>>>      >>
>>>>>>>>       >>>>
>>>>>>>>       >>>
>>>>>>>>       >>
>>>>>>>>
>>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Jark Wu <im...@gmail.com>.
Hi Chesnay,

Can we grant Flink committers permissions on the flink-ci/flink repo?
Several times, when I pushed new commits, the old build jobs were still
pending and had not been canceled.
Until we fix that, we could manually cancel old jobs to save build
resources.
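
To illustrate which jobs would be safe to cancel by hand, here is a minimal
sketch. It assumes each build is described by a hypothetical record (build id,
PR number, trigger order, state); none of these names come from Travis or the
ci-bot, they are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    build_id: int       # hypothetical CI build identifier
    pr_number: int      # pull request the build belongs to
    trigger_order: int  # hypothetical counter, larger means triggered later
    state: str          # e.g. "pending", "running", "passed"


def superseded_builds(builds):
    """Return unfinished builds that have a newer build for the same PR."""
    # Find the most recently triggered build per pull request.
    latest = {}
    for build in builds:
        current = latest.get(build.pr_number)
        if current is None or build.trigger_order > current.trigger_order:
            latest[build.pr_number] = build
    # Anything unfinished that is not the latest build of its PR is stale.
    return [
        build
        for build in builds
        if build.state in ("pending", "running")
        and build.build_id != latest[build.pr_number].build_id
    ]


builds = [
    Build(1, 100, 1, "pending"),  # stale: PR 100 has a newer build
    Build(2, 100, 2, "pending"),  # latest build for PR 100
    Build(3, 101, 3, "running"),  # latest build for PR 101
]
stale_ids = [build.build_id for build in superseded_builds(builds)]
```

Only the stale entries would be candidates for manual cancellation; the latest
build of each PR is always left alone.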

Best,
Jark


On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler <ch...@apache.org> wrote:

> Your best bet would be to check the first commit in the PR and check the
> parent commit.
>
> To re-run things, you will have to rebase the PR on the latest master.
>
> On 10/07/2019 03:32, Kurt Young wrote:
> > Thanks for all your efforts Chesnay, it indeed improves our development
> > experience a lot. BTW, do you know how to find the master branch
> > information which the CI runs with?
> >
> > For example, like this one:
> > https://travis-ci.com/flink-ci/flink/jobs/214542568
> > It shows the commits as passing, rebased on master as of when the CI was
> > triggered. But the master commit the CI ran on may be the same as or
> > different from the current master. If it's the same, I can simply rely on
> > the passed result to push commits, but if it's not, I think I should find
> > another way to re-trigger tests based on the newest master.
> >
> > Do you know where I can get such information?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler <ch...@apache.org>
> wrote:
> >
> >> The kinks have been worked out; the bot is running again and pr builds
> >> are yet again no longer running on ASF resources.
> >>
> >> PRs are mirrored to: https://github.com/flink-ci/flink
> >> Bot source: https://github.com/flink-ci/ci-bot
> >>
> >> On 08/07/2019 17:14, Chesnay Schepler wrote:
> >>> I have temporarily re-enabled running PR builds on the ASF account;
> >>> migrating to the Travis subscription caused some issues in the bot
> >>> that I have to fix first.
> >>>
> >>> On 07/07/2019 23:01, Chesnay Schepler wrote:
> >>>> The vote has passed unanimously in favor of migrating to a separate
> >>>> Travis account.
> >>>>
> >>>> I will now set things up such that PullRequests are no longer run on
> >>>> the ASF servers.
> >>>> This is a major step in reducing our usage of ASF resources.
> >>>> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
> >>>> workers, which is the same as the ASF gives us). Over the course of the
> >>>> next week we'll set up the Ververica subscription to increase this
> >>>> limit.
> >>>>
> >>>> From now on, a bot will mirror all new and updated PullRequests to a
> >>>> mirror repository (https://github.com/flink-ci/flink-ci) and write an
> >>>> update into the PR once the build is complete.
> >>>> I have run the bot for the past 3 days in parallel to our existing
> >>>> Travis and it was working without major issues.
> >>>>
> >>>> The biggest change that contributors will see is that there's no
> >>>> longer an icon next to each commit. We may revisit this in the future.
> >>>>
> >>>> I'll set up a repo with the source of the bot later.
> >>>>
> >>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
> >>>>> I've raised a JIRA
> >>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> >>>>> inquire whether it would be possible to switch to a different Travis
> >>>>> account, and if so what steps would need to be taken.
> >>>>> We need a proper confirmation from INFRA since we are not in full
> >>>>> control of the flink repository (for example, we cannot access the
> >>>>> settings page).
> >>>>>
> >>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
> >>>>> account for the Flink project.
> >>>>> This would provide us with more than enough resources.
> >>>>>
> >>>>> Since this makes the project more reliant on resources provided by
> >>>>> external companies I would like to vote on this.
> >>>>>
> >>>>> Please vote on this proposal, as follows:
> >>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> >>>>> account, provided that INFRA approves
> >>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored
> >>>>> Travis account
> >>>>>
> >>>>> The vote will be open for at least 24h, and until we have
> >>>>> confirmation from INFRA. The voting period may be shorter than the
> >>>>> usual 3 days since our current setup is effectively not working.
> >>>>>
> >>>>> On 04/07/2019 06:51, Bowen Li wrote:
> >>>>>> Re: > Are they using their own Travis CI pool, or did they switch to
> >>>>>> an entirely different CI service?
> >>>>>>
> >>>>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >>>>>> currently moving away from ASF's Travis to their own in-house metal
> >>>>>> machines at [1] with custom CI application at [2]. They've seen
> >>>>>> significant improvement w.r.t both much higher performance and
> >>>>>> basically no resource waiting time, "night-and-day" difference
> >>>>>> quoting Wes.
> >>>>>>
> >>>>>> Re: > If we can just switch to our own Travis pool, just for our
> >>>>>> project, then this might be something we can do fairly quickly?
> >>>>>>
> >>>>>> I believe so, according to [3] and [4]
> >>>>>>
> >>>>>>
> >>>>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> >>>>>> [2] https://github.com/ursa-labs/ursabot
> >>>>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler
> >>>>>> <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>>>
> >>>>>>      Are they using their own Travis CI pool, or did they switch to
> >>>>>>      an entirely different CI service?
> >>>>>>
> >>>>>>      If we can just switch to our own Travis pool, just for our
> >>>>>>      project, then this might be something we can do fairly quickly?
> >>>>>>
> >>>>>>      On 03/07/2019 05:55, Bowen Li wrote:
> >>>>>>      > I responded in the INFRA ticket [1] that I believe they are
> >>>>>>      using a wrong
> >>>>>>      > metric against Flink and the total build time is a completely
> >>>>>>      different
> >>>>>>      > thing than guaranteed build capacity.
> >>>>>>      >
> >>>>>>      > My response:
> >>>>>>      >
> >>>>>>      > "As mentioned above, since I started to pay attention to
> >>>>>>      > Flink's build queue a few tens of days ago, I'm in Seattle and
> >>>>>>      > I saw no build was kicking off in PST daytime in weekdays for
> >>>>>>      > Flink. Our teammates in China and Europe have also reported
> >>>>>>      > similar observations. So we need to evaluate how the large
> >>>>>>      > total build time came from - if 1) your number and 2) our
> >>>>>>      > observations from three locations that cover pretty much a
> >>>>>>      > full day, are all true, I **guess** one reason can be that -
> >>>>>>      > highly likely the extra build time came from weekends when
> >>>>>>      > other Apache projects may be idle and Flink just drains hard
> >>>>>>      > its congested queue.
> >>>>>>      >
> >>>>>>      > Please be aware of that we're not complaining about the lack
> >>>>>>      > of resources in general, I'm complaining about the lack of
> >>>>>>      > **stable, dedicated** resources. An example for the latter one
> >>>>>>      > is, currently even if no build is in Flink's queue and I submit
> >>>>>>      > a request to be the queue head in PST morning, my build won't
> >>>>>>      > even start in 6-8+h. That is an absurd amount of waiting time.
> >>>>>>      >
> >>>>>>      > That's saying, if ASF INFRA decides to adopt a quota system and
> >>>>>>      > grants Flink five DEDICATED servers that runs all the time only
> >>>>>>      > for Flink, that'll be PERFECT and can totally solve our problem
> >>>>>>      > now.
> >>>>>>      >
> >>>>>>      > I feel what's missing in the ASF INFRA's Travis resource pool
> >>>>>>      > is some level of build capacity SLAs and certainty"
> >>>>>>      >
> >>>>>>      >
> >>>>>>      > Again, I believe these two problems differ in nature: long
> >>>>>>      > build time vs. lack of dedicated build resources. That is to
> >>>>>>      > say, shortening the build time may or may not relieve the
> >>>>>>      > situation. I'm slightly negative on disabling IT cases for PRs.
> >>>>>>      > The downside is that we are at risk of bugs in a PR that UTs
> >>>>>>      > don't catch, which may cost a lot more to fix later if they
> >>>>>>      > slow others down or even block them, but I am open to others'
> >>>>>>      > opinions on it.
> >>>>>>      >
> >>>>>>      > AFAICT from INFRA ticket [1], donating to ASF INFRA won't be a
> >>>>>>      > feasible way to solve our problem since INFRA's pool is fully
> >>>>>>      > shared and they have no control over, or finer insight into,
> >>>>>>      > resource allocation to a specific Apache project. As mentioned
> >>>>>>      > in [1], Apache Arrow is moving away from the ASF INFRA Travis
> >>>>>>      > pool (they are actually surprised Flink hasn't planned to do
> >>>>>>      > so). I know that Spark is on its own build infra. If we all
> >>>>>>      > agree on funding our own build infra, I'd be glad to help
> >>>>>>      > investigate potential options after releasing 1.9, since I'm
> >>>>>>      > super busy with 1.9 now.
> >>>>>>      >
> >>>>>>      > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>>>>      >
> >>>>>>      >
> >>>>>>      >
> >>>>>>      > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> >>>>>>      <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>>>      >
> >>>>>>      >> As a short-term stopgap, since we can assume this issue to
> >>>>>>      >> become much worse in the following days/weeks, we could
> >>>>>>      >> disable IT cases in PRs and only run them on master.
> >>>>>>      >>
> >>>>>>      >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>>>>>      >>> People really have to stop thinking that just because
> >>>>>>      >>> something works for us it is also a good solution.
> >>>>>>      >>> Also, please remember that our builds run for 2h from start
> >>>>>>      >>> to finish, and not the 14 _minutes_ it takes for zeppelin.
> >>>>>>      >>> We are dealing with an entirely different scale here, both in
> >>>>>>      >>> terms of build times and number of builds.
> >>>>>>      >>>
> >>>>>>      >>> In this very thread people have been complaining about long
> >>>>>>      >>> queue times for their builds. Surprise, other Apache projects
> >>>>>>      >>> have been suffering the very same thing due to us not
> >>>>>>      >>> controlling our build times. While switching services (be it
> >>>>>>      >>> Jenkins, CircleCI or whatever) will possibly work for us (and
> >>>>>>      >>> these options are actually attractive, like CircleCI's proper
> >>>>>>      >>> support for build artifacts), it will also result in us
> >>>>>>      >>> likely negatively affecting other projects in significant
> >>>>>>      >>> ways.
> >>>>>>      >>>
> >>>>>>      >>> Sure, the Jenkins setup has a good user experience for us, at
> >>>>>>      >>> the cost of blocking Jenkins workers for a _lot_ of time.
> >>>>>>      >>> Right now we have 25 PR's in our queue; that's possibly 50h
> >>>>>>      >>> we'd consume of Jenkins resources, and the European
> >>>>>>      >>> contributors haven't even really started yet.
> >>>>>>      >>>
> >>>>>>      >>> FYI, the latest INFRA response from INFRA-18533:
> >>>>>>      >>>
> >>>>>>      >>> "Our rough metrics shows that Flink used over 5800 hours of
> >>>>>>      >>> build time last month. That is equal to EIGHT servers running
> >>>>>>      >>> 24/7 for the ENTIRE MONTH. EIGHT. nonstop.
> >>>>>>      >>> When we discovered this last night, we discussed it some and
> >>>>>>      >>> are going to tune down Flink to allow only five executors
> >>>>>>      >>> maximum. We cannot allow Flink to consume so much of a
> >>>>>>      >>> Foundation shared resource."
> >>>>>>      >>>
> >>>>>>      >>> So yes, we either
> >>>>>>      >>> a) have to heavily reduce our CI usage or
> >>>>>>      >>> b) fund our own, either maintaining it ourselves or donating
> >>>>>>      >>> to Apache.
> >>>>>>      >>>
> >>>>>>      >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>>>>      >>>> By looking at the git history of the Jenkins script, its core
> >>>>>>      >>>> part was finished in March 2017 (with only two minor updates
> >>>>>>      >>>> in 2017/2018), so it's been running for over two years now
> >>>>>>      >>>> and it feels like the Zeppelin community has been quite happy
> >>>>>>      >>>> with it. @Jeff Zhang
> >>>>>>      >>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you share
> >>>>>>      >>>> your insights and user experience with the Jenkins+Travis
> >>>>>>      >>>> approach?
> >>>>>>      >>>>
> >>>>>>      >>>> Things like:
> >>>>>>      >>>>
> >>>>>>      >>>> - has the approach completely solved the resource capacity
> >>>>>>      >>>> problem for the Zeppelin community? is the Zeppelin community
> >>>>>>      >>>> happy with the result?
> >>>>>>      >>>> - is the whole configuration chain stable (e.g. uptime)
> >>>>>>      >>>> enough?
> >>>>>>      >>>> - how often do you need to maintain the Jenkins infra? how
> >>>>>>      >>>> many people are usually involved in maintenance and bug-fixes?
> >>>>>>      >>>>
> >>>>>>      >>>> The downside of this approach seems mostly to be on the
> >>>>>>      maintenance
> >>>>>>      >>>> to me - maintain the script and Jenkins infra.
> >>>>>>      >>>>
> >>>>>>      >>>> ** Having Our Own Travis-CI.com Account **
> >>>>>>      >>>>
> >>>>>>      >>>> Another alternative I've been thinking of is to have our own
> >>>>>>      >>>> travis-ci.com account with paid dedicated resources. Note
> >>>>>>      >>>> travis-ci.org is the free version and travis-ci.com is the
> >>>>>>      >>>> commercial version. We currently use a shared resource pool
> >>>>>>      >>>> managed by the ASF INFRA team on travis-ci.org, but we have
> >>>>>>      >>>> no control over it - we can't see how it's configured, how
> >>>>>>      >>>> many resources are available, how resources are allocated
> >>>>>>      >>>> among Apache projects, etc. The nice things about having an
> >>>>>>      >>>> account on travis-ci.com are:
> >>>>>>      >>>>
> >>>>>>      >>>> - relatively low cost with much better resource guarantee
> >>>>>>      >>>> than what we currently have [1]: $249/month with 5 dedicated
> >>>>>>      >>>> concurrency, $489/month with 10 concurrency
> >>>>>>      >>>> - low maintenance work compared to using Jenkins
> >>>>>>      >>>> - (potentially) no migration cost according to Travis's
> >>>>>>      >>>> doc [2] (pending verification)
> >>>>>>      >>>> - full control over the build capacity/configuration
> >>>>>>      >>>> compared to using ASF INFRA's pool
> >>>>>>      >>>>
> >>>>>>      >>>> I'd be surprised if we as such a vibrant community cannot
> >>>>>>      >>>> find and fund $249*12=$2988 a year in exchange for a much
> >>>>>>      >>>> better developer experience and much higher productivity.
> >>>>>>      >>>>
> >>>>>>      >>>> [1] https://travis-ci.com/plans
> >>>>>>      >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>>      >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> >>>>>>      <chesnay@apache.org <ma...@apache.org>
> >>>>>>      >>>> <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>>>> wrote:
> >>>>>>      >>>>
> >>>>>>      >>>>      So yes, the Jenkins job keeps pulling the state from
> >>>>>>      Travis until it
> >>>>>>      >>>>      finishes.
> >>>>>>      >>>>
> >>>>>>      >>>>      Not sure I'm comfortable with the idea of using Jenkins
> >>>>>>      >>>>      workers just to idle for several hours.
> >>>>>>      >>>>
> >>>>>>      >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>>>      >>>>      > Here's what the Zeppelin community did: we wrote a Python
> >>>>>>      >>>>      > script to check the build status of a pull request.
> >>>>>>      >>>>      > Here's the script:
> >>>>>>      >>>>      >
> >>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>>>      >>>>      >
> >>>>>>      >>>>      > And this is the script we used in the Jenkins build job.
> >>>>>>      >>>>      >
> >>>>>>      >>>>      > if [ -f "travis_check.py" ]; then
> >>>>>>      >>>>      >    git log -n 1
> >>>>>>      >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>>>>>      >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>      >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>      >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>>>>>      >>>>      >    #if [ -z $COMMIT ]; then
> >>>>>>      >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>      >>>>      >    #fi
> >>>>>>      >>>>      >
> >>>>>>      >>>>      >    # get commit hash from PR
> >>>>>>      >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>      >>>>      >    sleep 30 # wait a moment for travis to start the build
> >>>>>>      >>>>      >    RET_CODE=0
> >>>>>>      >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>>>      >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>>>>>      >>>>      >      RET_CODE=0
> >>>>>>      >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>>>      >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>>>      >>>>      >    fi
> >>>>>>      >>>>      >
> >>>>>>      >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
> >>>>>>      >>>>      >      set +x
> >>>>>>      >>>>      >      echo "-----------------------------------------------------"
> >>>>>>      >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >>>>>>      >>>>      >      echo "Please setup by switch on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>>>>>      >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>>>>>      >>>>      >      echo ""
> >>>>>>      >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
> >>>>>>      >>>>      >      echo "git commit --amend"
> >>>>>>      >>>>      >      echo "git push your-remote HEAD --force"
> >>>>>>      >>>>      >      echo ""
> >>>>>>      >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>>>>>      >>>>      >    fi
> >>>>>>      >>>>      >
> >>>>>>      >>>>      >    exit $RET_CODE
> >>>>>>      >>>>      > else
> >>>>>>      >>>>      >    set +x
> >>>>>>      >>>>      >    echo "travis_check.py does not exist"
> >>>>>>      >>>>      >    exit 1
> >>>>>>      >>>>      > fi
> >>>>>>      >>>>      >
> >>>>>>      >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>>>>>      >>>>      >
> >>>>>>      >>>>      >> Does this imply that a Jenkins job is active as long as
> >>>>>>      >>>>      >> the Travis build runs?
> >>>>>>      >>>>      >>
> >>>>>>      >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>>>      >>>>      >>> Hi,
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>> @Dawid, I think the "long test running" as I
> >>>>>>      mentioned in the
> >>>>>>      >>>>      first
> >>>>>>      >>>>      >> email,
> >>>>>>      >>>>      >>> also as you guys said, belongs to "a big effort
> >>>>>>      which is much
> >>>>>>      >>>>      harder to
> >>>>>>      >>>>      >>> accomplish in a short period of time and may
> deserve
> >>>>>>      its own
> >>>>>>      >>>>      separate
> >>>>>>      >>>>      >>> discussion". Thus I didn't include it in what we
> can
> >>>>>>      do in a
> >>>>>>      >>>>      foreseeable
> >>>>>>      >>>>      >>> short term.
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>> Besides, I don't think that's the ultimate reason for
> >>>>>>      >>>>      >>> the lack of build resources. Even if the build is
> >>>>>>      >>>>      >>> shortened to something like 2h, the problem I described,
> >>>>>>      >>>>      >>> where no build machine works for 6 or more hours during
> >>>>>>      >>>>      >>> PST daytime, will still happen, because no machine from
> >>>>>>      >>>>      >>> ASF INFRA's pool is allocated to Flink. As I have paid
> >>>>>>      >>>>      >>> close attention to the build queue over the past few
> >>>>>>      >>>>      >>> weekdays, it's a pretty clear pattern now.
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>> **The ultimate root cause** for that is: we don't have
> >>>>>>      >>>>      >>> any **dedicated** build resources that we can stably
> >>>>>>      >>>>      >>> rely on. I'm actually ok with waiting for a long time if
> >>>>>>      >>>>      >>> there are build requests running, since it means at
> >>>>>>      >>>>      >>> least we are making progress. But I'm not ok with having
> >>>>>>      >>>>      >>> no build resources. A better place I think we should aim
> >>>>>>      >>>>      >>> for in the short term is to always have at least a
> >>>>>>      >>>>      >>> central pool (can be 3 or 5) of machines dedicated to
> >>>>>>      >>>>      >>> building Flink at any time, or maybe use users'
> >>>>>>      >>>>      >>> resources.
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the
> >>>>>>      >>>>      >>> Zeppelin community is using a Jenkins job to
> >>>>>>      >>>>      >>> automatically build on users' Travis accounts and link
> >>>>>>      >>>>      >>> the result back to the GitHub PR. I guess the Jenkins
> >>>>>>      >>>>      >>> job would fetch the latest upstream master and build the
> >>>>>>      >>>>      >>> PR against it. Jeff has filed tickets to learn about and
> >>>>>>      >>>>      >>> get access to the Jenkins infra. It'd be better to fully
> >>>>>>      >>>>      >>> understand it first before judging this approach.
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>> I also heard good things about CircleCI, and ASF
> >>>>>>      INFRA seems
> >>>>>>      >>>>      to have a
> >>>>>>      >>>>      >> pool
> >>>>>>      >>>>      >>> of build capacity there too. Can be an
> alternative
> >>>>>>      to consider.
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> >>>>>>      >>>>      >>>
> >>>>>>      >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
> >>>>>>      >>>>      >>>> most important point from Chesnay's previous message in
> >>>>>>      >>>>      >>>> the summary. The ultimate reason for all the problems
> >>>>>>      >>>>      >>>> is that the tests take close to 2 hours to run already.
> >>>>>>      >>>>      >>>> I fully support this claim: "Unless people start caring
> >>>>>>      >>>>      >>>> about test times before adding them, this issue cannot
> >>>>>>      >>>>      >>>> be solved"
> >>>>>>      >>>>      >>>>
> >>>>>>      >>>>      >>>> This is also another reason why using user's Travis
> >>>>>>      >>>>      >>>> account won't help. Every few weeks we reach the user's
> >>>>>>      >>>>      >>>> time limit for a single profile. This makes the user's
> >>>>>>      >>>>      >>>> builds simply fail, until we either properly decrease
> >>>>>>      >>>>      >>>> the time the tests take (which I am not sure we ever
> >>>>>>      >>>>      >>>> did) or postpone the problem by splitting into more
> >>>>>>      >>>>      >>>> profiles. (Note that the ASF Travis account has higher
> >>>>>>      >>>>      >>>> time limits)
> >>>>>>      >>>>      >>>>
> >>>>>>      >>>>      >>>> Best,
> >>>>>>      >>>>      >>>>
> >>>>>>      >>>>      >>>> Dawid
> >>>>>>      >>>>      >>>>
> >>>>>>      >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>>>      >>>>      >>>>> Do we know if using "the best" available hardware
> >>>>>>      >>>>      >>>>> would improve the build times?
> >>>>>>      >>>>      >>>>> Imagine we would run the build on machines with plenty
> >>>>>>      >>>>      >>>>> of main memory to mount everything to ramdisk + the
> >>>>>>      >>>>      >>>>> latest CPU architecture?
> >>>>>>      >>>>      >>>>>
> >>>>>>      >>>>      >>>>> Throwing hardware at the problem could help reduce the
> >>>>>>      >>>>      >>>>> time of an individual build, and using our own
> >>>>>>      >>>>      >>>>> infrastructure would remove our dependency on Apache's
> >>>>>>      >>>>      >>>>> Travis account (with the obvious downside of having to
> >>>>>>      >>>>      >>>>> maintain the infrastructure)
> >>>>>>      >>>>      >>>>> We could use an open source travis alternative, to
> >>>>>>      >>>>      >>>>> have a similar experience and make the migration easy.
> >>>>>>      >>>>      >>>>>
> >>>>>>      >>>>      >>>>>
> >>>>>>      >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>>>>      >>>>      >>>>>> From what I gathered, there's no special sauce that
> >>>>>>      >>>>      >>>>>> the Zeppelin project uses which actually integrates
> >>>>>>      >>>>      >>>>>> a user's Travis account into the PR.
> >>>>>>      >>>>      >>>>>> They just disabled Travis for PRs. And that's kind
> >>>>>>      >>>>      >>>>>> of it.
> >>>>>>      >>>>      >>>>>>
> >>>>>>      >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
> >>>>>>      >>>>      >>>>>> fair amount of resources, but there are downsides:
> >>>>>>      >>>>      >>>>>>
> >>>>>>      >>>>      >>>>>> The discoverability of the Travis check takes a
> >>>>>>      >>>>      >>>>>> nose-dive. Either we require every contributor to
> >>>>>>      >>>>      >>>>>> always, on every commit, also post a Travis build,
> >>>>>>      >>>>      >>>>>> or we have the reviewer sift through the
> >>>>>>      >>>>      >>>>>> contributor's account to find it.
> >>>>>>      >>>>      >>>>>>
> >>>>>>      >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> >>>>>>      also not
> >>>>>>      >>>>      equivalent to
> >>>>>>      >>>>      >>>>>> having a PR build.
> >>>>>>      >>>>      >>>>>>
> >>>>>>      >>>>      >>>>>> A normal branch build takes a branch as is and tests
> >>>>>>      >>>>      >>>>>> it. A PR build merges the branch into master, and
> >>>>>>      >>>>      >>>>>> then runs it. (Fun fact: This is why a PR with merge
> >>>>>>      >>>>      >>>>>> conflicts is not being run on Travis.)
> >>>>>>      >>>>      >>>>>>
> >>>>>>      >>>>      >>>>>> And ultimately, everyone can already make use
> >>>>>> of this
> >>>>>>      >>>>      approach anyway.
> >>>>>>      >>>>      >>>>>>
> >>>>>>      >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>>>      >>>>      >>>>>>> Hi Jeff,
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think
> >>>>>>      >>>>      >>>>>>> it's a good idea to leverage user's travis account.
> >>>>>>      >>>>      >>>>>>> In this way, we can have almost unlimited concurrent
> >>>>>>      >>>>      >>>>>>> build jobs and developers can restart build by
> >>>>>>      >>>>      >>>>>>> themselves (currently only committers can restart
> >>>>>>      >>>>      >>>>>>> PR's build).
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>> But I'm still not very clear how to integrate user's
> >>>>>>      >>>>      >>>>>>> travis build into the Flink pull request's build
> >>>>>>      >>>>      >>>>>>> automatically. Can you explain more in detail?
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>> Another question: does travis only build branches
> >>>>>>      >>>>      >>>>>>> for user account?
> >>>>>>      >>>>      >>>>>>> My concern is that builds for PRs will rebase user's
> >>>>>>      >>>>      >>>>>>> commits against current master branch.
> >>>>>>      >>>>      >>>>>>> This will help us to find problems before merge.
> >>>>>>      >>>>      >>>>>>> Builds for branches will lose the impact of new
> >>>>>>      >>>>      >>>>>>> commits in master.
> >>>>>>      >>>>      >>>>>>> How does Zeppelin solve this problem?
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>> Thanks again for sharing the idea.
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>> Regards,
> >>>>>>      >>>>      >>>>>>> Jark
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>  Hi Folks,
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved
> >>>>>>      >>>>      >>>>>>>  it by delegating each contributor's PR build to
> >>>>>>      >>>>      >>>>>>>  their own Travis account (everyone can have 5 free
> >>>>>>      >>>>      >>>>>>>  slots for Travis builds).
> >>>>>>      >>>>      >>>>>>>  The Apache account's Travis build is only triggered
> >>>>>>      >>>>      >>>>>>>  when a PR is merged.
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>  > (Forgot to cc George)
> >>>>>>      >>>>      >>>>>>>  >
> >>>>>>      >>>>      >>>>>>>  > Best,
> >>>>>>      >>>>      >>>>>>>  > Kurt
> >>>>>>      >>>>      >>>>>>>  >
> >>>>>>      >>>>      >>>>>>>  >
> >>>>>>      >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>  >
> >>>>>>      >>>>      >>>>>>>  > > Hi Bowen,
> >>>>>>      >>>>      >>>>>>>  > >
> >>>>>>      >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
> >>>>>>      >>>>      >>>>>>>  > > discussed this, and I think Till and George
> >>>>>>      >>>>      >>>>>>>  > > have already spent some time investigating it.
> >>>>>>      >>>>      >>>>>>>  > > I have cc'ed both of them, and maybe they can
> >>>>>>      >>>>      >>>>>>>  > > share their findings.
> >>>>>>      >>>>      >>>>>>>  > >
> >>>>>>      >>>>      >>>>>>>  > > Best,
> >>>>>>      >>>>      >>>>>>>  > > Kurt
> >>>>>>      >>>>      >>>>>>>  > >
> >>>>>>      >>>>      >>>>>>>  > >
> >>>>>>      >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>  > >
> >>>>>>      >>>>      >>>>>>>  > >> Hi Bowen,
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> Thanks for bringing this up. We have also
> >>>>>>      >>>>      >>>>>>>  > >> suffered from the long build time.
> >>>>>>      >>>>      >>>>>>>  > >> I agree that we should focus on solving the
> >>>>>>      >>>>      >>>>>>>  > >> build capacity problem in this thread.
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> My observation is that only one build is
> >>>>>>      >>>>      >>>>>>>  > >> running while all the others (other PRs,
> >>>>>>      >>>>      >>>>>>>  > >> master) are pending.
> >>>>>>      >>>>      >>>>>>>  > >> The pricing plan [1] of Travis shows it can
> >>>>>>      >>>>      >>>>>>>  > >> support concurrent build jobs.
> >>>>>>      >>>>      >>>>>>>  > >> But I don't know which plan we are using; it
> >>>>>>      >>>>      >>>>>>>  > >> might be the free plan for open source.
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> >>>>>>      experience on
> >>>>>>      >>>>      Travis.
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> Regards,
> >>>>>>      >>>>      >>>>>>>  > >> Jark
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >> > Hi Steven,
> >>>>>>      >>>>      >>>>>>>  > >> >
> >>>>>>      >>>>      >>>>>>>  > >> > I think you may not have read what I wrote.
> >>>>>>      >>>>      >>>>>>>  > >> > The discussion is about "unstable build
> >>>>>>      >>>>      >>>>>>>  > >> > **capacity**", in other words "unstable /
> >>>>>>      >>>>      >>>>>>>  > >> > lack of build resources", not "unstable
> >>>>>>      >>>>      >>>>>>>  > >> > build".
> >>>>>>      >>>>      >>>>>>>  > >> >
> >>>>>>      >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>  > >> >
> >>>>>>      >>>>      >>>>>>>  > >> > > Long and sometimes unstable builds are
> >>>>>>      >>>>      >>>>>>>  > >> > > definitely a pain point.
> >>>>>>      >>>>      >>>>>>>  > >> > >
> >>>>>>      >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> >>>>>>      >>>>      >>>>>>>  > >> > > flink-connector-kafka is not related to my
> >>>>>>      >>>>      >>>>>>>  > >> > > change, but there is no easy way to re-run
> >>>>>>      >>>>      >>>>>>>  > >> > > the build in the Travis UI. A Google search
> >>>>>>      >>>>      >>>>>>>  > >> > > showed a trick: closing and reopening the
> >>>>>>      >>>>      >>>>>>>  > >> > > PR will trigger a rebuild, but that could
> >>>>>>      >>>>      >>>>>>>  > >> > > add noise to the PR activity.
> >>>>>>      >>>>      >>>>>>>  > >> > >
> >>>>>>      >>>>      >>>>>>>  > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>>      >>>>      >>>>>>>  > >> > >
> >>>>>>      >>>>      >>>>>>>  > >> > > Travis CI for my personal repo often failed
> >>>>>>      >>>>      >>>>>>>  > >> > > by exceeding the time limit after 4+ hours:
> >>>>>>      >>>>      >>>>>>>  > >> > > "The job exceeded the maximum time limit
> >>>>>>      >>>>      >>>>>>>  > >> > > for jobs, and has been terminated."
> >>>>>>      >>>>      >>>>>>>  > >> > >
> >>>>>>      >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>  > >> > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >>>>>>      >>>>      >>>>>>>  > >> > > > This build request has been sitting at
> >>>>>>      >>>>      >>>>>>>  > >> > > > **HEAD of the queue** since I first saw
> >>>>>>      >>>>      >>>>>>>  > >> > > > it at PST 10:30am (not sure how long it's
> >>>>>>      >>>>      >>>>>>>  > >> > > > been there before 10:30am). It's PST
> >>>>>>      >>>>      >>>>>>>  > >> > > > 4:12pm now and it hasn't started yet.
> >>>>>>      >>>>      >>>>>>>  > >> > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
> >>>>>>      >>>>      >>>>>>>  > >> > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > Hi devs,
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
> >>>>>>      >>>>      >>>>>>>  > >> > > > > resulting from lack of stable build
> >>>>>>      >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink PRs [1].
> >>>>>>      >>>>      >>>>>>>  > >> > > > > Specifically, I noticed often that no
> >>>>>>      >>>>      >>>>>>>  > >> > > > > build in the queue is making any
> >>>>>>      >>>>      >>>>>>>  > >> > > > > progress for hours, and suddenly 5 or 6
> >>>>>>      >>>>      >>>>>>>  > >> > > > > builds kick off all together after the
> >>>>>>      >>>>      >>>>>>>  > >> > > > > long pause. I'm at PST (UTC-08) time
> >>>>>>      >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can be as
> >>>>>>      >>>>      >>>>>>>  > >> > > > > long as 6 hours from PST 9am to 3pm
> >>>>>>      >>>>      >>>>>>>  > >> > > > > (let alone the time needed to drain the
> >>>>>>      >>>>      >>>>>>>  > >> > > > > queue afterwards).
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > I think this has greatly impacted our
> >>>>>>      >>>>      >>>>>>>  > >> > > > > productivity. I've experienced that PRs
> >>>>>>      >>>>      >>>>>>>  > >> > > > > submitted in the early morning of PST
> >>>>>>      >>>>      >>>>>>>  > >> > > > > time zone won't finish their build
> >>>>>>      >>>>      >>>>>>>  > >> > > > > until late night of the same day.
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > So my questions are:
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced the same
> >>>>>>      >>>>      >>>>>>>  > >> > > > > problem or have similar observation on
> >>>>>>      >>>>      >>>>>>>  > >> > > > > TravisCI? (I suspect it has things to
> >>>>>>      >>>>      >>>>>>>  > >> > > > > do with time zone)
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > - What pricing plan of TravisCI is
> >>>>>>      >>>>      >>>>>>>  > >> > > > > Flink currently using? Is it the free
> >>>>>>      >>>>      >>>>>>>  > >> > > > > plan for open source projects? What are
> >>>>>>      >>>>      >>>>>>>  > >> > > > > the guaranteed build capacity of the
> >>>>>>      >>>>      >>>>>>>  > >> > > > > current plan?
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > - If the current pricing plan (either
> >>>>>>      >>>>      >>>>>>>  > >> > > > > free or paid) can't provide stable
> >>>>>>      >>>>      >>>>>>>  > >> > > > > build capacity, can we upgrade to a
> >>>>>>      >>>>      >>>>>>>  > >> > > > > higher priced plan with larger and more
> >>>>>>      >>>>      >>>>>>>  > >> > > > > stable build capacity?
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > BTW, another factor that contribute to
> >>>>>>      >>>>      >>>>>>>  > >> > > > > the productivity problem is that our
> >>>>>>      >>>>      >>>>>>>  > >> > > > > build is slow - we run full build for
> >>>>>>      >>>>      >>>>>>>  > >> > > > > every PR and a successful full build
> >>>>>>      >>>>      >>>>>>>  > >> > > > > takes ~5h. We definitely have more
> >>>>>>      >>>>      >>>>>>>  > >> > > > > options to solve it, for instance,
> >>>>>>      >>>>      >>>>>>>  > >> > > > > modularize the build graphs and reuse
> >>>>>>      >>>>      >>>>>>>  > >> > > > > artifacts from the previous build.
> >>>>>>      >>>>      >>>>>>>  > >> > > > > But I think that can be a big effort
> >>>>>>      >>>>      >>>>>>>  > >> > > > > which is much harder to accomplish in a
> >>>>>>      >>>>      >>>>>>>  > >> > > > > short period of time and may deserve
> >>>>>>      >>>>      >>>>>>>  > >> > > > > its own separate discussion.
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > > >
> >>>>>>      >>>>      >>>>>>>  > >> > > >
> >>>>>>      >>>>      >>>>>>>  > >> > >
> >>>>>>      >>>>      >>>>>>>  > >> >
> >>>>>>      >>>>      >>>>>>>  > >>
> >>>>>>      >>>>      >>>>>>>  > >
> >>>>>>      >>>>      >>>>>>>  >
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>  --
> >>>>>>      >>>>      >>>>>>>  Best Regards
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>>>>>>  Jeff Zhang
> >>>>>>      >>>>      >>>>>>>
> >>>>>>      >>>>      >>
> >>>>>>      >>>>
> >>>>>>      >>>
> >>>>>>      >>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
Your best bet would be to check the first commit in the PR and check the 
parent commit.

To re-run things, you will have to rebase the PR on the latest master.
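
As a rough sketch of that check with plain git: the helper name
pr_base_commit and the ref names in the usage comments are assumptions made
for illustration, not part of the bot or of Flink's tooling, and the sketch
assumes the PR branch is a linear series of commits on top of master.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CI bot): print the master commit a PR
# branch was based on, i.e. the parent of the first (oldest) commit that is
# reachable from the PR branch but not from master.
pr_base_commit() {
  local master_ref="$1" pr_ref="$2"
  local first_commit
  # Oldest commit that exists only on the PR branch.
  first_commit=$(git rev-list --reverse "${master_ref}..${pr_ref}" | head -n 1)
  if [ -z "$first_commit" ]; then
    echo "no PR-only commits found between ${master_ref} and ${pr_ref}" >&2
    return 1
  fi
  # Its parent is the master commit the PR was built on.
  git rev-parse "${first_commit}^"
}

# Usage sketch (remote and branch names are assumptions):
#   git fetch upstream
#   pr_base_commit upstream/master my-pr-branch
#   git rev-parse upstream/master        # if this differs, the CI result is stale
#   git rebase upstream/master           # rebase, then force-push to re-run CI
#   git push --force-with-lease origin my-pr-branch
```

If the printed commit equals the current tip of master, the passed build
still reflects the latest master; otherwise a rebase and push is what
triggers a fresh run.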

On 10/07/2019 03:32, Kurt Young wrote:
> Thanks for all your efforts, Chesnay; it really improves our development
> experience a lot. BTW, do you know how to find out which master commit
> the CI ran against?
>
> For example, take this one:
> https://travis-ci.com/flink-ci/flink/jobs/214542568
> It shows a pass for the commits, which were rebased on master when the CI
> was triggered. But the master commit the CI ran on may or may not be the
> same as the current master. If it's the same, I can simply rely on the
> passed result to push the commits; if it's not, I think I should find
> another way to re-trigger the tests based on the newest master.
>
> Do you know where I can get this information?
>
> Best,
> Kurt
>
>
> On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler <ch...@apache.org> wrote:
>
>> The kinks have been worked out; the bot is running again and PR builds
>> are once again no longer running on ASF resources.
>>
>> PRs are mirrored to: https://github.com/flink-ci/flink
>> Bot source: https://github.com/flink-ci/ci-bot
>>
>> On 08/07/2019 17:14, Chesnay Schepler wrote:
>>> I have temporarily re-enabled running PR builds on the ASF account;
>>> migrating to the Travis subscription caused some issues in the bot
>>> that I have to fix first.
>>>
>>> On 07/07/2019 23:01, Chesnay Schepler wrote:
>>>> The vote has passed unanimously in favor of migrating to a separate
>>>> Travis account.
>>>>
>>>> I will now set things up such that pull requests are no longer run on
>>>> the ASF servers.
>>>> This is a major step in reducing our usage of ASF resources.
>>>> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
>>>> workers, which is the same as the ASF gives us). Over the course of the
>>>> next week we'll set up the Ververica subscription to increase this limit.
>>>>
>>>> From now on, a bot will mirror all new and updated pull requests to a
>>>> mirror repository (https://github.com/flink-ci/flink-ci) and write an
>>>> update into the PR once the build is complete.
>>>> I have run the bot for the past 3 days in parallel to our existing
>>>> Travis setup and it was working without major issues.
>>>>
>>>> The biggest change that contributors will see is that there's no
>>>> longer an icon next to each commit. We may revisit this in the future.
>>>>
>>>> I'll set up a repo with the source of the bot later.
>>>>
>>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
>>>>> I've raised a JIRA
>>>>> <https://issues.apache.org/jira/browse/INFRA-18703>with INFRA to
>>>>> inquire whether it would be possible to switch to a different Travis
>>>>> account, and if so what steps would need to be taken.
>>>>> We need a proper confirmation from INFRA since we are not in full
>>>>> control of the flink repository (for example, we cannot access the
>>>>> settings page).
>>>>>
>>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
>>>>> account for the Flink project.
>>>>> This would provide us with more than enough resources.
>>>>>
>>>>> Since this makes the project more reliant on resources provided by
>>>>> external companies I would like to vote on this.
>>>>>
>>>>> Please vote on this proposal, as follows:
>>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
>>>>> account, provided that INFRA approves
>>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored
>>>>> Travis account
>>>>>
>>>>> The vote will be open for at least 24h, and until we have
>>>>> confirmation from INFRA. The voting period may be shorter than the
>>>>> usual 3 days since our current setup is effectively not working.
>>>>>
>>>>> On 04/07/2019 06:51, Bowen Li wrote:
>>>>>> Re: > Are they using their own Travis CI pool, or did they switch to
>>>>>> an entirely different CI service?
>>>>>>
>>>>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>>>>>> currently moving away from ASF's Travis to their own in-house metal
>>>>>> machines at [1] with custom CI application at [2]. They've seen
>>>>>> significant improvement w.r.t both much higher performance and
>>>>>> basically no resource waiting time, "night-and-day" difference
>>>>>> quoting Wes.
>>>>>>
>>>>>> Re: > If we can just switch to our own Travis pool, just for our
>>>>>> project, then this might be something we can do fairly quickly?
>>>>>>
>>>>>> I believe so, according to [3] and [4]
>>>>>>
>>>>>>
>>>>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>>>>>> [2] https://github.com/ursa-labs/ursabot
>>>>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler
>>>>>> <chesnay@apache.org <ma...@apache.org>> wrote:
>>>>>>
>>>>>>      Are they using their own Travis CI pool, or did they switch to an
>>>>>>      entirely different CI service?
>>>>>>
>>>>>>      If we can just switch to our own Travis pool, just for our
>>>>>>      project, then
>>>>>>      this might be something we can do fairly quickly?
>>>>>>
>>>>>>      On 03/07/2019 05:55, Bowen Li wrote:
>>>>>>      > I responded in the INFRA ticket [1] that I believe they are
>>>>>>      using a wrong
>>>>>>      > metric against Flink and the total build time is a completely
>>>>>>      different
>>>>>>      > thing than guaranteed build capacity.
>>>>>>      >
>>>>>>      > My response:
>>>>>>      >
>>>>>>      > "As mentioned above, since I started paying attention to Flink's
>>>>>>      > build queue a few tens of days ago, I (in Seattle) have seen no
>>>>>>      > build kicking off for Flink during PST daytime on weekdays. Our
>>>>>>      > teammates in China and Europe have also reported similar
>>>>>>      > observations. So we need to evaluate where the large total build
>>>>>>      > time came from - if 1) your number and 2) our observations from
>>>>>>      > three locations that cover pretty much a full day are all true, I
>>>>>>      > **guess** one reason can be that the extra build time most likely
>>>>>>      > came from weekends, when other Apache projects may be idle and
>>>>>>      > Flink just drains its congested queue hard.
>>>>>>      >
>>>>>>      > Please be aware that we're not complaining about the lack of
>>>>>>      > resources in general; I'm complaining about the lack of **stable,
>>>>>>      > dedicated** resources. An example of the latter: currently, even if
>>>>>>      > no build is in Flink's queue and I submit a request that becomes
>>>>>>      > the queue head in the PST morning, my build won't even start for
>>>>>>      > 6-8+h. That is an absurd amount of waiting time.
>>>>>>      >
>>>>>>      > That is to say, if ASF INFRA decides to adopt a quota system and
>>>>>>      > grants Flink five DEDICATED servers that run all the time only for
>>>>>>      > Flink, that'll be PERFECT and can totally solve our problem now.
>>>>>>      >
>>>>>>      >
>>>>>>      > I feel what's missing in the ASF INFRA's Travis resource pool is
>>>>>>      some level
>>>>>>      > of build capacity SLAs and certainty"
>>>>>>      >
>>>>>>      >
>>>>>>      > Again, I believe these two problems differ in nature: long build
>>>>>>      > time vs. lack of dedicated build resources. That is to say,
>>>>>>      > shortening the build time may or may not relieve the situation. I'm
>>>>>>      > slightly negative on disabling IT cases for PRs, because the
>>>>>>      > downside is that we risk bugs in PRs that UTs don't catch, which
>>>>>>      > may cost a lot more to fix, especially if they slow down or even
>>>>>>      > block others, but I am open to others' opinions on it.
>>>>>>      >
>>>>>>      > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
>>>>>>      > solve our problem, since INFRA's pool is fully shared and they have
>>>>>>      > no control over, or finer insight into, resource allocation to a
>>>>>>      > specific Apache project. As mentioned in [1], Apache Arrow is
>>>>>>      > moving away from the ASF INFRA Travis pool (they are actually
>>>>>>      > surprised Flink hasn't planned to do so). I know that Spark is on
>>>>>>      > its own build infra. If we all agree on funding our own build
>>>>>>      > infra, I'd be glad to help investigate potential options after
>>>>>>      > releasing 1.9, since I'm super busy with 1.9 now.
>>>>>>      >
>>>>>>      > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>>>      >
>>>>>>      >
>>>>>>      >
>>>>>>      > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>>>>      <chesnay@apache.org <ma...@apache.org>> wrote:
>>>>>>      >
>>>>>>      >> As a short-term stopgap, since we can assume this issue to
>>>>>>      become much
>>>>>>      >> worse in the following days/weeks, we could disable IT cases in
>>>>>>      PRs and
>>>>>>      >> only run them on master.
>>>>>>      >>
>>>>>>      >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>>>      >>> People really have to stop thinking that just because
>>>>>>      something works
>>>>>>      >>> for us it is also a good solution.
>>>>>>      >>> Also, please remember that our builds run for 2h from start to
>>>>>>      finish,
>>>>>>      >>> and not the 14 _minutes_ it takes for zeppelin.
>>>>>>      >>> We are dealing with an entirely different scale here, both in
>>>>>>      terms of
>>>>>>      >>> build times and number of builds.
>>>>>>      >>>
>>>>>>      >>> In this very thread people have been complaining about long queue
>>>>>>      >>> times for their builds. Surprise, other Apache projects have been
>>>>>>      >>> suffering the very same thing due to us not controlling our build
>>>>>>      >>> times. While switching services (be it Jenkins, CircleCI or
>>>>>>      >>> whatever) will possibly work for us (and these options are actually
>>>>>>      >>> attractive, like CircleCI's proper support for build artifacts), it
>>>>>>      >>> will also likely result in us negatively affecting other projects
>>>>>>      >>> in significant ways.
>>>>>>      >>>
>>>>>>      >>> Sure, the Jenkins setup has a good user experience for us, at
>>>>>>      the cost
>>>>>>      >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>>>>>      have 25
>>>>>>      >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>>>>      >>> resources, and the European contributors haven't even really
>>>>>>      started yet.
>>>>>>      >>>
>>>>>>      >>> FYI, the latest INFRA response from INFRA-18533:
>>>>>>      >>>
>>>>>>      >>> "Our rough metrics shows that Flink used over 5800 hours of
>>>>>>      build time
>>>>>>      >>> last month. That is equal to EIGHT servers running 24/7 for
>>>>>>      the ENTIRE
>>>>>>      >>> MONTH. EIGHT. nonstop.
>>>>>>      >>> When we discovered this last night, we discussed it some and
>>>>>>      are going
>>>>>>      >>> to tune down Flink to allow only five executors maximum. We
>>>>>> cannot
>>>>>>      >>> allow Flink to consume so much of a Foundation shared
>>>>>> resource."
>>>>>>      >>>
>>>>>>      >>> So yes, we either
>>>>>>      >>> a) have to heavily reduce our CI usage or
>>>>>>      >>> b) fund our own, either maintaining it ourselves or donating
>>>>>>      to Apache.
>>>>>>      >>>
>>>>>>      >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>>>      >>>> Looking at the git history of the Jenkins script, its core part
>>>>>>      >>>> was finished in March 2017 (with only two minor updates in
>>>>>>      >>>> 2017/2018), so it's been running for over two years now, and it
>>>>>>      >>>> feels like the Zeppelin community has been quite happy with it.
>>>>>>      >>>> @Jeff Zhang, can you share your insights and user experience with
>>>>>>      >>>> the Jenkins+Travis approach?
>>>>>>      >>>>
>>>>>>      >>>> Things like:
>>>>>>      >>>>
>>>>>>      >>>> - has the approach completely solved the resource capacity problem
>>>>>>      >>>> for the Zeppelin community? is the Zeppelin community happy with
>>>>>>      >>>> the result?
>>>>>>      >>>> - is the whole configuration chain stable enough (e.g. uptime)?
>>>>>>      >>>> - how often do you need to maintain the Jenkins infra? how many
>>>>>>      >>>> people are usually involved in maintenance and bug-fixes?
>>>>>>      >>>>
>>>>>>      >>>> The downside of this approach seems to me to be mostly the
>>>>>>      >>>> maintenance - maintaining the script and the Jenkins infra.
>>>>>>      >>>>
>>>>>>      >>>> ** Having Our Own Travis-CI.com Account **
>>>>>>      >>>>
>>>>>>      >>>> Another alternative I've been thinking of is to have our own
>>>>>>      >>>> travis-ci.com account with paid dedicated resources. Note that
>>>>>>      >>>> travis-ci.org is the free version and travis-ci.com is the
>>>>>>      >>>> commercial version. We currently use a shared resource pool
>>>>>>      >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>>>>>      >>>> control over it - we can't see how it's configured, how many
>>>>>>      >>>> resources are available, how resources are allocated among Apache
>>>>>>      >>>> projects, etc. The nice things about having an account on
>>>>>>      >>>> travis-ci.com are:
>>>>>>      >>>>
>>>>>>      >>>> - relatively low cost with a much better resource guarantee than
>>>>>>      >>>> what we currently have [1]: $249/month with a dedicated
>>>>>>      >>>> concurrency of 5, $489/month with a concurrency of 10
>>>>>>      >>>> - low maintenance work compared to using Jenkins
>>>>>>      >>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>>>      >>>> (pending verification)
>>>>>>      >>>> - full control over the build capacity/configuration compared to
>>>>>>      >>>> using ASF INFRA's pool
>>>>>>      >>>>
>>>>>>      >>>> I'd be surprised if we, as such a vibrant community, cannot find
>>>>>>      >>>> and fund $249*12=$2988 a year in exchange for a much better
>>>>>>      >>>> developer experience and much higher productivity.
>>>>>>      >>>>
>>>>>>      >>>> [1] https://travis-ci.com/plans
>>>>>>      >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>      >>>>
>>>>>>      >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>>>      >>>>
>>>>>>      >>>>      So yes, the Jenkins job keeps pulling the state from Travis
>>>>>>      >>>>      until it finishes.
>>>>>>      >>>>
>>>>>>      >>>>      Not sure I'm comfortable with the idea of using Jenkins workers
>>>>>>      >>>>      just to idle for several hours.
>>>>>>      >>>>
>>>>>>      >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>>>      >>>>      > Here's what the Zeppelin community did: we wrote a Python
>>>>>>      >>>>      > script to check the build status of a pull request.
>>>>>>      >>>>      > Here's the script:
>>>>>>      >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>>>      >>>>      >
>>>>>>      >>>>      > And this is the script we use in the Jenkins build job.
>>>>>>      >>>>      >
>>>>>>      >>>>      > if [ -f "travis_check.py" ]; then
>>>>>>      >>>>      >    git log -n 1
>>>>>>      >>>>      >    STATUS=$(curl -s $BUILD_URL |
>>>>>>      >>>>      >      grep -e "GitHub pull request.*from.*" |
>>>>>>      >>>>      >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>>>      >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>      >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>      >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>>>      >>>>      >    #if [ -z $COMMIT ]; then
>>>>>>      >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>      >>>>      >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>      >>>>      >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>      >>>>      >    #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>      >>>>      >    #fi
>>>>>>      >>>>      >
>>>>>>      >>>>      >    # get commit hash from PR
>>>>>>      >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>      >>>>      >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>      >>>>      >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>      >>>>      >      grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>      >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>>>>>>      >>>>      >    RET_CODE=0
>>>>>>      >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>      >>>>      >    # try with repository name when travis-ci is not available in the account
>>>>>>      >>>>      >    if [ $RET_CODE -eq 2 ]; then
>>>>>>      >>>>      >      RET_CODE=0
>>>>>>      >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>      >>>>      >        grep '"full_name":' | grep -v "apache/zeppelin" |
>>>>>>      >>>>      >        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>>>      >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>      >>>>      >    fi
>>>>>>      >>>>      >
>>>>>>      >>>>      >    # fail with can't find build information in the travis
>>>>>>      >>>>      >    if [ $RET_CODE -eq 2 ]; then
>>>>>>      >>>>      >      set +x
>>>>>>      >>>>      >      echo "-----------------------------------------------------"
>>>>>>      >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>>>>>>      >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>>>      >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>>>      >>>>      >      echo ""
>>>>>>      >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
>>>>>>      >>>>      >      echo "git commit --amend"
>>>>>>      >>>>      >      echo "git push your-remote HEAD --force"
>>>>>>      >>>>      >      echo ""
>>>>>>      >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
>>>>>>      >>>>      >    fi
>>>>>>      >>>>      >
>>>>>>      >>>>      >    exit $RET_CODE
>>>>>>      >>>>      > else
>>>>>>      >>>>      >    set +x
>>>>>>      >>>>      >    echo "travis_check.py does not exists"
>>>>>>      >>>>      >    exit 1
>>>>>>      >>>>      > fi
>>>>>>      >>>>      >
>>>>>>      >>>>      > Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>>>>      >>>>      >
>>>>>>      >>>>      >> Does this imply that a Jenkins job is active as long
>>>>>>      as the
>>>>>>      >>>>      Travis build
>>>>>>      >>>>      >> runs?
>>>>>>      >>>>      >>
>>>>>>      >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>>>>      >>>>      >>> Hi,
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>> @Dawid, I think the "long-running tests" issue that I mentioned
>>>>>>      >>>>      >>> in the first email, and that you guys also raised, belongs to "a
>>>>>>      >>>>      >>> big effort which is much harder to accomplish in a short period
>>>>>>      >>>>      >>> of time and may deserve its own separate discussion". Thus I
>>>>>>      >>>>      >>> didn't include it in what we can do in the foreseeable short
>>>>>>      >>>>      >>> term.
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>> Besides, I don't think that's the ultimate reason for the lack of
>>>>>>      >>>>      >>> build resources. Even if the build is shortened to something like
>>>>>>      >>>>      >>> 2h, the problem I described, where no build machine works for 6
>>>>>>      >>>>      >>> or more hours in PST daytime, will still happen, because no
>>>>>>      >>>>      >>> machine from ASF INFRA's pool is allocated to Flink. As I have
>>>>>>      >>>>      >>> paid close attention to the build queue over the past few
>>>>>>      >>>>      >>> weekdays, it's a pretty clear pattern now.
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>> **The ultimate root cause** for that is - we don't have any
>>>>>>      >>>>      >>> **dedicated** build resources that we can stably rely on. I'm
>>>>>>      >>>>      >>> actually ok with waiting for a long time if there are build
>>>>>>      >>>>      >>> requests running; it means at least we are making progress. But
>>>>>>      >>>>      >>> I'm not ok with having no build resources. A better place I think
>>>>>>      >>>>      >>> we should aim at in the short term is to always have at least a
>>>>>>      >>>>      >>> central pool (can be 3 or 5) of machines dedicated to building
>>>>>>      >>>>      >>> Flink at any time, or maybe to use users' resources.
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
>>>>>>      >>>>      >>> community is using a Jenkins job to automatically build on users'
>>>>>>      >>>>      >>> Travis accounts and link the result back to the GitHub PR. I
>>>>>>      >>>>      >>> guess the Jenkins job would fetch the latest upstream master and
>>>>>>      >>>>      >>> build the PR against it. Jeff has filed tickets to learn about
>>>>>>      >>>>      >>> and get access to the Jenkins infra. It'll be better to fully
>>>>>>      >>>>      >>> understand it first before judging this approach.
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>> I also heard good things about CircleCI, and ASF
>>>>>>      INFRA seems
>>>>>>      >>>>      to have a
>>>>>>      >>>>      >> pool
>>>>>>      >>>>      >>> of build capacity there too. Can be an alternative
>>>>>>      to consider.
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>>>>>      >>>>      >>>
>>>>>>      >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>>>>>      most
>>>>>>      >>>>      important point
>>>>>>      >>>>      >>>> from Chesnay's previous message in the summary. The
>>>>>>      ultimate
>>>>>>      >>>>      reason for
>>>>>>      >>>>      >>>> all the problems is that the tests take close to 2
>>>>>>      hours to
>>>>>>      >>>>      run already.
>>>>>>      >>>>      >>>> I fully support this claim: "Unless people start
>>>>>>      caring about
>>>>>>      >>>>      test times
>>>>>>      >>>>      >>>> before adding them, this issue cannot be solved"
>>>>>>      >>>>      >>>>
>>>>>>      >>>>      >>>> This is also another reason why using user's Travis
>>>>>>      account
>>>>>>      >>>>      won't help.
>>>>>>      >>>>      >>>> Every few weeks we reach the user's time limit for
>>>>>>      a single
>>>>>>      >>>>      profile.
>>>>>>      >>>>      >>>> This makes the user's builds simply fail, until we
>>>>>>      either
>>>>>>      >>>>      properly
>>>>>>      >>>>      >>>> decrease the time the tests take (which I am not
>>>>>>      sure we ever
>>>>>>      >>>>      did) or
>>>>>>      >>>>      >>>> postpone the problem by splitting into more
>>>>>>      profiles. (Note
>>>>>>      >>>>      that the ASF
>>>>>>      >>>>      >>>> Travis account has higher time limits)
>>>>>>      >>>>      >>>>
>>>>>>      >>>>      >>>> Best,
>>>>>>      >>>>      >>>>
>>>>>>      >>>>      >>>> Dawid
>>>>>>      >>>>      >>>>
>>>>>>      >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>>>      >>>>      >>>>> Do we know if using "the best" available hardware
>>>>>>      would
>>>>>>      >>>>      improve the
>>>>>>      >>>>      >> build
>>>>>>      >>>>      >>>>> times?
>>>>>>      >>>>      >>>>> Imagine we would run the build on machines with
>>>>>>      plenty of
>>>>>>      >>>>      main memory
>>>>>>      >>>>      >> to
>>>>>>      >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>>>>>      architecture?
>>>>>>      >>>>      >>>>>
>>>>>>      >>>>      >>>>> Throwing hardware at the problem could help reduce
>>>>>>      the time
>>>>>>      >>>>      of an
>>>>>>      >>>>      >>>>> individual build, and using our own infrastructure
>>>>>>      would
>>>>>>      >>>>      remove our
>>>>>>      >>>>      >>>>> dependency on Apache's Travis account (with the
>>>>>>      obvious
>>>>>>      >>>>      downside of
>>>>>>      >>>>      >>>> having
>>>>>>      >>>>      >>>>> to maintain the infrastructure)
>>>>>>      >>>>      >>>>> We could use an open source travis alternative, to
>>>>>>      have a
>>>>>>      >>>>      similar
>>>>>>      >>>>      >>>>> experience and make the migration easy.
>>>>>>      >>>>      >>>>>
>>>>>>      >>>>      >>>>>
>>>>>>      >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>>>      >>>>      >>>>>> >From what I gathered, there's no special
>>>>>>      sauce that the
>>>>>>      >>>>      Zeppelin
>>>>>>      >>>>      >>>>>> project uses which actually integrates a users
>>>>>> Travis
>>>>>>      >>>>      account into the
>>>>>>      >>>>      >>>> PR.
>>>>>>      >>>>      >>>>>> They just disabled Travis for PRs. And that's
>>>>>>      kind of it.
>>>>>>      >>>>      >>>>>>
>>>>>>      >>>>      >>>>>> Naturally we can do this (duh) and safe the ASF a
>>>>>>      fair
>>>>>>      >>>>      amount of
>>>>>>      >>>>      >>>>>> resources, but there are downsides:
>>>>>>      >>>>      >>>>>>
>>>>>>      >>>>      >>>>>> The discoverability of the Travis check takes a
>>>>>>      nose-dive.
>>>>>>      >>>>      Either we
>>>>>>      >>>>      >>>>>> require every contributor to always, an every
>>>>>>      commit, also
>>>>>>      >>>>      post a
>>>>>>      >>>>      >> Travis
>>>>>>      >>>>      >>>>>> build, or we have the reviewer sift through the
>>>>>>      >>>>      contributors account
>>>>>>      >>>>      >> to
>>>>>>      >>>>      >>>>>> find it.
>>>>>>      >>>>      >>>>>>
>>>>>>      >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>>>>>      also not
>>>>>>      >>>>      equivalent to
>>>>>>      >>>>      >>>>>> having a PR build.
>>>>>>      >>>>      >>>>>>
>>>>>>      >>>>      >>>>>> A normal branch build takes a branch as is and
>>>>>>      tests it. A
>>>>>>      >>>>      PR build
>>>>>>      >>>>      >>>>>> merges the branch into master, and then runs it.
>>>>>>      (Fun fact:
>>>>>>      >>>>      This is
>>>>>>      >>>>      >> why
>>>>>>      >>>>      >>>>>> a PR with merge conflicts is not being run on
>>>>>>      Travis.)
>>>>>>      >>>>      >>>>>>
>>>>>>      >>>>      >>>>>> And ultimately, everyone can already make use
>>>>>> of this
>>>>>>      >>>>      approach anyway.
>>>>>>      >>>>      >>>>>>
>>>>>>      >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>      >>>>      >>>>>>> Hi Jeff,
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>>>>>>      think it's a
>>>>>>      >>>>      good idea to
>>>>>>      >>>>      >>>>>>> leverage user's travis account.
>>>>>>      >>>>      >>>>>>> In this way, we can have almost unlimited
>>>>>>      concurrent build
>>>>>>      >>>>      jobs and
>>>>>>      >>>>      >>>>>>> developers can restart build by themselves
>>>>>>      (currently only
>>>>>>      >>>>      committers
>>>>>>      >>>>      >>>>>>> can restart PR's build).
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>> But I'm still not very clear how to integrate
>>>>>> user's
>>>>>>      >>>>      travis build
>>>>>>      >>>>      >> into
>>>>>>      >>>>      >>>>>>> the Flink pull request's build automatically.
>>>>>>      Can you
>>>>>>      >>>>      explain more in
>>>>>>      >>>>      >>>>>>> detail?
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>> Another question: does travis only build
>>>>>>      branches for user
>>>>>>      >>>>      account?
>>>>>>      >>>>      >>>>>>> My concern is that builds for PRs will rebase
>>>>>> user's
>>>>>>      >>>>      commits against
>>>>>>      >>>>      >>>>>>> current master branch.
>>>>>>      >>>>      >>>>>>> This will help us to find problems before
>>>>>>      merge.  Builds
>>>>>>      >>>>      for branches
>>>>>>      >>>>      >>>>>>> will lose the impact of new commits in master.
>>>>>>      >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>> Thanks again for sharing the idea.
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>> Regards,
>>>>>>      >>>>      >>>>>>> Jark
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>>>>>      <zjffdu@gmail.com <ma...@gmail.com>
>>>>>>      >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>>>>>>      >>>>      >>>>>>> <mailto:zjffdu@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:zjffdu@gmail.com
>>>>>>      <ma...@gmail.com>>>> wrote:
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>  Hi Folks,
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>  Zeppelin met this kind of issue before; we
>>>>>>      >>>>      >>>>>>>  solved it by delegating each contributor's PR
>>>>>>      >>>>      >>>>>>>  build to their own Travis account (everyone can
>>>>>>      >>>>      >>>>>>>  have 5 free slots for Travis builds).
>>>>>>      >>>>      >>>>>>>  The Apache account's Travis build is only
>>>>>>      >>>>      >>>>>>>  triggered when a PR is merged.
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
>>>>>>      <ma...@gmail.com>
>>>>>>      >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>
>>>>>>      >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>>>>>>      >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>  > (Forgot to cc George)
>>>>>>      >>>>      >>>>>>>  >
>>>>>>      >>>>      >>>>>>>  > Best,
>>>>>>      >>>>      >>>>>>>  > Kurt
>>>>>>      >>>>      >>>>>>>  >
>>>>>>      >>>>      >>>>>>>  >
>>>>>>      >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>>>>>      >>>>      <ykt836@gmail.com <ma...@gmail.com>
>>>>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>>>>      >>>>      >>>>>>> <mailto:ykt836@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:ykt836@gmail.com
>>>>>>      <ma...@gmail.com>>>>
>>>>>>      >>>>      wrote:
>>>>>>      >>>>      >>>>>>>  >
>>>>>>      >>>>      >>>>>>>  > > Hi Bowen,
>>>>>>      >>>>      >>>>>>>  > >
>>>>>>      >>>>      >>>>>>>  > > Thanks for bringing this up. We
>>>>>>      actually have
>>>>>>      >>>>      discussed
>>>>>>      >>>>      >> about
>>>>>>      >>>>      >>>>>>>  this, and I
>>>>>>      >>>>      >>>>>>>  > > think Till and George have
>>>>>>      >>>>      >>>>>>>  > > already spent some time investigating
>>>>>>      it. I have
>>>>>>      >>>>      cced both of
>>>>>>      >>>>      >>>>>>>  them, and
>>>>>>      >>>>      >>>>>>>  > > maybe they can share
>>>>>>      >>>>      >>>>>>>  > > their findings.
>>>>>>      >>>>      >>>>>>>  > >
>>>>>>      >>>>      >>>>>>>  > > Best,
>>>>>>      >>>>      >>>>>>>  > > Kurt
>>>>>>      >>>>      >>>>>>>  > >
>>>>>>      >>>>      >>>>>>>  > >
>>>>>>      >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>>>>>      >>>>      <imjark@gmail.com <ma...@gmail.com>
>>>>>>      <mailto:imjark@gmail.com <ma...@gmail.com>>
>>>>>>      >>>>      >>>>>>> <mailto:imjark@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:imjark@gmail.com
>>>>>>      <ma...@gmail.com>>>>
>>>>>>      >>>>      wrote:
>>>>>>      >>>>      >>>>>>>  > >
>>>>>>      >>>>      >>>>>>>  > >> Hi Bowen,
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> Thanks for bringing this. We also
>>>>>>      suffered from
>>>>>>      >>>>      the long
>>>>>>      >>>>      >>>>>>>  build time.
>>>>>>      >>>>      >>>>>>>  > >> I agree that we should focus on
>>>>>>      solving build
>>>>>>      >>>>      capacity
>>>>>>      >>>>      >>>>>>>  problem in the
>>>>>>      >>>>      >>>>>>>  > >> thread.
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> My observation is that only one
>>>>>>      build is
>>>>>>      >>>>      running, all
>>>>>>      >>>>      >> the
>>>>>>      >>>>      >>>>>>>  others
>>>>>>      >>>>      >>>>>>>  > >> (other
>>>>>>      >>>>      >>>>>>>  > >> PRs, master) are pending.
>>>>>>      >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>>>>>>      it can
>>>>>>      >>>> support
>>>>>>      >>>>      >> concurrent
>>>>>>      >>>>      >>>>>>>  build
>>>>>>      >>>>      >>>>>>>  > jobs.
>>>>>>      >>>>      >>>>>>>  > >> But I don't know which plan we are
>>>>>>      using, might
>>>>>>      >>>>      be the free
>>>>>>      >>>>      >>>>>>>  plan for
>>>>>>      >>>>      >>>>>>>  > open
>>>>>>      >>>>      >>>>>>>  > >> source.
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>>>>>      experience on
>>>>>>      >>>>      Travis.
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> Regards,
>>>>>>      >>>>      >>>>>>>  > >> Jark
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>>>>>      >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>>>>>>      >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>>>>>      <ma...@gmail.com>
>>>>>>      >>>>      <mailto:bowenli86@gmail.com
>>>>>>      <ma...@gmail.com>>>> wrote:
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >> > Hi Steven,
>>>>>>      >>>>      >>>>>>>  > >> >
>>>>>>      >>>>      >>>>>>>  > >> > I think you may not have read what I
>>>>>>      wrote. The
>>>>>>      >>>>      discussion is
>>>>>>      >>>>      >>>> about
>>>>>>      >>>>      >>>>>>>  > "unstable
>>>>>>      >>>>      >>>>>>>  > >> > build **capacity**", in other words
>>>>>>      >>>>      "unstable / lack of
>>>>>>      >>>>      >>>> build
>>>>>>      >>>>      >>>>>>>  > >> resources",
>>>>>>      >>>>      >>>>>>>  > >> > not "unstable build".
>>>>>>      >>>>      >>>>>>>  > >> >
>>>>>>      >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
>>>>>>      Steven Wu
>>>>>>      >>>>      >>>>>>>  <stevenz3wu@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>>>>      <ma...@gmail.com>>
>>>>>>      >>>>      <mailto:stevenz3wu@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>>>>      <ma...@gmail.com>>>>
>>>>>>      >>>>      >>>>>>>  > wrote:
>>>>>>      >>>>      >>>>>>>  > >> >
>>>>>>      >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
>>>>>>      >>>>      definitely a pain
>>>>>>      >>>>      >>>>>> point.
>>>>>>      >>>>      >>>>>>>  > >> > >
>>>>>>      >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>>>>>      >>>>      >> flink-connector-kafka
>>>>>>      >>>>      >>>>>>>  is not
>>>>>>      >>>>      >>>>>>>  > >> related
>>>>>>      >>>>      >>>>>>>  > >> > to
>>>>>>      >>>>      >>>>>>>  > >> > > my change, but there is no easy way to
>>>>>>      re-run the
>>>>>>      >>>>      build on
>>>>>>      >>>>      >>>>>>>  travis UI.
>>>>>>      >>>>      >>>>>>>  > Google
>>>>>>      >>>>      >>>>>>>  > >> > > search showed a trick of
>>>>>>      close-and-open the
>>>>>>      >>>>      PR will
>>>>>>      >>>>      >>>>>>>  trigger rebuild.
>>>>>>      >>>>      >>>>>>>  > >> but
>>>>>>      >>>>      >>>>>>>  > >> > > that could add noises to the PR
>>>>>>      activities.
>>>>>>      >>>>      >>>>>>>  > >> > >
>>>>>>      >>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>      >>>>      >>>>>>>  > >> > >
>>>>>>      >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>>>>>>      often failed
>>>>>>      >>>>      with
>>>>>>      >>>>      >>>>>>>  exceeding time
>>>>>>      >>>>      >>>>>>>  > limit
>>>>>>      >>>>      >>>>>>>  > >> > after
>>>>>>      >>>>      >>>>>>>  > >> > > 4+ hours.
>>>>>>      >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>>>>>>      limit for
>>>>>>      >>>>      jobs, and
>>>>>>      >>>>      >> has
>>>>>>      >>>>      >>>>>>>  been
>>>>>>      >>>>      >>>>>>>  > >> > terminated.
>>>>>>      >>>>      >>>>>>>  > >> > >
>>>>>>      >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>>>>>      Bowen Li
>>>>>>      >>>>      >>>>>>>  <bowenli86@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>>>>      <ma...@gmail.com>>
>>>>>>      >>>>      <mailto:bowenli86@gmail.com <mailto:bowenli86@gmail.com
>>>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>>>>      >>>>      >>>>>>>  > wrote:
>>>>>>      >>>>      >>>>>>>  > >> > >
>>>>>>      >>>>      >>>>>>>  > >> > > >
>>>>>>      >>>> https://travis-ci.org/apache/flink/builds/549681530
>>>>>>      >>>>      >>>>>>>  This build
>>>>>>      >>>>      >>>>>>>  > >> > request
>>>>>>      >>>>      >>>>>>>  > >> > > > has
>>>>>>      >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>>>>>      queue**
>>>>>>      >>>>      since I first
>>>>>>      >>>>      >> saw
>>>>>>      >>>>      >>>>>>>  it at PST
>>>>>>      >>>>      >>>>>>>  > >> > 10:30am
>>>>>>      >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>>>>>      there before
>>>>>>      >>>>      10:30am).
>>>>>>      >>>>      >>>>>>>  It's PST
>>>>>>      >>>>      >>>>>>>  > 4:12pm
>>>>>>      >>>>      >>>>>>>  > >> now
>>>>>>      >>>>      >>>>>>>  > >> > > and
>>>>>>      >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>>>>>      >>>>      >>>>>>>  > >> > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>>>>>>      Bowen Li
>>>>>>      >>>>      >>>>>>>  <bowenli86@gmail.com
>>>>>>      <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>>>>      <ma...@gmail.com>>
>>>>>>      >>>>      <mailto:bowenli86@gmail.com <mailto:bowenli86@gmail.com
>>>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>>>>      >>>>      >>>>>>>  > >> wrote:
>>>>>>      >>>>      >>>>>>>  > >> > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>>>>>>      >>>>      resulting from lack
>>>>>>      >>>>      >>>>>>>  of stable
>>>>>>      >>>>      >>>>>>>  > >> build
>>>>>>      >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>>>>>>      PRs [1].
>>>>>>      >>>>      >> Specifically, I
>>>>>>      >>>>      >>>>>>>  noticed
>>>>>>      >>>>      >>>>>>>  > >> often
>>>>>>      >>>>      >>>>>>>  > >> > > that
>>>>>>      >>>>      >>>>>>>  > >> > > > no
>>>>>>      >>>>      >>>>>>>  > >> > > > > build in the queue is making any
>>>>>>      >>>>      progress for
>>>>>>      >>>>      >> hours,
>>>>>>      >>>>      >>>> and
>>>>>>      >>>>      >>>>>>>  > suddenly
>>>>>>      >>>>      >>>>>>>  > >> 5
>>>>>>      >>>>      >>>>>>>  > >> > or
>>>>>>      >>>>      >>>>>>>  > >> > > 6
>>>>>>      >>>>      >>>>>>>  > >> > > > > builds kick off all together
>>>>>>      after the
>>>>>>      >>>>      long pause.
>>>>>>      >>>>      >>>>>>>  I'm at PST
>>>>>>      >>>>      >>>>>>>  > >> > (UTC-08)
>>>>>>      >>>>      >>>>>>>  > >> > > > time
>>>>>>      >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>>>>>>      be as
>>>>>>      >>>>      long as 6 hours
>>>>>>      >>>>      >>>>>>>  from PST 9am
>>>>>>      >>>>      >>>>>>>  > >> to
>>>>>>      >>>>      >>>>>>>  > >> > 3pm
>>>>>>      >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>>>>>>      drain the
>>>>>>      >>>>      queue
>>>>>>      >>>>      >>>>>>>  afterwards).
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > I think this has greatly
>>>>>>      impacted our
>>>>>>      >>>>      productivity.
>>>>>>      >>>>      >>>> I've
>>>>>>      >>>>      >>>>>>>  > >> experienced
>>>>>>      >>>>      >>>>>>>  > >> > > that
>>>>>>      >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>>>>>>      morning of
>>>>>>      >>>>      PST time zone
>>>>>>      >>>>      >>>>>>>  won't finish
>>>>>>      >>>>      >>>>>>>  > >> > their
>>>>>>      >>>>      >>>>>>>  > >> > > > > build until late night of the
>>>>>>      same day.
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>>>>>>      the same
>>>>>>      >>>>      problem or
>>>>>>      >>>>      >>>>>>>  have similar
>>>>>>      >>>>      >>>>>>>  > >> > > > observation
>>>>>>      >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>>>>>>      has things
>>>>>>      >>>>      to do with
>>>>>>      >>>>      >> time
>>>>>>      >>>>      >>>>>>>  zone)
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>>>>>>      TravisCI is
>>>>>>      >>>>      Flink currently
>>>>>>      >>>>      >>>>>>>  using? Is it
>>>>>>      >>>>      >>>>>>>  > >> the
>>>>>>      >>>>      >>>>>>>  > >> > > free
>>>>>>      >>>>      >>>>>>>  > >> > > > > plan for open source
>>>>>>      projects? What
>>>>>>      >>>> are the
>>>>>>      >>>>      >>>>>>>  guaranteed build
>>>>>>      >>>>      >>>>>>>  > >> capacity
>>>>>>      >>>>      >>>>>>>  > >> > > of
>>>>>>      >>>>      >>>>>>>  > >> > > > > the current plan?
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>>>>>>      (either
>>>>>>      >>>>      free or paid)
>>>>>>      >>>>      >>>>>> can't
>>>>>>      >>>>      >>>>>>>  > provide
>>>>>>      >>>>      >>>>>>>  > >> > > stable
>>>>>>      >>>>      >>>>>>>  > >> > > > > build capacity, can we
>>>>>>      upgrade to a
>>>>>>      >>>>      higher priced
>>>>>>      >>>>      >>>>>>>  plan with
>>>>>>      >>>>      >>>>>>>  > larger
>>>>>>      >>>>      >>>>>>>  > >> > and
>>>>>>      >>>>      >>>>>>>  > >> > > > more
>>>>>>      >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>>>>>>      contribute to
>>>>>>      >>>> the
>>>>>>      >>>>      >>>>>>>  productivity problem
>>>>>>      >>>>      >>>>>>>  > is
>>>>>>      >>>>      >>>>>>>  > >> > that
>>>>>>      >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>>>>>>      full build
>>>>>>      >>>>      for every PR
>>>>>>      >>>>      >>>> and a
>>>>>>      >>>>      >>>>>>>  > >> successful
>>>>>>      >>>>      >>>>>>>  > >> > > full
>>>>>>      >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>>>>>>      definitely have
>>>>>>      >>>>      more options to
>>>>>>      >>>>      >>>>>>>  solve it,
>>>>>>      >>>>      >>>>>>>  > for
>>>>>>      >>>>      >>>>>>>  > >> > > > instance,
>>>>>>      >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>>>>>>      and reuse
>>>>>>      >>>>      artifacts
>>>>>>      >>>>      >> from
>>>>>>      >>>>      >>>> the
>>>>>>      >>>>      >>>>>>>  > previous
>>>>>>      >>>>      >>>>>>>  > >> > > build.
>>>>>>      >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>>>>>>      effort
>>>>>>      >>>>      which is much
>>>>>>      >>>>      >>>>>>>  harder to
>>>>>>      >>>>      >>>>>>>  > >> > accomplish
>>>>>>      >>>>      >>>>>>>  > >> > > > in
>>>>>>      >>>>      >>>>>>>  > >> > > > > a short period of time and
>>>>>>      may deserve
>>>>>>      >>>>      its own
>>>>>>      >>>>      >>>> separate
>>>>>>      >>>>      >>>>>>>  > >> discussion.
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > > [1]
>>>>>>      >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > > >
>>>>>>      >>>>      >>>>>>>  > >> > > >
>>>>>>      >>>>      >>>>>>>  > >> > >
>>>>>>      >>>>      >>>>>>>  > >> >
>>>>>>      >>>>      >>>>>>>  > >>
>>>>>>      >>>>      >>>>>>>  > >
>>>>>>      >>>>      >>>>>>>  >
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>  --
>>>>>>      >>>>      >>>>>>>  Best Regards
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>>>>>>  Jeff Zhang
>>>>>>      >>>>      >>>>>>>
>>>>>>      >>>>      >>
>>>>>>      >>>>
>>>>>>      >>>
>>>>>>      >>
>>>>>>
>>>>>
>>>>
>>>
>>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Kurt Young <yk...@gmail.com>.
Thanks for all your efforts, Chesnay; this has indeed improved our
development experience a lot. BTW, do you know how to find out which
master commit the CI ran against?

For example, like this one:
https://travis-ci.com/flink-ci/flink/jobs/214542568
It shows that the build passed for commits that were rebased onto master when
the CI was triggered. But the master commit that the CI ran on may or may not
be the same as the current master. If it is the same, I can simply rely on the
passing result and push the commits; if it is not, I think I should find
another way to re-trigger the tests against the newest master.

Do you know where I can get this information?

Best,
Kurt
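
A minimal sketch of the decision described above, assuming the SHA of the
master commit that the build ran against is already known (this thread does
not establish where to find it). The names `same_commit` and `needs_rerun`
are hypothetical and are not part of the flink-ci bot or of Travis:

```python
def same_commit(sha_a: str, sha_b: str, min_len: int = 7) -> bool:
    """Return True if two (possibly abbreviated) hex SHAs appear to name
    the same commit, i.e. one is a prefix of the other.

    A prefix match on abbreviated SHAs is only a heuristic; comparing two
    full 40-character SHAs is exact.
    """
    a = sha_a.strip().lower()
    b = sha_b.strip().lower()
    if min(len(a), len(b)) < min_len:
        raise ValueError("SHA too short to compare reliably")
    return a.startswith(b) or b.startswith(a)


def needs_rerun(tested_base_sha: str, current_master_sha: str) -> bool:
    """Return True if master has moved since the build ran, so a passing
    result no longer covers the current master."""
    return not same_commit(tested_base_sha, current_master_sha)
```

If `needs_rerun` returns False, the passing result was produced against the
current master; otherwise the tests would have to be re-triggered first.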


On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler <ch...@apache.org> wrote:

> The kinks have been worked out; the bot is running again and PR builds
> are once again no longer running on ASF resources.
>
> PRs are mirrored to: https://github.com/flink-ci/flink
> Bot source: https://github.com/flink-ci/ci-bot
>
> On 08/07/2019 17:14, Chesnay Schepler wrote:
> > I have temporarily re-enabled running PR builds on the ASF account;
> > migrating to the Travis subscription caused some issues in the bot
> > that I have to fix first.
> >
> > On 07/07/2019 23:01, Chesnay Schepler wrote:
> >> The vote has passed unanimously in favor of migrating to a separate
> >> Travis account.
> >>
> >> I will now set things up such that PullRequests are no longer run on
> >> the ASF servers.
> >> This is a major step in reducing our usage of ASF resources.
> >> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
> >> workers, which is the same as what the ASF gives us). Over the course of
> >> the next week we'll set up the Ververica subscription to increase this limit.
> >>
> >> From now on, a bot will mirror all new and updated PullRequests to a
> >> mirror repository (https://github.com/flink-ci/flink-ci) and write an
> >> update into the PR once the build is complete.
> >> I have run the bots for the past 3 days in parallel to our existing
> >> Travis and it was working without major issues.
> >>
> >> The biggest change that contributors will see is that there's no
> >> longer an icon next to each commit. We may revisit this in the future.
> >>
> >> I'll set up a repo with the source of the bot later.
> >>
> >> On 04/07/2019 10:46, Chesnay Schepler wrote:
> >>> I've raised a JIRA
> >>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> >>> inquire whether it would be possible to switch to a different Travis
> >>> account, and if so what steps would need to be taken.
> >>> We need a proper confirmation from INFRA since we are not in full
> >>> control of the flink repository (for example, we cannot access the
> >>> settings page).
> >>>
> >>> If this is indeed possible, Ververica is willing to sponsor a Travis
> >>> account for the Flink project.
> >>> This would provide us with more than enough resources.
> >>>
> >>> Since this makes the project more reliant on resources provided by
> >>> external companies I would like to vote on this.
> >>>
> >>> Please vote on this proposal, as follows:
> >>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> >>> account, provided that INFRA approves
> >>> [ ] -1, Do not approve the migration to a Ververica-sponsored
> >>> Travis account
> >>>
> >>> The vote will be open for at least 24h, and until we have
> >>> confirmation from INFRA. The voting period may be shorter than the
> >>> usual 3 days since our current setup is effectively not working.
> >>>
> >>> On 04/07/2019 06:51, Bowen Li wrote:
> >>>> Re: > Are they using their own Travis CI pool, or did they switch to
> >>>> an entirely different CI service?
> >>>>
> >>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >>>> currently moving away from ASF's Travis to their own in-house metal
> >>>> machines at [1] with custom CI application at [2]. They've seen
> >>>> significant improvement w.r.t both much higher performance and
> >>>> basically no resource waiting time, "night-and-day" difference
> >>>> quoting Wes.
> >>>>
> >>>> Re: > If we can just switch to our own Travis pool, just for our
> >>>> project, then this might be something we can do fairly quickly?
> >>>>
> >>>> I believe so, according to [3] and [4]
> >>>>
> >>>>
> >>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> >>>> [2] https://github.com/ursa-labs/ursabot
> >>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler
> >>>> <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>
> >>>>     Are they using their own Travis CI pool, or did they switch to an
> >>>>     entirely different CI service?
> >>>>
> >>>>     If we can just switch to our own Travis pool, just for our
> >>>>     project, then
> >>>>     this might be something we can do fairly quickly?
> >>>>
> >>>>     On 03/07/2019 05:55, Bowen Li wrote:
> >>>>     > I responded in the INFRA ticket [1] that I believe they are
> >>>>     using a wrong
> >>>>     > metric against Flink and the total build time is a completely
> >>>>     different
> >>>>     > thing than guaranteed build capacity.
> >>>>     >
> >>>>     > My response:
> >>>>     >
> >>>>     > "As mentioned above, since I started to pay attention to Flink's
> >>>>     build
> >>>>     > queue a few tens of days ago, I'm in Seattle and I saw no build
> >>>>     was kicking
> >>>>     > off in PST daytime in weekdays for Flink. Our teammates in China
> >>>>     and Europe
> >>>>     > have also reported similar observations. So we need to evaluate
> >>>>     how the
> >>>>     > large total build time came from - if 1) your number and 2) our
> >>>>     > observations from three locations that cover pretty much a full
> >>>>     day, are
> >>>>     > all true, I **guess** one reason can be that - highly likely the
> >>>>     extra
> >>>>     > build time came from weekends when other Apache projects may be
> >>>>     idle and
> >>>>     > Flink just drains hard its congested queue.
> >>>>     >
> >>>>     > Please be aware that we're not complaining about the lack of
> >>>>     resources
> >>>>     > in general, I'm complaining about the lack of **stable,
> >>>> dedicated**
> >>>>     > resources. An example for the latter one is, currently even if
> >>>>     no build is
> >>>>     > in Flink's queue and I submit a request to be the queue head
> >>>> in PST
> >>>>     > morning, my build won't even start in 6-8+h. That is an absurd
> >>>>     amount of
> >>>>     > waiting time.
> >>>>     >
> >>>>     > That is to say, if ASF INFRA decides to adopt a quota system and
> >>>>     grants
> >>>>     > Flink five DEDICATED servers that run all the time only for
> >>>>     Flink, that'll
> >>>>     > be PERFECT and can totally solve our problem now.
> >>>>     >
> >>>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
> >>>>     some level
> >>>>     > of build capacity SLAs and certainty"
> >>>>     >
> >>>>     >
> >>>>     > Again, I believe there are differences in nature of these two
> >>>>     problems,
> >>>>     > long build time vs. lack of dedicated build resources. That is
> >>>>     to say,
> >>>>     > shortening build time may relieve the situation, or it may not.
> >>>>     I'm slightly
> >>>>     > negative on disabling IT cases for PRs: the downside is
> >>>>     that we are
> >>>>     > at risk of bugs in PRs that UTs don't catch, which may cost a
> >>>>     lot more
> >>>>     > to fix if they slow down or even block others. But I am
> >>>>     > open to others' opinions on it.
> >>>>     >
> >>>>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
> >>>>     feasible to
> >>>>     > solve our problem since INFRA's pool is fully shared and they
> >>>>     have no
> >>>>     > control and finer insights over resource allocation to a
> >>>>     specific Apache
> >>>>     > project. As mentioned in [1], Apache Arrow is moving away from
> >>>>     ASF INFRA
> >>>>     > Travis pool (they are actually surprised Flink hasn't planned to do
> >>>>     so). I
> >>>>     > know that Spark is on its own build infra. If we all agree on
> >>>>     funding our
> >>>>     > own build infra, I'd be glad to help investigate any potential
> >>>>     options
> >>>>     > after releasing 1.9 since I'm super busy with 1.9 now.
> >>>>     >
> >>>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> >>>>     <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>     >
> >>>>     >> As a short-term stopgap, since we can assume this issue to
> >>>>     become much
> >>>>     >> worse in the following days/weeks, we could disable IT cases in
> >>>>     PRs and
> >>>>     >> only run them on master.
> >>>>     >>
> >>>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>>>     >>> People really have to stop thinking that just because
> >>>>     something works
> >>>>     >>> for us it is also a good solution.
> >>>>     >>> Also, please remember that our builds run for 2h from start to
> >>>>     finish,
> >>>>     >>> and not the 14 _minutes_ it takes for zeppelin.
> >>>>     >>> We are dealing with an entirely different scale here, both in
> >>>>     terms of
> >>>>     >>> build times and number of builds.
> >>>>     >>>
> >>>>     >>> In this very thread people have been complaining about long
> >>>> queue
> >>>>     >>> times for their builds. Surprise, other Apache projects
> >>>> have been
> >>>>     >>> suffering the very same thing due to us not controlling our
> >>>> build
> >>>>     >>> times. While switching services (be it Jenkins, CircleCI or
> >>>>     whatever)
> >>>>     >>> will possibly work for us (and these options are actually
> >>>>     attractive,
> >>>>     >>> like CircleCI's proper support for build artifacts), it
> >>>> will also
> >>>>     >>> result in us likely negatively affecting other projects in
> >>>>     significant
> >>>>     >>> ways.
> >>>>     >>>
> >>>>     >>> Sure, the Jenkins setup has a good user experience for us, at
> >>>>     the cost
> >>>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
> >>>>     have 25
> >>>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> >>>>     >>> resources, and the European contributors haven't even really
> >>>>     started yet.
> >>>>     >>>
> >>>>     >>> FYI, the latest INFRA response from INFRA-18533:
> >>>>     >>>
> >>>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
> >>>>     build time
> >>>>     >>> last month. That is equal to EIGHT servers running 24/7 for
> >>>>     the ENTIRE
> >>>>     >>> MONTH. EIGHT. nonstop.
> >>>>     >>> When we discovered this last night, we discussed it some and
> >>>>     are going
> >>>>     >>> to tune down Flink to allow only five executors maximum. We
> >>>> cannot
> >>>>     >>> allow Flink to consume so much of a Foundation shared
> >>>> resource."
> >>>>     >>>
> >>>>     >>> So yes, we either
> >>>>     >>> a) have to heavily reduce our CI usage or
> >>>>     >>> b) fund our own, either maintaining it ourselves or donating
> >>>>     to Apache.
> >>>>     >>>
> >>>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>>     >>>> By looking at the git history of the Jenkins script, its core
> >>>>     part
> >>>>     >>>> was finished in March 2017 (and only two minor updates in
> >>>>     2017/2018),
> >>>>     >>>> so it's been running for over two years now and feels like
> >>>>     Zeppelin
> >>>>     >>>> community has been quite happy with it. @Jeff Zhang
> >>>>     >>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
> >>>>     share your insights and user
> >>>>     >>>> experience with the Jenkins+Travis approach?
> >>>>     >>>>
> >>>>     >>>> Things like:
> >>>>     >>>>
> >>>>     >>>> - has the approach completely solved the resource capacity problem
> >>>>     >>>> for the Zeppelin community? is the Zeppelin community happy with the
> >>>>     >>>> result?
> >>>>     >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> >>>>     >>>> - how often do you need to maintain the Jenkins infra? how many
> >>>>     >>>> people are usually involved in maintenance and bug-fixes?
> >>>>     >>>>
> >>>>     >>>> To me, the downside of this approach is mostly maintenance:
> >>>>     >>>> maintaining the script and the Jenkins infra.
> >>>>     >>>>
> >>>>     >>>> ** Having Our Own Travis-CI.com Account **
> >>>>     >>>>
> >>>>     >>>> Another alternative I've been thinking of is to have our own
> >>>>     >>>> travis-ci.com account with paid dedicated resources. Note that
> >>>>     >>>> travis-ci.org is the free version and travis-ci.com is the
> >>>>     >>>> commercial version. We currently use a shared resource pool managed
> >>>>     >>>> by the ASF INFRA team on travis-ci.org, but we have no control
> >>>>     >>>> over it - we can't see how it's configured, how many resources are
> >>>>     >>>> available, how resources are allocated among Apache projects, etc.
> >>>>     >>>> The nice things about having an account on travis-ci.com are:
> >>>>     >>>>
> >>>>     >>>> - relatively low cost with much better resource guarantee
> >>>>     than what
> >>>>     >>>> we currently have [1]: $249/month with 5 dedicated
> >>>> concurrency,
> >>>>     >>>> $489/month with 10 concurrency
> >>>>     >>>> - low maintenance work compared to using Jenkins
> >>>>     >>>> - (potentially) no migration cost according to Travis's
> >>>> doc [2]
> >>>>     >>>> (pending verification)
> >>>>     >>>> - full control over the build capacity/configuration
> >>>> compared to
> >>>>     >>>> using ASF INFRA's pool
> >>>>     >>>>
> >>>>     >>>> I'd be surprised if we as such a vibrant community cannot
> >>>>     find and
> >>>>     >>>> fund $249*12=$2988 a year in exchange for a much better
> >>>> developer
> >>>>     >>>> experience and much higher productivity.
> >>>>     >>>>
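The yearly figure above follows directly from the two monthly prices quoted in the message. A minimal sketch of that arithmetic, using only those two quoted prices (current Travis pricing may differ, and the helper names are illustrative):

```python
MONTHS_PER_YEAR = 12

# Monthly prices quoted in the message above: concurrency -> USD per month.
PLANS = {5: 249, 10: 489}


def yearly_cost(monthly_usd: int) -> int:
    """Cost of a plan over twelve months."""
    return monthly_usd * MONTHS_PER_YEAR


def monthly_cost_per_job(monthly_usd: int, concurrency: int) -> float:
    """Monthly price divided by the number of dedicated concurrent jobs."""
    return monthly_usd / concurrency


if __name__ == "__main__":
    for concurrency, monthly in PLANS.items():
        print(concurrency, yearly_cost(monthly),
              monthly_cost_per_job(monthly, concurrency))
```

The per-job price is nearly the same for both plans, so the choice between them is mainly a question of how much concurrency the PR queue needs.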
> >>>>     >>>> [1] https://travis-ci.com/plans
> >>>>     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>     >>>>
> >>>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> >>>>     >>>> <chesnay@apache.org> wrote:
> >>>>     >>>>
> >>>>     >>>>      So yes, the Jenkins job keeps pulling the state from
> >>>>     Travis until it
> >>>>     >>>>      finishes.
> >>>>     >>>>
> >>>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
> >>>>     >>>>      workers just to idle for several hours.
> >>>>     >>>>
> >>>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>     >>>>      > Here's what the Zeppelin community did: we wrote a Python script
> >>>>     >>>>      > to check the build status of a pull request.
> >>>>     >>>>      > Here's the script:
> >>>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>     >>>>      >
> >>>>     >>>>      > And this is the script we use in the Jenkins build job.
> >>>>     >>>>      >
> >>>>     >>>>      > if [ -f "travis_check.py" ]; then
> >>>>     >>>>      >    git log -n 1
> >>>>     >>>>      >    # extract the PR URL and the author's fork URL from the Jenkins build page
> >>>>     >>>>      >    STATUS=$(curl -s $BUILD_URL |
> >>>>     >>>>      >      grep -e "GitHub pull request.*from.*" |
> >>>>     >>>>      >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>>>     >>>>      >    #if [ -z $COMMIT ]; then
> >>>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>     >>>>      >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >>>>     >>>>      >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>>>     >>>>      >    #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>     >>>>      >    #fi
> >>>>     >>>>      >
> >>>>     >>>>      >    # get commit hash from PR
> >>>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>     >>>>      >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >>>>     >>>>      >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>>>     >>>>      >      grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>     >>>>      >    sleep 30 # wait a few moments for Travis to start the build
> >>>>     >>>>      >    RET_CODE=0
> >>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>>>     >>>>      >      RET_CODE=0
> >>>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>     >>>>      >        grep '"full_name":' | grep -v "apache/zeppelin" |
> >>>>     >>>>      >        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>     >>>>      >    fi
> >>>>     >>>>      >
> >>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in Travis
> >>>>     >>>>      >      set +x
> >>>>     >>>>      >      echo "-----------------------------------------------------"
> >>>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >>>>     >>>>      >      echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile."
> >>>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>>>     >>>>      >      echo ""
> >>>>     >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
> >>>>     >>>>      >      echo "git commit --amend"
> >>>>     >>>>      >      echo "git push your-remote HEAD --force"
> >>>>     >>>>      >      echo ""
> >>>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>>>     >>>>      >    fi
> >>>>     >>>>      >
> >>>>     >>>>      >    exit $RET_CODE
> >>>>     >>>>      > else
> >>>>     >>>>      >    set +x
> >>>>     >>>>      >    echo "travis_check.py does not exist"
> >>>>     >>>>      >    exit 1
> >>>>     >>>>      > fi
> >>>>     >>>>      >
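The Jenkins job above blocks while travis_check.py waits for the Travis build, which is the behaviour questioned elsewhere in this thread. The following is a minimal sketch of such a wait loop, not the actual travis_check.py: the `fetch_state` callable, the state names, and the meaning of exit codes 0 and 1 are assumptions for illustration; only exit code 2 ("no build found") is taken from the shell script above.

```python
import time
from typing import Callable, Optional

# Assumed terminal build states; the real script may use different names.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], Optional[str]],
                   interval_s: float = 60.0,
                   max_polls: int = 300,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll fetch_state until the build reaches a terminal state.

    Returns 0 if the build passed, 1 if it failed or polling ran out,
    and 2 if fetch_state reports that no build exists (None).
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state is None:
            return 2  # mirrors the "can't find build information" case above
        if state in TERMINAL_STATES:
            return 0 if state == "passed" else 1
        sleep(interval_s)  # the calling worker is idle during this wait
    return 1


if __name__ == "__main__":
    # Hypothetical sequence of states, standing in for real API responses.
    states = iter(["created", "started", "passed"])
    print(wait_for_build(lambda: next(states), sleep=lambda s: None))
```

Every pass through the loop keeps the calling worker occupied without doing any work, so with multi-hour builds the worker is held for the full duration of the Travis run.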
> >>>>     >>>>      > On Sat, Jun 29, 2019 at 3:17 PM, Chesnay Schepler
> >>>>     >>>>      > <chesnay@apache.org> wrote:
> >>>>     >>>>      >
> >>>>     >>>>      >> Does this imply that a Jenkins job is active as long
> >>>>     as the
> >>>>     >>>>      Travis build
> >>>>     >>>>      >> runs?
> >>>>     >>>>      >>
> >>>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>     >>>>      >>> Hi,
> >>>>     >>>>      >>>
> >>>>     >>>>      >>> @Dawid, I think the "long test running" as I
> >>>>     mentioned in the
> >>>>     >>>>      first
> >>>>     >>>>      >> email,
> >>>>     >>>>      >>> also as you guys said, belongs to "a big effort
> >>>>     which is much
> >>>>     >>>>      harder to
> >>>>     >>>>      >>> accomplish in a short period of time and may deserve
> >>>>     its own
> >>>>     >>>>      separate
> >>>>     >>>>      >>> discussion". Thus I didn't include it in what we can
> >>>>     do in a
> >>>>     >>>>      foreseeable
> >>>>     >>>>      >>> short term.
> >>>>     >>>>      >>>
> >>>>     >>>>      >>> Besides, I don't think that's the ultimate reason for the
> >>>>     >>>>      >>> lack of build resources. Even if the build is shortened to
> >>>>     >>>>      >>> something like 2h, the problem I described, where no build
> >>>>     >>>>      >>> machine does any work for 6 or more hours during PST daytime,
> >>>>     >>>>      >>> will still happen, because no machine from ASF INFRA's
> >>>>     >>>>      >>> pool is allocated to Flink. I have paid close attention to
> >>>>     >>>>      >>> the build queue over the past few weekdays, and it's a
> >>>>     >>>>      >>> pretty clear pattern now.
> >>>>     >>>>      >>>
> >>>>     >>>>      >>> **The ultimate root cause** for that is - we don't
> >>>>     have any
> >>>>     >>>>      **dedicated**
> >>>>     >>>>      >>> build resources that we can stably rely on. I'm
> >>>>     actually ok to
> >>>>     >>>>      wait for a
> >>>>     >>>>      >>> long time if there are build requests running, it
> >>>>     means at
> >>>>     >>>>      least we are
> >>>>     >>>>      >>> making progress. But I'm not ok with no build
> >>>>     resource. A
> >>>>     >>>>      better place I
> >>>>     >>>>      >>> think we should aim at in short term is to always
> >>>>     have at
> >>>>     >>>>      least a central
> >>>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
> >>>>     Flink at
> >>>>     >>>>      any time, or
> >>>>     >>>>      >>> maybe use users resources.
> >>>>     >>>>      >>>
> >>>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
> >>>>     Zeppelin
> >>>>     >>>>      community is
> >>>>     >>>>      >>> using a Jenkins job to automatically build on users'
> >>>>     travis
> >>>>     >>>>      account and
> >>>>     >>>>      >>> link the result back to github PR. I guess the
> >>>>     Jenkins job
> >>>>     >>>>      would fetch
> >>>>     >>>>      >>> latest upstream master and build the PR against it.
> >>>>     Jeff has
> >>>>     >>>> filed
> >>>>     >>>>      >> tickets
> >>>>     >>>>      >>> to learn and get access to the Jenkins infra. It'd be
> >>>>     >>>>      >>> better to fully
> >>>>     >>>>      >>> understand it first before judging this approach.
> >>>>     >>>>      >>>
> >>>>     >>>>      >>> I also heard good things about CircleCI, and ASF
> >>>>     INFRA seems
> >>>>     >>>>      to have a
> >>>>     >>>>      >> pool
> >>>>     >>>>      >>> of build capacity there too. Can be an alternative
> >>>>     to consider.
> >>>>     >>>>      >>>
> >>>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
> >>>>     >>>>      >>> <dwysakowicz@apache.org> wrote:
> >>>>     >>>>      >>>
> >>>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
> >>>>     most
> >>>>     >>>>      important point
> >>>>     >>>>      >>>> from Chesnay's previous message in the summary. The
> >>>>     ultimate
> >>>>     >>>>      reason for
> >>>>     >>>>      >>>> all the problems is that the tests take close to 2
> >>>>     hours to
> >>>>     >>>>      run already.
> >>>>     >>>>      >>>> I fully support this claim: "Unless people start
> >>>>     caring about
> >>>>     >>>>      test times
> >>>>     >>>>      >>>> before adding them, this issue cannot be solved"
> >>>>     >>>>      >>>>
> >>>>     >>>>      >>>> This is also another reason why using user's Travis
> >>>>     account
> >>>>     >>>>      won't help.
> >>>>     >>>>      >>>> Every few weeks we reach the user's time limit for
> >>>>     a single
> >>>>     >>>>      profile.
> >>>>     >>>>      >>>> This makes the user's builds simply fail, until we
> >>>>     either
> >>>>     >>>>      properly
> >>>>     >>>>      >>>> decrease the time the tests take (which I am not
> >>>>     sure we ever
> >>>>     >>>>      did) or
> >>>>     >>>>      >>>> postpone the problem by splitting into more
> >>>>     profiles. (Note
> >>>>     >>>>      that the ASF
> >>>>     >>>>      >>>> Travis account has higher time limits)
> >>>>     >>>>      >>>>
> >>>>     >>>>      >>>> Best,
> >>>>     >>>>      >>>>
> >>>>     >>>>      >>>> Dawid
> >>>>     >>>>      >>>>
> >>>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>     >>>>      >>>>> Do we know if using "the best" available hardware
> >>>>     would
> >>>>     >>>>      improve the
> >>>>     >>>>      >> build
> >>>>     >>>>      >>>>> times?
> >>>>     >>>>      >>>>> Imagine we would run the build on machines with
> >>>>     plenty of
> >>>>     >>>>      main memory
> >>>>     >>>>      >> to
> >>>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> >>>>     architecture?
> >>>>     >>>>      >>>>>
> >>>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
> >>>>     the time
> >>>>     >>>>      of an
> >>>>     >>>>      >>>>> individual build, and using our own infrastructure
> >>>>     would
> >>>>     >>>>      remove our
> >>>>     >>>>      >>>>> dependency on Apache's Travis account (with the
> >>>>     obvious
> >>>>     >>>>      downside of
> >>>>     >>>>      >>>> having
> >>>>     >>>>      >>>>> to maintain the infrastructure)
> >>>>     >>>>      >>>>> We could use an open source travis alternative, to
> >>>>     have a
> >>>>     >>>>      similar
> >>>>     >>>>      >>>>> experience and make the migration easy.
> >>>>     >>>>      >>>>>
> >>>>     >>>>      >>>>>
> >>>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>>>     >>>>      >>>>> <chesnay@apache.org> wrote:
> >>>>     >>>>      >>>>>> From what I gathered, there's no special sauce that the
> >>>>     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
> >>>>     >>>>      >>>>>> Travis account into the PR.
> >>>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's kind of it.
> >>>>     >>>>      >>>>>>
> >>>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
> >>>>     >>>>      >>>>>> amount of resources, but there are downsides:
> >>>>     >>>>      >>>>>>
> >>>>     >>>>      >>>>>> The discoverability of the Travis check takes a nose-dive.
> >>>>     >>>>      >>>>>> Either we require every contributor to always, on every
> >>>>     >>>>      >>>>>> commit, also post a Travis build, or we have the reviewer
> >>>>     >>>>      >>>>>> sift through the contributor's account to find it.
> >>>>     >>>>      >>>>>>
> >>>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
> >>>>     also not
> >>>>     >>>>      equivalent to
> >>>>     >>>>      >>>>>> having a PR build.
> >>>>     >>>>      >>>>>>
> >>>>     >>>>      >>>>>> A normal branch build takes a branch as is and tests it. A
> >>>>     >>>>      >>>>>> PR build merges the branch into master, and then runs it.
> >>>>     >>>>      >>>>>> (Fun fact: This is why a PR with merge conflicts is not
> >>>>     >>>>      >>>>>> being run on Travis.)
> >>>>     >>>>      >>>>>>
> >>>>     >>>>      >>>>>> And ultimately, everyone can already make use
> >>>> of this
> >>>>     >>>>      approach anyway.
> >>>>     >>>>      >>>>>>
> >>>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>     >>>>      >>>>>>> Hi Jeff,
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> >>>>     think it's a
> >>>>     >>>>      good idea to
> >>>>     >>>>      >>>>>>> leverage user's travis account.
> >>>>     >>>>      >>>>>>> In this way, we can have almost unlimited
> >>>>     concurrent build
> >>>>     >>>>      jobs and
> >>>>     >>>>      >>>>>>> developers can restart build by themselves
> >>>>     (currently only
> >>>>     >>>>      committers
> >>>>     >>>>      >>>>>>> can restart PR's build).
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>> But I'm still not very clear how to integrate
> >>>> user's
> >>>>     >>>>      travis build
> >>>>     >>>>      >> into
> >>>>     >>>>      >>>>>>> the Flink pull request's build automatically.
> >>>>     Can you
> >>>>     >>>>      explain more in
> >>>>     >>>>      >>>>>>> detail?
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>> Another question: does travis only build
> >>>>     branches for user
> >>>>     >>>>      account?
> >>>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> >>>> user's
> >>>>     >>>>      commits against
> >>>>     >>>>      >>>>>>> current master branch.
> >>>>     >>>>      >>>>>>> This will help us to find problems before
> >>>>     merge.  Builds
> >>>>     >>>>      for branches
> >>>>     >>>>      >>>>>>> will lose the impact of new commits in master.
> >>>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>> Thanks again for sharing the idea.
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>> Regards,
> >>>>     >>>>      >>>>>>> Jark
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >>>>     >>>>      >>>>>>> <zjffdu@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>  Hi Folks,
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved it by
> >>>>     >>>>      >>>>>>>  delegating each contributor's PR build to their own Travis
> >>>>     >>>>      >>>>>>>  account (everyone can have 5 free slots for Travis builds).
> >>>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered when
> >>>>     >>>>      >>>>>>>  a PR is merged.
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>  On Tue, Jun 25, 2019 at 10:16 AM, Kurt Young
> >>>>     >>>>      >>>>>>>  <ykt836@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>  > (Forgot to cc George)
> >>>>     >>>>      >>>>>>>  >
> >>>>     >>>>      >>>>>>>  > Best,
> >>>>     >>>>      >>>>>>>  > Kurt
> >>>>     >>>>      >>>>>>>  >
> >>>>     >>>>      >>>>>>>  >
> >>>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>>>     >>>>      >>>>>>>  > <ykt836@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>  >
> >>>>     >>>>      >>>>>>>  > > Hi Bowen,
> >>>>     >>>>      >>>>>>>  > >
> >>>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
> >>>>     >>>>      >>>>>>>  > > discussed this, and I think Till and George have
> >>>>     >>>>      >>>>>>>  > > already spent some time investigating it. I have
> >>>>     >>>>      >>>>>>>  > > cc'ed both of them, and maybe they can share
> >>>>     >>>>      >>>>>>>  > > their findings.
> >>>>     >>>>      >>>>>>>  > >
> >>>>     >>>>      >>>>>>>  > > Best,
> >>>>     >>>>      >>>>>>>  > > Kurt
> >>>>     >>>>      >>>>>>>  > >
> >>>>     >>>>      >>>>>>>  > >
> >>>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>>>     >>>>      >>>>>>>  > > <imjark@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>  > >
> >>>>     >>>>      >>>>>>>  > >> Hi Bowen,
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
> >>>>     suffered from
> >>>>     >>>>      the long
> >>>>     >>>>      >>>>>>>  build time.
> >>>>     >>>>      >>>>>>>  > >> I agree that we should focus on
> >>>>     solving build
> >>>>     >>>>      capacity
> >>>>     >>>>      >>>>>>>  problem in the
> >>>>     >>>>      >>>>>>>  > >> thread.
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> My observation is that only one build is running
> >>>>     >>>>      >>>>>>>  > >> at a time; all the others (other PRs, master)
> >>>>     >>>>      >>>>>>>  > >> are pending.
> >>>>     >>>>      >>>>>>>  > >> The pricing plan [1] of Travis shows it can
> >>>>     >>>>      >>>>>>>  > >> support concurrent build jobs.
> >>>>     >>>>      >>>>>>>  > >> But I don't know which plan we are using; it
> >>>>     >>>>      >>>>>>>  > >> might be the free plan for open source.
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
> >>>>     experience on
> >>>>     >>>>      Travis.
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> Regards,
> >>>>     >>>>      >>>>>>>  > >> Jark
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
> >>>>     >>>>      >>>>>>>  > >> <bowenli86@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >> > Hi Steven,
> >>>>     >>>>      >>>>>>>  > >> >
> >>>>     >>>>      >>>>>>>  > >> > I think you may not have read what I wrote. The
> >>>>     >>>>      >>>>>>>  > >> > discussion is about "unstable build
> >>>>     >>>>      >>>>>>>  > >> > **capacity**", in other words "unstable / lack
> >>>>     >>>>      >>>>>>>  > >> > of build resources", not "unstable build".
> >>>>     >>>>      >>>>>>>  > >> >
> >>>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>>>     >>>>      >>>>>>>  > >> > <stevenz3wu@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>  > >> >
> >>>>     >>>>      >>>>>>>  > >> > > A long and sometimes unstable build is
> >>>>     >>>>      >>>>>>>  > >> > > definitely a pain point.
> >>>>     >>>>      >>>>>>>  > >> > >
> >>>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
> >>>>     >>>>      >>>>>>>  > >> > > flink-connector-kafka is not related to my
> >>>>     >>>>      >>>>>>>  > >> > > change, but there is no easy way to re-run the
> >>>>     >>>>      >>>>>>>  > >> > > build in the Travis UI. A Google search showed
> >>>>     >>>>      >>>>>>>  > >> > > a trick: closing and reopening the PR will
> >>>>     >>>>      >>>>>>>  > >> > > trigger a rebuild, but that could add noise to
> >>>>     >>>>      >>>>>>>  > >> > > the PR activity.
> >>>>     >>>>      >>>>>>>  > >> > >
> >>>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
> >>>>     >>>>      >>>>>>>  > >> > >
> >>>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
> >>>>     often failed
> >>>>     >>>>      with
> >>>>     >>>>      >>>>>>>  exceeding time
> >>>>     >>>>      >>>>>>>  > limit
> >>>>     >>>>      >>>>>>>  > >> > after
> >>>>     >>>>      >>>>>>>  > >> > > 4+ hours.
> >>>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
> >>>>     limit for
> >>>>     >>>>      jobs, and
> >>>>     >>>>      >> has
> >>>>     >>>>      >>>>>>>  been
> >>>>     >>>>      >>>>>>>  > >> > terminated.
> >>>>     >>>>      >>>>>>>  > >> > >
> >>>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>>>     >>>>      >>>>>>>  > >> > > <bowenli86@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>  > >> > >
> >>>>     >>>>      >>>>>>>  > >> > > >
> >>>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
> >>>>     >>>>      >>>>>>>  This build
> >>>>     >>>>      >>>>>>>  > >> > request
> >>>>     >>>>      >>>>>>>  > >> > > > has
> >>>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
> >>>>     queue**
> >>>>     >>>>      since I first
> >>>>     >>>>      >> saw
> >>>>     >>>>      >>>>>>>  it at PST
> >>>>     >>>>      >>>>>>>  > >> > 10:30am
> >>>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
> >>>>     there before
> >>>>     >>>>      10:30am).
> >>>>     >>>>      >>>>>>>  It's PST
> >>>>     >>>>      >>>>>>>  > 4:12pm
> >>>>     >>>>      >>>>>>>  > >> now
> >>>>     >>>>      >>>>>>>  > >> > > and
> >>>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
> >>>>     >>>>      >>>>>>>  > >> > > >
> >>>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>>>     >>>>      >>>>>>>  > >> > > > <bowenli86@gmail.com> wrote:
> >>>>     >>>>      >>>>>>>  > >> > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
> >>>>     >>>>      resulting from lack
> >>>>     >>>>      >>>>>>>  of stable
> >>>>     >>>>      >>>>>>>  > >> build
> >>>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
> >>>>     PRs [1].
> >>>>     >>>>      >> Specifically, I
> >>>>     >>>>      >>>>>>>  noticed
> >>>>     >>>>      >>>>>>>  > >> often
> >>>>     >>>>      >>>>>>>  > >> > > that
> >>>>     >>>>      >>>>>>>  > >> > > > no
> >>>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
> >>>>     >>>>      progress for
> >>>>     >>>>      >> hours,
> >>>>     >>>>      >>>> and
> >>>>     >>>>      >>>>>>>  > suddenly
> >>>>     >>>>      >>>>>>>  > >> 5
> >>>>     >>>>      >>>>>>>  > >> > or
> >>>>     >>>>      >>>>>>>  > >> > > 6
> >>>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
> >>>>     after the
> >>>>     >>>>      long pause.
> >>>>     >>>>      >>>>>>>  I'm at PST
> >>>>     >>>>      >>>>>>>  > >> > (UTC-08)
> >>>>     >>>>      >>>>>>>  > >> > > > time
> >>>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
> >>>>     be as
> >>>>     >>>>      long as 6 hours
> >>>>     >>>>      >>>>>>>  from PST 9am
> >>>>     >>>>      >>>>>>>  > >> to
> >>>>     >>>>      >>>>>>>  > >> > 3pm
> >>>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
> >>>>     drain the
> >>>>     >>>>      queue
> >>>>     >>>>      >>>>>>>  afterwards).
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
> >>>>     impacted our
> >>>>     >>>>      productivity.
> >>>>     >>>>      >>>> I've
> >>>>     >>>>      >>>>>>>  > >> experienced
> >>>>     >>>>      >>>>>>>  > >> > > that
> >>>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
> >>>>     morning of
> >>>>     >>>>      PST time zone
> >>>>     >>>>      >>>>>>>  won't finish
> >>>>     >>>>      >>>>>>>  > >> > their
> >>>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
> >>>>     same day.
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
> >>>>     the same
> >>>>     >>>>      problem or
> >>>>     >>>>      >>>>>>>  have similar
> >>>>     >>>>      >>>>>>>  > >> > > > observation
> >>>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
> >>>>     has things
> >>>>     >>>>      to do with
> >>>>     >>>>      >> time
> >>>>     >>>>      >>>>>>>  zone)
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
> >>>>     TravisCI is
> >>>>     >>>>      Flink currently
> >>>>     >>>>      >>>>>>>  using? Is it
> >>>>     >>>>      >>>>>>>  > >> the
> >>>>     >>>>      >>>>>>>  > >> > > free
> >>>>     >>>>      >>>>>>>  > >> > > > > plan for open source
> >>>>     projects? What
> >>>>     >>>> are the
> >>>>     >>>>      >>>>>>>  guaranteed build
> >>>>     >>>>      >>>>>>>  > >> capacity
> >>>>     >>>>      >>>>>>>  > >> > > of
> >>>>     >>>>      >>>>>>>  > >> > > > > the current plan?
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
> >>>>     (either
> >>>>     >>>>      free or paid)
> >>>>     >>>>      >>>>>> can't
> >>>>     >>>>      >>>>>>>  > provide
> >>>>     >>>>      >>>>>>>  > >> > > stable
> >>>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
> >>>>     upgrade to a
> >>>>     >>>>      higher priced
> >>>>     >>>>      >>>>>>>  plan with
> >>>>     >>>>      >>>>>>>  > larger
> >>>>     >>>>      >>>>>>>  > >> > and
> >>>>     >>>>      >>>>>>>  > >> > > > more
> >>>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
> >>>>     contribute to
> >>>>     >>>> the
> >>>>     >>>>      >>>>>>>  productivity problem
> >>>>     >>>>      >>>>>>>  > is
> >>>>     >>>>      >>>>>>>  > >> > that
> >>>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
> >>>>     full build
> >>>>     >>>>      for every PR
> >>>>     >>>>      >>>> and a
> >>>>     >>>>      >>>>>>>  > >> successful
> >>>>     >>>>      >>>>>>>  > >> > > full
> >>>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
> >>>>     definitely have
> >>>>     >>>>      more options to
> >>>>     >>>>      >>>>>>>  solve it,
> >>>>     >>>>      >>>>>>>  > for
> >>>>     >>>>      >>>>>>>  > >> > > > instance,
> >>>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
> >>>>     and reuse
> >>>>     >>>>      artifacts
> >>>>     >>>>      >> from
> >>>>     >>>>      >>>> the
> >>>>     >>>>      >>>>>>>  > previous
> >>>>     >>>>      >>>>>>>  > >> > > build.
> >>>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
> >>>>     effort
> >>>>     >>>>      which is much
> >>>>     >>>>      >>>>>>>  harder to
> >>>>     >>>>      >>>>>>>  > >> > accomplish
> >>>>     >>>>      >>>>>>>  > >> > > > in
> >>>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
> >>>>     may deserve
> >>>>     >>>>      its own
> >>>>     >>>>      >>>> separate
> >>>>     >>>>      >>>>>>>  > >> discussion.
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > > [1]
> >>>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > > >
> >>>>     >>>>      >>>>>>>  > >> > > >
> >>>>     >>>>      >>>>>>>  > >> > >
> >>>>     >>>>      >>>>>>>  > >> >
> >>>>     >>>>      >>>>>>>  > >>
> >>>>     >>>>      >>>>>>>  > >
> >>>>     >>>>      >>>>>>>  >
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>  --
> >>>>     >>>>      >>>>>>>  Best Regards
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>>>>>>  Jeff Zhang
> >>>>     >>>>      >>>>>>>
> >>>>     >>>>      >>
> >>>>     >>>>
> >>>>     >>>
> >>>>     >>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
The kinks have been worked out; the bot is running again and PR builds 
are once again no longer running on ASF resources.

PRs are mirrored to: https://github.com/flink-ci/flink
Bot source: https://github.com/flink-ci/ci-bot
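
For readers unfamiliar with the mirroring approach, the core step could be
sketched roughly as below. This is only an illustration, not the actual
ci-bot code: the function names, the remote names "upstream" and "mirror",
and the "ci_<pr>_<sha>" branch naming are all assumptions made for the
example. The only external fact it relies on is that GitHub exposes the head
of each pull request under refs/pull/<number>/head.

```python
from typing import List


def mirror_branch_name(pr_number: int, commit_sha: str) -> str:
    """Hypothetical naming scheme: one mirror branch per PR revision."""
    return f"ci_{pr_number}_{commit_sha}"


def mirror_commands(pr_number: int, commit_sha: str,
                    upstream: str = "upstream",
                    mirror: str = "mirror") -> List[List[str]]:
    """Build the git invocations that copy one PR revision to the mirror.

    The remote names are assumptions; nothing is executed here, the
    commands are only returned so a caller can run or log them.
    """
    branch = mirror_branch_name(pr_number, commit_sha)
    return [
        # GitHub publishes every pull request head as refs/pull/<n>/head.
        ["git", "fetch", upstream, f"refs/pull/{pr_number}/head"],
        # Push that exact commit to a new branch in the mirror repository,
        # where the CI service builds it as an ordinary branch build.
        ["git", "push", mirror, f"{commit_sha}:refs/heads/{branch}"],
    ]
```

A caller would pass each returned command to something like subprocess.run
and, once the mirror build finishes, post the result back to the original PR.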

On 08/07/2019 17:14, Chesnay Schepler wrote:
> I have temporarily re-enabled running PR builds on the ASF account; 
> migrating to the Travis subscription caused some issues in the bot 
> that I have to fix first.
>
> On 07/07/2019 23:01, Chesnay Schepler wrote:
>> The vote has passed unanimously in favor of migrating to a separate 
>> Travis account.
>>
>> I will now set things up such that PullRequests are no longer run on 
>> the ASF servers.
>> This is a major step in reducing our usage of ASF resources.
>> For the time being we'll use the free Travis plan for flink-ci (i.e. 5 
>> workers, which is the same as the ASF gives us). Over the course of the 
>> next week we'll set up the Ververica subscription to increase this limit.
>>
>> From now on, a bot will mirror all new and updated PullRequests to a 
>> mirror repository (https://github.com/flink-ci/flink-ci) and write an 
>> update into the PR once the build is complete.
>> I have run the bots for the past 3 days in parallel to our existing 
>> Travis and it was working without major issues.
>>
>> The biggest change that contributors will see is that there's no 
>> longer an icon next to each commit. We may revisit this in the future.
>>
>> I'll set up a repo with the source of the bot later.
>>
>> On 04/07/2019 10:46, Chesnay Schepler wrote:
>>> I've raised a JIRA 
>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to 
>>> inquire whether it would be possible to switch to a different Travis 
>>> account, and if so what steps would need to be taken.
>>> We need a proper confirmation from INFRA since we are not in full 
>>> control of the flink repository (for example, we cannot access the 
>>> settings page).
>>>
>>> If this is indeed possible, Ververica is willing to sponsor a Travis 
>>> account for the Flink project.
>>> This would provide us with more than enough resources.
>>>
>>> Since this makes the project more reliant on resources provided by 
>>> external companies I would like to vote on this.
>>>
>>> Please vote on this proposal, as follows:
>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis 
>>> account, provided that INFRA approves
>>> [ ] -1, Do not approve the migration to a Ververica-sponsored 
>>> Travis account
>>>
>>> The vote will be open for at least 24h, and until we have 
>>> confirmation from INFRA. The voting period may be shorter than the 
>>> usual 3 days since our current setup is effectively not working.
>>>
>>> On 04/07/2019 06:51, Bowen Li wrote:
>>>> Re: > Are they using their own Travis CI pool, or did they switch to 
>>>> an entirely different CI service?
>>>>
>>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are 
>>>> currently moving away from ASF's Travis to their own in-house metal 
>>>> machines at [1] with custom CI application at [2]. They've seen 
>>>> significant improvement w.r.t both much higher performance and 
>>>> basically no resource waiting time, "night-and-day" difference 
>>>> quoting Wes.
>>>>
>>>> Re: > If we can just switch to our own Travis pool, just for our 
>>>> project, then this might be something we can do fairly quickly?
>>>>
>>>> I believe so, according to [3] and [4]
>>>>
>>>>
>>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>>>> [2] https://github.com/ursa-labs/ursabot
>>>> [3] 
>>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration 
>>>>
>>>> [4] 
>>>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>>
>>>>
>>>>
>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler 
>>>> <chesnay@apache.org <ma...@apache.org>> wrote:
>>>>
>>>>     Are they using their own Travis CI pool, or did they switch to an
>>>>     entirely different CI service?
>>>>
>>>>     If we can just switch to our own Travis pool, just for our
>>>>     project, then
>>>>     this might be something we can do fairly quickly?
>>>>
>>>>     On 03/07/2019 05:55, Bowen Li wrote:
>>>>     > I responded in the INFRA ticket [1] that I believe they are
>>>>     using a wrong
>>>>     > metric against Flink and the total build time is a completely
>>>>     different
>>>>     > thing than guaranteed build capacity.
>>>>     >
>>>>     > My response:
>>>>     >
>>>>     > "As mentioned above, since I started to pay attention to Flink's
>>>>     build
>>>>     > queue a few tens of days ago, I'm in Seattle and I saw no build
>>>>     was kicking
>>>>     > off in PST daytime in weekdays for Flink. Our teammates in China
>>>>     and Europe
>>>>     > have also reported similar observations. So we need to evaluate
>>>>     how the
>>>>     > large total build time came from - if 1) your number and 2) our
>>>>     > observations from three locations that cover pretty much a full
>>>>     day, are
>>>>     > all true, I **guess** one reason can be that - highly likely the
>>>>     extra
>>>>     > build time came from weekends when other Apache projects may be
>>>>     idle and
>>>>     > Flink just drains hard its congested queue.
>>>>     >
>>>>     > Please be aware of that we're not complaining about the lack of
>>>>     resources
>>>>     > in general, I'm complaining about the lack of **stable, 
>>>> dedicated**
>>>>     > resources. An example for the latter one is, currently even if
>>>>     no build is
>>>>     > in Flink's queue and I submit a request to be the queue head 
>>>> in PST
>>>>     > morning, my build won't even start in 6-8+h. That is an absurd
>>>>     amount of
>>>>     > waiting time.
>>>>     >
>>>>     > That's saying, if ASF INFRA decides to adopt a quota system and
>>>>     grants
>>>>     > Flink five DEDICATED servers that runs all the time only for
>>>>     Flink, that'll
>>>>     > be PERFECT and can totally solve our problem now.
>>>>     >
>>>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
>>>>     some level
>>>>     > of build capacity SLAs and certainty"
>>>>     >
>>>>     >
>>>>     > Again, I believe there are differences in nature of these two
>>>>     problems,
>>>>     > long build time v.s. lack of dedicated build resource. That's
>>>>     saying,
>>>>     > shortening build time may relieve the situation, and may not.
>>>>     I'm slightly
>>>>     > negative on disabling IT cases for PRs, since the downside is
>>>>     that we are
>>>>     > at risk of bugs in PRs that UTs don't catch, which may cost a
>>>>     lot more to fix if they slow others down or even block
>>>>     others, but I am
>>>>     > open to others' opinions on it.
>>>>     >
>>>>     > AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
>>>>     feasible to
>>>>     > solve our problem since INFRA's pool is fully shared and they
>>>>     have no
>>>>     > control and finer insights over resource allocation to a
>>>>     specific Apache
>>>>     > project. As mentioned in [1], Apache Arrow is moving away from
>>>>     ASF INFRA
>>>>     > Travis pool (they are actually surprised Flink hasn't planned to do
>>>>     so). I
>>>>     > know that Spark is on its own build infra. If we all agree on
>>>>     funding our
>>>>     > own build infra, I'd be glad to help investigate any potential
>>>>     options
>>>>     > after releasing 1.9 since I'm super busy with 1.9 now.
>>>>     >
>>>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>     >
>>>>     >
>>>>     >
>>>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>>     <chesnay@apache.org <ma...@apache.org>> wrote:
>>>>     >
>>>>     >> As a short-term stopgap, since we can assume this issue to
>>>>     become much
>>>>     >> worse in the following days/weeks, we could disable IT cases in
>>>>     PRs and
>>>>     >> only run them on master.
>>>>     >>
>>>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>     >>> People really have to stop thinking that just because
>>>>     something works
>>>>     >>> for us it is also a good solution.
>>>>     >>> Also, please remember that our builds run for 2h from start to
>>>>     finish,
>>>>     >>> and not the 14 _minutes_ it takes for zeppelin.
>>>>     >>> We are dealing with an entirely different scale here, both in
>>>>     terms of
>>>>     >>> build times and number of builds.
>>>>     >>>
>>>>     >>> In this very thread people have been complaining about long 
>>>> queue
>>>>     >>> times for their builds. Surprise, other Apache projects 
>>>> have been
>>>>     >>> suffering the very same thing due to us not controlling our 
>>>> build
>>>>     >>> times. While switching services (be it Jenkins, CircleCI or
>>>>     whatever)
>>>>     >>> will possibly work for us (and these options are actually
>>>>     attractive,
>>>>     >>> like CircleCI's proper support for build artifacts), it 
>>>> will also
>>>>     >>> result in us likely negatively affecting other projects in
>>>>     significant
>>>>     >>> ways.
>>>>     >>>
>>>>     >>> Sure, the Jenkins setup has a good user experience for us, at
>>>>     the cost
>>>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>>>     have 25
>>>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>>     >>> resources, and the European contributors haven't even really
>>>>     started yet.
>>>>     >>>
>>>>     >>> FYI, the latest INFRA response from INFRA-18533:
>>>>     >>>
>>>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
>>>>     build time
>>>>     >>> last month. That is equal to EIGHT servers running 24/7 for
>>>>     the ENTIRE
>>>>     >>> MONTH. EIGHT. nonstop.
>>>>     >>> When we discovered this last night, we discussed it some and
>>>>     are going
>>>>     >>> to tune down Flink to allow only five executors maximum. We 
>>>> cannot
>>>>     >>> allow Flink to consume so much of a Foundation shared 
>>>> resource."
>>>>     >>>
>>>>     >>> So yes, we either
>>>>     >>> a) have to heavily reduce our CI usage or
>>>>     >>> b) fund our own, either maintaining it ourselves or donating
>>>>     to Apache.
>>>>     >>>
>>>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>     >>>> By looking at the git history of the Jenkins script, its core
>>>>     part
>>>>     >>>> was finished in March 2017 (with only two minor updates in
>>>>     >>>> 2017/2018), so it's been running for over two years now and it
>>>>     >>>> feels like the Zeppelin community has been quite happy with it.
>>>>     >>>> @Jeff Zhang
>>>>     >>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
>>>>     share your insights and user
>>>>     >>>> experience with the Jenkins+Travis approach?
>>>>     >>>>
>>>>     >>>> Things like:
>>>>     >>>>
>>>>     >>>> - has the approach completely solved the resource capacity
>>>>     problem
>>>>     >>>> for the Zeppelin community? is the Zeppelin community happy with the
>>>>     result?
>>>>     >>>> - is the whole configuration chain stable (e.g. uptime) 
>>>> enough?
>>>>     >>>> - how often do you need to maintain the Jenkins infra? how 
>>>> many
>>>>     >>>> people are usually involved in maintenance and bug-fixes?
>>>>     >>>>
>>>>     >>>> The downside of this approach seems mostly to be on the
>>>>     maintenance
>>>>     >>>> to me - maintain the script and Jenkins infra.
>>>>     >>>>
>>>>     >>>> ** Having Our Own Travis-CI.com Account **
>>>>     >>>>
>>>>     >>>> Another alternative I've been thinking of is to have our own
>>>>     >>>> travis-ci.com account with paid dedicated resources. Note that
>>>>     >>>> travis-ci.org is the free version and travis-ci.com is the
>>>>     >>>> commercial version. We currently use a shared resource pool
>>>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>>>     >>>> control over it - we can't see how it's configured, how many
>>>>     >>>> resources are available, how resources are allocated among
>>>>     >>>> Apache projects, etc. The nice things about having an account
>>>>     >>>> on travis-ci.com are:
>>>>     >>>>
>>>>     >>>> - relatively low cost with much better resource guarantee
>>>>     than what
>>>>     >>>> we currently have [1]: $249/month with 5 dedicated 
>>>> concurrency,
>>>>     >>>> $489/month with 10 concurrency
>>>>     >>>> - low maintenance work compared to using Jenkins
>>>>     >>>> - (potentially) no migration cost according to Travis's 
>>>> doc [2]
>>>>     >>>> (pending verification)
>>>>     >>>> - full control over the build capacity/configuration 
>>>> compared to
>>>>     >>>> using ASF INFRA's pool
>>>>     >>>>
>>>>     >>>> I'd be surprised if we as such a vibrant community cannot
>>>>     find and
>>>>     >>>> fund $249*12=$2988 a year in exchange for a much better 
>>>> developer
>>>>     >>>> experience and much higher productivity.
>>>>     >>>>
>>>>     >>>> [1] https://travis-ci.com/plans
>>>>     >>>> [2]
>>>>     >>>>
>>>>     >>
>>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration 
>>>>
>>>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>>>     <chesnay@apache.org <ma...@apache.org>
>>>>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>> 
>>>> wrote:
>>>>     >>>>
>>>>     >>>>      So yes, the Jenkins job keeps pulling the state from
>>>>     Travis until it
>>>>     >>>>      finishes.
>>>>     >>>>
>>>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>>>     >>>>      workers just to idle for several hours.
>>>>     >>>>
>>>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>     >>>>      > Here's what the Zeppelin community did: we wrote a Python
>>>>     >>>>      > script to check the build status of a pull request.
>>>>     >>>>      > Here's the script:
>>>>     >>>>      >
>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>     >>>>      >
>>>>     >>>>      > And this is the script we used in Jenkins build job.
>>>>     >>>>      >
>>>>     >>>>      > if [ -f "travis_check.py" ]; then
>>>>     >>>>      >    git log -n 1
>>>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>     >>>>      >    #if [ -z $COMMIT ]; then
>>>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>     >>>>      >    #fi
>>>>     >>>>      >
>>>>     >>>>      >    # get commit hash from PR
>>>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>>>>     >>>>      >    RET_CODE=0
>>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>>>     >>>>      >      RET_CODE=0
>>>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>     >>>>      >    fi
>>>>     >>>>      >
>>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>>>>     >>>>      >      set +x
>>>>     >>>>      >      echo "-----------------------------------------------------"
>>>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>>>>     >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>     >>>>      >      echo ""
>>>>     >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
>>>>     >>>>      >      echo "git commit --amend"
>>>>     >>>>      >      echo "git push your-remote HEAD --force"
>>>>     >>>>      >      echo ""
>>>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
>>>>     >>>>      >    fi
>>>>     >>>>      >
>>>>     >>>>      >    exit $RET_CODE
>>>>     >>>>      > else
>>>>     >>>>      >    set +x
>>>>     >>>>      >    echo "travis_check.py does not exists"
>>>>     >>>>      >    exit 1
>>>>     >>>>      > fi
>>>>     >>>>      >
>>>>     >>>>      > Chesnay Schepler <chesnay@apache.org
>>>>     <ma...@apache.org>
>>>>     >>>>      <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>>     于2019年6月29日周六 下午3:17写道:
>>>>     >>>>      >
>>>>     >>>>      >> Does this imply that a Jenkins job is active as long
>>>>     as the
>>>>     >>>>      Travis build
>>>>     >>>>      >> runs?
>>>>     >>>>      >>
>>>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>>     >>>>      >>> Hi,
>>>>     >>>>      >>>
>>>>     >>>>      >>> @Dawid, I think the "long test running" as I
>>>>     mentioned in the
>>>>     >>>>      first
>>>>     >>>>      >> email,
>>>>     >>>>      >>> also as you guys said, belongs to "a big effort
>>>>     which is much
>>>>     >>>>      harder to
>>>>     >>>>      >>> accomplish in a short period of time and may deserve
>>>>     its own
>>>>     >>>>      separate
>>>>     >>>>      >>> discussion". Thus I didn't include it in what we can
>>>>     do in a
>>>>     >>>>      foreseeable
>>>>     >>>>      >>> short term.
>>>>     >>>>      >>>
>>>>     >>>>      >>> Besides, I don't think that's the ultimate reason
>>>>     for lack of
>>>>     >>>>      build
>>>>     >>>>      >>> resources. Even if the build is shortened to
>>>>     >>>>      >>> something like 2h, the problem I described, where no
>>>>     >>>>      >>> build machine works for 6 or more hours in PST daytime,
>>>>     >>>>      >>> will still happen, because no machine from ASF INFRA's
>>>>     >>>>      >>> pool is allocated to Flink. As I have paid close
>>>>     attention to
>>>>     >>>>      the build
>>>>     >>>>      >>> queue in the past few weekdays, it's a pretty clear
>>>>     pattern now.
>>>>     >>>>      >>>
>>>>     >>>>      >>> **The ultimate root cause** for that is - we don't
>>>>     have any
>>>>     >>>>      **dedicated**
>>>>     >>>>      >>> build resources that we can stably rely on. I'm
>>>>     actually ok to
>>>>     >>>>      wait for a
>>>>     >>>>      >>> long time if there are build requests running, it
>>>>     means at
>>>>     >>>>      least we are
>>>>     >>>>      >>> making progress. But I'm not ok with no build
>>>>     resource. A
>>>>     >>>>      better place I
>>>>     >>>>      >>> think we should aim at in short term is to always
>>>>     have at
>>>>     >>>>      least a central
>>>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
>>>>     Flink at
>>>>     >>>>      any time, or
>>>>     >>>>      >>> maybe use users' resources.
>>>>     >>>>      >>>
>>>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
>>>>     Zeppelin
>>>>     >>>>      community is
>>>>     >>>>      >>> using a Jenkins job to automatically build on users'
>>>>     travis
>>>>     >>>>      account and
>>>>     >>>>      >>> link the result back to github PR. I guess the
>>>>     Jenkins job
>>>>     >>>>      would fetch
>>>>     >>>>      >>> latest upstream master and build the PR against it.
>>>>     Jeff has
>>>>     >>>> filed
>>>>     >>>>      >> tickets
>>>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll be
>>>>     >>>>      >>> better to fully understand it first before judging this
>>>>     >>>>      >>> approach.
>>>>     >>>>      >>>
>>>>     >>>>      >>> I also heard good things about CircleCI, and ASF
>>>>     INFRA seems
>>>>     >>>>      to have a
>>>>     >>>>      >> pool
>>>>     >>>>      >>> of build capacity there too. Can be an alternative
>>>>     to consider.
>>>>     >>>>      >>>
>>>>     >>>>      >>>
>>>>     >>>>      >>>
>>>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>>>     >>>>      >> dwysakowicz@apache.org
>>>>     <ma...@apache.org> <mailto:dwysakowicz@apache.org
>>>>     <ma...@apache.org>>>
>>>>     >>>>      >>> wrote:
>>>>     >>>>      >>>
>>>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>>>     most
>>>>     >>>>      important point
>>>>     >>>>      >>>> from Chesnay's previous message in the summary. The
>>>>     ultimate
>>>>     >>>>      reason for
>>>>     >>>>      >>>> all the problems is that the tests take close to 2
>>>>     hours to
>>>>     >>>>      run already.
>>>>     >>>>      >>>> I fully support this claim: "Unless people start
>>>>     caring about
>>>>     >>>>      test times
>>>>     >>>>      >>>> before adding them, this issue cannot be solved"
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> This is also another reason why using user's Travis
>>>>     account
>>>>     >>>>      won't help.
>>>>     >>>>      >>>> Every few weeks we reach the user's time limit for
>>>>     a single
>>>>     >>>>      profile.
>>>>     >>>>      >>>> This makes the user's builds simply fail, until we
>>>>     either
>>>>     >>>>      properly
>>>>     >>>>      >>>> decrease the time the tests take (which I am not
>>>>     sure we ever
>>>>     >>>>      did) or
>>>>     >>>>      >>>> postpone the problem by splitting into more
>>>>     profiles. (Note
>>>>     >>>>      that the ASF
>>>>     >>>>      >>>> Travis account has higher time limits)
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> Best,
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> Dawid
>>>>     >>>>      >>>>
>>>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>     >>>>      >>>>> Do we know if using "the best" available hardware
>>>>     would
>>>>     >>>>      improve the
>>>>     >>>>      >> build
>>>>     >>>>      >>>>> times?
>>>>     >>>>      >>>>> Imagine we would run the build on machines with
>>>>     plenty of
>>>>     >>>>      main memory
>>>>     >>>>      >> to
>>>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>>>     architecture?
>>>>     >>>>      >>>>>
>>>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>>>>     the time
>>>>     >>>>      of an
>>>>     >>>>      >>>>> individual build, and using our own infrastructure
>>>>     would
>>>>     >>>>      remove our
>>>>     >>>>      >>>>> dependency on Apache's Travis account (with the
>>>>     obvious
>>>>     >>>>      downside of
>>>>     >>>>      >>>> having
>>>>     >>>>      >>>>> to maintain the infrastructure)
>>>>     >>>>      >>>>> We could use an open source travis alternative, to
>>>>     have a
>>>>     >>>>      similar
>>>>     >>>>      >>>>> experience and make the migration easy.
>>>>     >>>>      >>>>>
>>>>     >>>>      >>>>>
>>>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>>     >>>>      <chesnay@apache.org <ma...@apache.org>
>>>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>>     >>>>      >>>> wrote:
>>>>     >>>>      >>>>>> From what I gathered, there's no special sauce that the
>>>>     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
>>>>     >>>>      >>>>>> Travis account into the PR.
>>>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
>>>>     kind of it.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
>>>>     fair
>>>>     >>>>      amount of
>>>>     >>>>      >>>>>> resources, but there are downsides:
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>>>>     nose-dive.
>>>>     >>>>      Either we
>>>>     >>>>      >>>>>> require every contributor to always, on every commit,
>>>>     >>>>      >>>>>> also post a Travis build, or we have the reviewer sift
>>>>     >>>>      >>>>>> through the contributor's account to find it.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>>>     also not
>>>>     >>>>      equivalent to
>>>>     >>>>      >>>>>> having a PR build.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> A normal branch build takes a branch as is and
>>>>     tests it. A
>>>>     >>>>      PR build
>>>>     >>>>      >>>>>> merges the branch into master, and then runs it.
>>>>     (Fun fact:
>>>>     >>>>      This is
>>>>     >>>>      >> why
>>>>     >>>>      >>>>>> a PR with merge conflicts is not being run on
>>>>     Travis.)
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> And ultimately, everyone can already make use 
>>>> of this
>>>>     >>>>      approach anyway.
>>>>     >>>>      >>>>>>
>>>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>     >>>>      >>>>>>> Hi Jeff,
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>     >>>>      >>>>>>> leverage users' Travis accounts.
>>>>     >>>>      >>>>>>> In this way, we can have almost unlimited concurrent build jobs and
>>>>     >>>>      >>>>>>> developers can restart builds by themselves (currently only committers
>>>>     >>>>      >>>>>>> can restart a PR's build).
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> But I'm still not very clear on how to integrate a user's Travis build
>>>>     >>>>      >>>>>>> into the Flink pull request's build automatically. Can you explain in
>>>>     >>>>      >>>>>>> more detail?
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Another question: does Travis only build branches for user accounts?
>>>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase the user's commits against
>>>>     >>>>      >>>>>>> the current master branch.
>>>>     >>>>      >>>>>>> This helps us find problems before merging. Builds for branches
>>>>     >>>>      >>>>>>> will lose the impact of new commits in master.
>>>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> Regards,
>>>>     >>>>      >>>>>>> Jark
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Hi Folks,
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved it by delegating
>>>>     >>>>      >>>>>>>  each contributor's PR build to their own Travis account (everyone can
>>>>     >>>>      >>>>>>>  have 5 free slots for Travis builds).
>>>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered when a PR is merged.
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  > (Forgot to cc George)
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  > Best,
>>>>     >>>>      >>>>>>>  > Kurt
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>  > > Hi Bowen,
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually discussed this, and I
>>>>     >>>>      >>>>>>>  > > think Till and George have already spent some time investigating it.
>>>>     >>>>      >>>>>>>  > > I have cc'ed both of them, and maybe they can share their findings.
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > > Best,
>>>>     >>>>      >>>>>>>  > > Kurt
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  > >> Hi Bowen,
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> Thanks for bringing this up. We also suffered from the long build time.
>>>>     >>>>      >>>>>>>  > >> I agree that we should focus on solving the build capacity problem in
>>>>     >>>>      >>>>>>>  > >> this thread.
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> My observation is that only one build is running while all the others
>>>>     >>>>      >>>>>>>  > >> (other PRs, master) are pending.
>>>>     >>>>      >>>>>>>  > >> The pricing plan [1] of Travis shows it can support concurrent build jobs.
>>>>     >>>>      >>>>>>>  > >> But I don't know which plan we are using; it might be the free plan for
>>>>     >>>>      >>>>>>>  > >> open source.
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some experience on Travis.
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> Regards,
>>>>     >>>>      >>>>>>>  > >> Jark
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >> > Hi Steven,
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >> > I think you may not have read what I wrote. The discussion is about
>>>>     >>>>      >>>>>>>  > >> > "unstable build **capacity**", in other words "unstable / lack of build
>>>>     >>>>      >>>>>>>  > >> > resources", not "unstable build".
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable builds are definitely a pain point.
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in flink-connector-kafka is not related
>>>>     >>>>      >>>>>>>  > >> > > to my change, but there is no easy way to re-run the build in the Travis
>>>>     >>>>      >>>>>>>  > >> > > UI. A Google search showed a trick: closing and reopening the PR will
>>>>     >>>>      >>>>>>>  > >> > > trigger a rebuild, but that could add noise to the PR activity.
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo often failed with exceeding the time
>>>>     >>>>      >>>>>>>  > >> > > limit after 4+ hours.
>>>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time limit for jobs, and has been terminated.
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > > > https://travis-ci.org/apache/flink/builds/549681530 This build request
>>>>     >>>>      >>>>>>>  > >> > > > has been sitting at **HEAD of the queue** since I first saw it at PST
>>>>     >>>>      >>>>>>>  > >> > > > 10:30am (not sure how long it's been there before 10:30am). It's PST
>>>>     >>>>      >>>>>>>  > >> > > > 4:12pm now and it hasn't started yet.
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain resulting from lack of stable build
>>>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
>>>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any progress for hours, and suddenly 5 or 6
>>>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together after the long pause. I'm at PST (UTC-08) time
>>>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
>>>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to drain the queue afterwards).
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly impacted our productivity. I've experienced that
>>>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early morning of PST time zone won't finish their
>>>>     >>>>      >>>>>>>  > >> > > > > build until late night of the same day.
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced the same problem or have similar observation
>>>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it has things to do with time zone)
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it the free
>>>>     >>>>      >>>>>>>  > >> > > > > plan for open source projects? What are the guaranteed build capacity of
>>>>     >>>>      >>>>>>>  > >> > > > > the current plan?
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan (either free or paid) can't provide stable
>>>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we upgrade to a higher priced plan with larger and more
>>>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that contribute to the productivity problem is that our
>>>>     >>>>      >>>>>>>  > >> > > > > build is slow - we run full build for every PR and a successful full build
>>>>     >>>>      >>>>>>>  > >> > > > > takes ~5h. We definitely have more options to solve it, for instance,
>>>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs and reuse artifacts from the previous build.
>>>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big effort which is much harder to accomplish in
>>>>     >>>>      >>>>>>>  > >> > > > > a short period of time and may deserve its own separate discussion.
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > > >
>>>>     >>>>      >>>>>>>  > >> > > >
>>>>     >>>>      >>>>>>>  > >> > >
>>>>     >>>>      >>>>>>>  > >> >
>>>>     >>>>      >>>>>>>  > >>
>>>>     >>>>      >>>>>>>  > >
>>>>     >>>>      >>>>>>>  >
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  --
>>>>     >>>>      >>>>>>>  Best Regards
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>>>>>>  Jeff Zhang
>>>>     >>>>      >>>>>>>
>>>>     >>>>      >>
>>>>     >>>>
>>>>     >>>
>>>>     >>
>>>>
>>>
>>>
>>
>>
>
>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
I have temporarily re-enabled running PR builds on the ASF account; 
migrating to the Travis subscription caused some issues in the bot that 
I have to fix first.

On 07/07/2019 23:01, Chesnay Schepler wrote:
> The vote has passed unanimously in favor of migrating to a separate 
> Travis account.
>
> I will now set things up such that PullRequests are no longer run on
> the ASF servers.
> This is a major step in reducing our usage of ASF resources.
> For the time being we'll use the free Travis plan for flink-ci (i.e. 5
> workers, which is the same as the ASF gives us). Over the course of the
> next week we'll set up the Ververica subscription to increase this limit.
>
> From now on, a bot will mirror all new and updated PullRequests to a
> mirror repository (https://github.com/flink-ci/flink-ci) and write an
> update into the PR once the build is complete.
> I have run the bot for the past 3 days in parallel to our existing
> Travis and it was working without major issues.
>
> The biggest change that contributors will see is that there's no
> longer an icon next to each commit. We may revisit this in the future.
>
> I'll set up a repo with the source of the bot later.
>
> On 04/07/2019 10:46, Chesnay Schepler wrote:
>> I've raised a JIRA
>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>> inquire whether it would be possible to switch to a different Travis 
>> account, and if so what steps would need to be taken.
>> We need a proper confirmation from INFRA since we are not in full 
>> control of the flink repository (for example, we cannot access the 
>> settings page).
>>
>> If this is indeed possible, Ververica is willing to sponsor a Travis
>> account for the Flink project.
>> This would provide us with more than enough resources.
>>
>> Since this makes the project more reliant on resources provided by 
>> external companies I would like to vote on this.
>>
>> Please vote on this proposal, as follows:
>> [ ] +1, Approve the migration to a Ververica-sponsored Travis 
>> account, provided that INFRA approves
>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> account
>>
>> The vote will be open for at least 24h, and until we have
>> confirmation from INFRA. The voting period may be shorter than the
>> usual 3 days since our current setup is effectively not working.
>>
>> On 04/07/2019 06:51, Bowen Li wrote:
>>> Re: > Are they using their own Travis CI pool, or did they switch to
>>> an entirely different CI service?
>>>
>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are 
>>> currently moving away from ASF's Travis to their own in-house metal 
>>> machines at [1] with custom CI application at [2]. They've seen 
>>> significant improvement w.r.t both much higher performance and 
>>> basically no resource waiting time, "night-and-day" difference 
>>> quoting Wes.
>>>
>>> Re: > If we can just switch to our own Travis pool, just for our 
>>> project, then this might be something we can do fairly quickly?
>>>
>>> I believe so, according to [3] and [4]
>>>
>>>
>>> [1] https://ci.ursalabs.org/
>>> [2] https://github.com/ursa-labs/ursabot
>>> [3] 
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration 
>>>
>>> [4] 
>>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>
>>>
>>>
>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>
>>>     Are they using their own Travis CI pool, or did they switch to an
>>>     entirely different CI service?
>>>
>>>     If we can just switch to our own Travis pool, just for our project, then
>>>     this might be something we can do fairly quickly?
>>>
>>>     On 03/07/2019 05:55, Bowen Li wrote:
>>>     > I responded in the INFRA ticket [1] that I believe they are using a wrong
>>>     > metric against Flink and the total build time is a completely different
>>>     > thing than guaranteed build capacity.
>>>     >
>>>     > My response:
>>>     >
>>>     > "As mentioned above, since I started to pay attention to Flink's build
>>>     > queue a few tens of days ago, I'm in Seattle and I saw no build was kicking
>>>     > off in PST daytime in weekdays for Flink. Our teammates in China and Europe
>>>     > have also reported similar observations. So we need to evaluate how the
>>>     > large total build time came from - if 1) your number and 2) our
>>>     > observations from three locations that cover pretty much a full day, are
>>>     > all true, I **guess** one reason can be that - highly likely the extra
>>>     > build time came from weekends when other Apache projects may be idle and
>>>     > Flink just drains hard its congested queue.
>>>     >
>>>     > Please be aware of that we're not complaining about the lack of resources
>>>     > in general, I'm complaining about the lack of **stable, dedicated**
>>>     > resources. An example for the latter one is, currently even if no build is
>>>     > in Flink's queue and I submit a request to be the queue head in PST
>>>     > morning, my build won't even start in 6-8+h. That is an absurd amount of
>>>     > waiting time.
>>>     >
>>>     > That's saying, if ASF INFRA decides to adopt a quota system and grants
>>>     > Flink five DEDICATED servers that runs all the time only for Flink, that'll
>>>     > be PERFECT and can totally solve our problem now.
>>>     >
>>>     > I feel what's missing in the ASF INFRA's Travis resource pool is some level
>>>     > of build capacity SLAs and certainty"
>>>     >
>>>     >
>>>     > Again, I believe there are differences in nature between these two problems,
>>>     > long build time vs. lack of dedicated build resources. That is to say,
>>>     > shortening build time may relieve the situation, and may not. I'm slightly
>>>     > negative on disabling IT cases for PRs, because the downside is that we are
>>>     > at risk of any potential bugs in a PR that UTs don't catch, which may cost a
>>>     > lot more to fix if they slow others down or even block others, but I am
>>>     > open to others' opinions on it.
>>>     >
>>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be feasible to
>>>     > solve our problem since INFRA's pool is fully shared and they have no
>>>     > control and finer insights over resource allocation to a specific Apache
>>>     > project. As mentioned in [1], Apache Arrow is moving away from the ASF INFRA
>>>     > Travis pool (they are actually surprised Flink hasn't planned to do so). I
>>>     > know that Spark is on its own build infra. If we all agree on funding our
>>>     > own build infra, I'd be glad to help investigate any potential options
>>>     > after releasing 1.9 since I'm super busy with 1.9 now.
>>>     >
>>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>     >
>>>     >
>>>     >
>>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>     >
>>>     >> As a short-term stopgap, since we can assume this issue to become much
>>>     >> worse in the following days/weeks, we could disable IT cases in PRs and
>>>     >> only run them on master.
>>>     >>
>>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>     >>> People really have to stop thinking that just because something works
>>>     >>> for us it is also a good solution.
>>>     >>> Also, please remember that our builds run for 2h from start to finish,
>>>     >>> and not the 14 _minutes_ it takes for zeppelin.
>>>     >>> We are dealing with an entirely different scale here, both in terms of
>>>     >>> build times and number of builds.
>>>     >>>
>>>     >>> In this very thread people have been complaining about long queue
>>>     >>> times for their builds. Surprise, other Apache projects have been
>>>     >>> suffering the very same thing due to us not controlling our build
>>>     >>> times. While switching services (be it Jenkins, CircleCI or whatever)
>>>     >>> will possibly work for us (and these options are actually attractive,
>>>     >>> like CircleCI's proper support for build artifacts), it will also
>>>     >>> result in us likely negatively affecting other projects in significant
>>>     >>> ways.
>>>     >>>
>>>     >>> Sure, the Jenkins setup has a good user experience for us, at the cost
>>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
>>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>     >>> resources, and the European contributors haven't even really started yet.
>>>     >>>
>>>     >>> FYI, the latest INFRA response from INFRA-18533:
>>>     >>>
>>>     >>> "Our rough metrics shows that Flink used over 5800 hours of build time
>>>     >>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
>>>     >>> MONTH. EIGHT. nonstop.
>>>     >>> When we discovered this last night, we discussed it some and are going
>>>     >>> to tune down Flink to allow only five executors maximum. We cannot
>>>     >>> allow Flink to consume so much of a Foundation shared resource."
>>>     >>>
>>>     >>> So yes, we either
>>>     >>> a) have to heavily reduce our CI usage or
>>>     >>> b) fund our own, either maintaining it ourselves or donating to Apache.
>>>     >>>
>>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>>>     >>>> By looking at the git history of the Jenkins script, its core part
>>>     >>>> was finished in March 2017 (with only two minor updates in 2017/2018),
>>>     >>>> so it's been running for over two years now and it feels like the
>>>     >>>> Zeppelin community has been quite happy with it. @Jeff Zhang
>>>     >>>> <zjffdu@gmail.com>, can you share your insights and user
>>>     >>>> experience with the Jenkins+Travis approach?
>>>     >>>>
>>>     >>>> Things like:
>>>     >>>>
>>>     >>>> - has the approach completely solved the resource capacity problem
>>>     >>>> for the Zeppelin community? is the Zeppelin community happy with the result?
>>>     >>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>>     >>>> - how often do you need to maintain the Jenkins infra? how many
>>>     >>>> people are usually involved in maintenance and bug-fixes?
>>>     >>>>
>>>     >>>> The downside of this approach seems mostly to be on the
>>>     maintenance
>>>     >>>> to me - maintain the script and Jenkins infra.
>>>     >>>>
>>>     >>>> ** Having Our Own Travis-CI.com Account **
>>>     >>>>
>>>     >>>> Another alternative I've been thinking of is to have our own
>>>     >>>> travis-ci.com account with paid dedicated resources. Note travis-ci.org
>>>     >>>> is the free version and travis-ci.com is the commercial version. We
>>>     >>>> currently use a shared resource pool managed by the ASF INFRA team on
>>>     >>>> travis-ci.org, but we have no control over it - we can't see how it's
>>>     >>>> configured, how much resources are available, how resources are
>>>     >>>> allocated among Apache projects, etc.
>>>     >>>> The nice things about having an account on travis-ci.com are:
>>>     >>>>
>>>     >>>> - relatively low cost with much better resource guarantee than what
>>>     >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>>>     >>>> $489/month with 10 concurrency
>>>     >>>> - low maintenance work compared to using Jenkins
>>>     >>>> - (potentially) no migration cost according to Travis's doc [2]
>>>     >>>> (pending verification)
>>>     >>>> - full control over the build capacity/configuration compared to
>>>     >>>> using ASF INFRA's pool
>>>     >>>>
>>>     >>>> I'd be surprised if we as such a vibrant community cannot find and
>>>     >>>> fund $249*12=$2988 a year in exchange for a much better developer
>>>     >>>> experience and much higher productivity.
>>>     >>>>
>>>     >>>> [1] https://travis-ci.com/plans
>>>     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>     >>>>
>>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>>     >>>>
>>>     >>>>      So yes, the Jenkins job keeps pulling the state from Travis until it
>>>     >>>>      finishes.
>>>     >>>>
>>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins workers
>>>     >>>>      just to idle for several hours.
>>>     >>>>
>>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>     >>>>      > Here's what the Zeppelin community did: we made a Python script to
>>>     >>>>      > check the build status of a pull request.
>>>     >>>>      > Here's the script:
>>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>     >>>>      >
>>>     >>>>      > And this is the script we use in the Jenkins build job.
>>>     >>>>      >
>>>     >>>>      > if [ -f "travis_check.py" ]; then
>>>     >>>>      >    git log -n 1
>>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>>>     >>>>      request.*from.*" | sed
>>>     >>>>      > 's/.*GitHub pull request <a
>>>     >>>>      > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
>>>     \2/g')
>>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
>>>     >>>> 's/.*[/]\(.*\)$/\1/g')
>>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
>>>     '{print $3}')
>>>     >>>>      >    #if [ -z $COMMIT ]; then
>>>     >>>>      >    #  COMMIT=$(curl -s
>>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>     >>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
>>>     tr '\n' ' '
>>>     >>>>      | sed
>>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>     grep -v
>>>     >>>>      "apache:" |
>>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>     >>>>      >    #fi
>>>     >>>>      >
>>>     >>>>      >    # get commit hash from PR
>>>     >>>>      >    COMMIT=$(curl -s
>>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>     >>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
>>>     '\n' ' '
>>>     >>>> | sed
>>>     >>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>     grep -v
>>>     >>>>      "apache:" |
>>>     >>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts
>>>     the build
>>>     >>>>      >    RET_CODE=0
>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>>     RET_CODE=$?
>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository
>>>     name when
>>>     >>>>      travis-ci is
>>>     >>>>      > not available in the account
>>>     >>>>      >      RET_CODE=0
>>>     >>>>      >      AUTHOR=$(curl -s
>>>     >>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>     >>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>>>     >>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>>>     RET_CODE=$?
>>>     >>>>      >    fi
>>>     >>>>      >
>>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find
>>>     build
>>>     >>>>      information in
>>>     >>>>      > the travis
>>>     >>>>      >      set +x
>>>     >>>>      >      echo
>>>     "-----------------------------------------------------"
>>>     >>>>      >      echo "Looks like travis-ci is not configured for
>>>     your fork."
>>>     >>>>      >      echo "Please setup by swich on 'zeppelin'
>>>     repository at
>>>     >>>>      > https://travis-ci.org/profile and travis-ci."
>>>     >>>>      >      echo "And then make sure 'Build branch updates'
>>>     option is
>>>     >>>>      enabled in
>>>     >>>>      > the settings
>>>     https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>     >>>>      >      echo ""
>>>     >>>>      >      echo "To trigger CI after setup, you will need
>>>     ammend your
>>>     >>>>      last commit
>>>     >>>>      > with"
>>>     >>>>      >      echo "git commit --amend"
>>>     >>>>      >      echo "git push your-remote HEAD --force"
>>>     >>>>      >      echo ""
>>>     >>>>      >      echo "See
>>>     >>>>      >
>>>     >>>>
>>>     >>
>>> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration 
>>>
>>>     >>>>      > ."
>>>     >>>>      >    fi
>>>     >>>>      >
>>>     >>>>      >    exit $RET_CODE
>>>     >>>>      > else
>>>     >>>>      >    set +x
>>>     >>>>      >    echo "travis_check.py does not exists"
>>>     >>>>      >    exit 1
>>>     >>>>      > fi
>>>     >>>>      >
>>>     >>>>      > Chesnay Schepler <chesnay@apache.org
>>>     <ma...@apache.org>
>>>     >>>>      <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>     于2019年6月29日周六 下午3:17写道:
>>>     >>>>      >
>>>     >>>>      >> Does this imply that a Jenkins job is active as long
>>>     as the
>>>     >>>>      Travis build
>>>     >>>>      >> runs?
>>>     >>>>      >>
>>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>     >>>>      >>> Hi,
>>>     >>>>      >>>
>>>     >>>>      >>> @Dawid, I think the "long test running" as I
>>>     mentioned in the
>>>     >>>>      first
>>>     >>>>      >> email,
>>>     >>>>      >>> also as you guys said, belongs to "a big effort
>>>     which is much
>>>     >>>>      harder to
>>>     >>>>      >>> accomplish in a short period of time and may deserve
>>>     its own
>>>     >>>>      separate
>>>     >>>>      >>> discussion". Thus I didn't include it in what we can
>>>     do in a
>>>     >>>>      foreseeable
>>>     >>>>      >>> short term.
>>>     >>>>      >>>
>>>     >>>>      >>> Besides, I don't think that's the ultimate reason
>>>     for lack of
>>>     >>>>      build
>>>     >>>>      >>> resources. Even if the build is shortened to
>>>     something like
>>>     >>>>      2h, the
>>>     >>>>      >>> problems of no build machine works about 6 or more
>>>     hours in
>>>     >>>>      PST daytime
>>>     >>>>      >>> that I described will still happen, because no
>>>     machine from
>>>     >>>>      ASF INFRA's
>>>     >>>>      >>> pool is allocated to Flink. As I have paid close
>>>     attention to
>>>     >>>>      the build
>>>     >>>>      >>> queue in the past few weekdays, it's a pretty clear
>>>     pattern now.
>>>     >>>>      >>>
>>>     >>>>      >>> **The ultimate root cause** for that is - we don't
>>>     have any
>>>     >>>>      **dedicated**
>>>     >>>>      >>> build resources that we can stably rely on. I'm
>>>     actually ok to
>>>     >>>>      wait for a
>>>     >>>>      >>> long time if there are build requests running, it
>>>     means at
>>>     >>>>      least we are
>>>     >>>>      >>> making progress. But I'm not ok with no build
>>>     resource. A
>>>     >>>>      better place I
>>>     >>>>      >>> think we should aim at in short term is to always
>>>     have at
>>>     >>>>      least a central
>>>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
>>>     Flink at
>>>     >>>>      any time, or
>>>     >>>>      >>> maybe use users resources.
>>>     >>>>      >>>
>>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline that
>>>     Zeppelin
>>>     >>>>      community is
>>>     >>>>      >>> using a Jenkins job to automatically build on users'
>>>     travis
>>>     >>>>      account and
>>>     >>>>      >>> link the result back to github PR. I guess the
>>>     Jenkins job
>>>     >>>>      would fetch
>>>     >>>>      >>> latest upstream master and build the PR against it.
>>>     Jeff has
>>>     >>>> filed
>>>     >>>>      >> tickets
>>>     >>>>      >>> to learn and get access to the Jenkins infra. It'll be
>>>     better to
>>>     >>>>      fully
>>>     >>>>      >>> understand it first before judging this approach.
>>>     >>>>      >>>
>>>     >>>>      >>> I also heard good things about CircleCI, and ASF
>>>     INFRA seems
>>>     >>>>      to have a
>>>     >>>>      >> pool
>>>     >>>>      >>> of build capacity there too. Can be an alternative
>>>     to consider.
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>>
>>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>>     >>>>      >> dwysakowicz@apache.org
>>>     <ma...@apache.org> <mailto:dwysakowicz@apache.org
>>>     <ma...@apache.org>>>
>>>     >>>>      >>> wrote:
>>>     >>>>      >>>
>>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>>     most
>>>     >>>>      important point
>>>     >>>>      >>>> from Chesnay's previous message in the summary. The
>>>     ultimate
>>>     >>>>      reason for
>>>     >>>>      >>>> all the problems is that the tests take close to 2
>>>     hours to
>>>     >>>>      run already.
>>>     >>>>      >>>> I fully support this claim: "Unless people start
>>>     caring about
>>>     >>>>      test times
>>>     >>>>      >>>> before adding them, this issue cannot be solved"
>>>     >>>>      >>>>
>>>     >>>>      >>>> This is also another reason why using user's Travis
>>>     account
>>>     >>>>      won't help.
>>>     >>>>      >>>> Every few weeks we reach the user's time limit for
>>>     a single
>>>     >>>>      profile.
>>>     >>>>      >>>> This makes the user's builds simply fail, until we
>>>     either
>>>     >>>>      properly
>>>     >>>>      >>>> decrease the time the tests take (which I am not
>>>     sure we ever
>>>     >>>>      did) or
>>>     >>>>      >>>> postpone the problem by splitting into more
>>>     profiles. (Note
>>>     >>>>      that the ASF
>>>     >>>>      >>>> Travis account has higher time limits)
>>>     >>>>      >>>>
>>>     >>>>      >>>> Best,
>>>     >>>>      >>>>
>>>     >>>>      >>>> Dawid
>>>     >>>>      >>>>
>>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>     >>>>      >>>>> Do we know if using "the best" available hardware
>>>     would
>>>     >>>>      improve the
>>>     >>>>      >> build
>>>     >>>>      >>>>> times?
>>>     >>>>      >>>>> Imagine we would run the build on machines with
>>>     plenty of
>>>     >>>>      main memory
>>>     >>>>      >> to
>>>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
>>>     architecture?
>>>     >>>>      >>>>>
>>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>>>     the time
>>>     >>>>      of an
>>>     >>>>      >>>>> individual build, and using our own infrastructure
>>>     would
>>>     >>>>      remove our
>>>     >>>>      >>>>> dependency on Apache's Travis account (with the
>>>     obvious
>>>     >>>>      downside of
>>>     >>>>      >>>> having
>>>     >>>>      >>>>> to maintain the infrastructure)
>>>     >>>>      >>>>> We could use an open source travis alternative, to
>>>     have a
>>>     >>>>      similar
>>>     >>>>      >>>>> experience and make the migration easy.
>>>     >>>>      >>>>>
>>>     >>>>      >>>>>
>>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>     >>>>      <chesnay@apache.org <ma...@apache.org>
>>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
>>>     >>>>      >>>> wrote:
>>>     >>>>      >>>>>> From what I gathered, there's no special
>>>     sauce that the
>>>     >>>>      Zeppelin
>>>     >>>>      >>>>>> project uses which actually integrates a users 
>>> Travis
>>>     >>>>      account into the
>>>     >>>>      >>>> PR.
>>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's
>>>     kind of it.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
>>>     fair
>>>     >>>>      amount of
>>>     >>>>      >>>>>> resources, but there are downsides:
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>>>     nose-dive.
>>>     >>>>      Either we
>>>     >>>>      >>>>>> require every contributor to always, on every
>>>     commit, also
>>>     >>>>      post a
>>>     >>>>      >> Travis
>>>     >>>>      >>>>>> build, or we have the reviewer sift through the
>>>     >>>>      contributors account
>>>     >>>>      >> to
>>>     >>>>      >>>>>> find it.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>>>     also not
>>>     >>>>      equivalent to
>>>     >>>>      >>>>>> having a PR build.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> A normal branch build takes a branch as is and
>>>     tests it. A
>>>     >>>>      PR build
>>>     >>>>      >>>>>> merges the branch into master, and then runs it.
>>>     (Fun fact:
>>>     >>>>      This is
>>>     >>>>      >> why
>>>     >>>>      >>>>>> a PR with merge conflicts is not being run on
>>>     Travis.)
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> And ultimately, everyone can already make use 
>>> of this
>>>     >>>>      approach anyway.
>>>     >>>>      >>>>>>
>>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>     >>>>      >>>>>>> Hi Jeff,
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>>>     think it's a
>>>     >>>>      good idea to
>>>     >>>>      >>>>>>> leverage user's travis account.
>>>     >>>>      >>>>>>> In this way, we can have almost unlimited
>>>     concurrent build
>>>     >>>>      jobs and
>>>     >>>>      >>>>>>> developers can restart build by themselves
>>>     (currently only
>>>     >>>>      committers
>>>     >>>>      >>>>>>> can restart PR's build).
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> But I'm still not very clear how to integrate 
>>> user's
>>>     >>>>      travis build
>>>     >>>>      >> into
>>>     >>>>      >>>>>>> the Flink pull request's build automatically.
>>>     Can you
>>>     >>>>      explain more in
>>>     >>>>      >>>>>>> detail?
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Another question: does travis only build
>>>     branches for user
>>>     >>>>      account?
>>>     >>>>      >>>>>>> My concern is that builds for PRs will rebase 
>>> user's
>>>     >>>>      commits against
>>>     >>>>      >>>>>>> current master branch.
>>>     >>>>      >>>>>>> This will help us to find problems before
>>>     merge.  Builds
>>>     >>>>      for branches
>>>     >>>>      >>>>>>> will lose the impact of new commits in master.
>>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> Regards,
>>>     >>>>      >>>>>>> Jark
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>>     <zjffdu@gmail.com <ma...@gmail.com>
>>>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
>>>     <ma...@gmail.com> <mailto:zjffdu@gmail.com
>>>     <ma...@gmail.com>>>> wrote:
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Hi Folks,
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Zeppelin meet this kind of issue before, we 
>>> solve
>>>     >>>> it by
>>>     >>>>      >> delegating
>>>     >>>>      >>>>>>>  each
>>>     >>>>      >>>>>>>  one's PR build to his travis account
>>>     (Everyone can
>>>     >>>>      have 5 free
>>>     >>>>      >>>>>>>  slot for
>>>     >>>>      >>>>>>>  travis build).
>>>     >>>>      >>>>>>>  Apache account travis build is only triggered 
>>> when
>>>     >>>>      PR is merged.
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Kurt Young <ykt836@gmail.com
>>>     <ma...@gmail.com>
>>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>
>>>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  于2019年6月25日周二 上午10:16写道:
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  > (Forgot to cc George)
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  > Best,
>>>     >>>>      >>>>>>>  > Kurt
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
>>>     <ma...@gmail.com> <mailto:ykt836@gmail.com
>>>     <ma...@gmail.com>>>>
>>>     >>>>      wrote:
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>  > > Hi Bowen,
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We
>>>     actually have
>>>     >>>>      discussed
>>>     >>>>      >> about
>>>     >>>>      >>>>>>>  this, and I
>>>     >>>>      >>>>>>>  > > think Till and George have
>>>     >>>>      >>>>>>>  > > already spend sometime investigating
>>>     it. I have
>>>     >>>>      cced both of
>>>     >>>>      >>>>>>>  them, and
>>>     >>>>      >>>>>>>  > > maybe they can share
>>>     >>>>      >>>>>>>  > > their findings.
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > > Best,
>>>     >>>>      >>>>>>>  > > Kurt
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>>     >>>>      <imjark@gmail.com <ma...@gmail.com>
>>>     <mailto:imjark@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:imjark@gmail.com
>>>     <ma...@gmail.com> <mailto:imjark@gmail.com
>>>     <ma...@gmail.com>>>>
>>>     >>>>      wrote:
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  > >> Hi Bowen,
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> Thanks for bringing this. We also
>>>     suffered from
>>>     >>>>      the long
>>>     >>>>      >>>>>>>  build time.
>>>     >>>>      >>>>>>>  > >> I agree that we should focus on
>>>     solving build
>>>     >>>>      capacity
>>>     >>>>      >>>>>>>  problem in the
>>>     >>>>      >>>>>>>  > >> thread.
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> My observation is there is only one
>>>     build is
>>>     >>>>      running, all
>>>     >>>>      >> the
>>>     >>>>      >>>>>>>  others
>>>     >>>>      >>>>>>>  > >> (other
>>>     >>>>      >>>>>>>  > >> PRs, master) are pending.
>>>     >>>>      >>>>>>>  > >> The pricing plan[1] of travis shows
>>>     it can
>>>     >>>> support
>>>     >>>>      >> concurrent
>>>     >>>>      >>>>>>>  build
>>>     >>>>      >>>>>>>  > jobs.
>>>     >>>>      >>>>>>>  > >> But I don't know which plan we are
>>>     using, might
>>>     >>>>      be the free
>>>     >>>>      >>>>>>>  plan for
>>>     >>>>      >>>>>>>  > open
>>>     >>>>      >>>>>>>  > >> source.
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>>     experience on
>>>     >>>>      Travis.
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> Regards,
>>>     >>>>      >>>>>>>  > >> Jark
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>>>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>>     <ma...@gmail.com>
>>>     >>>>      <mailto:bowenli86@gmail.com
>>>     <ma...@gmail.com>>>> wrote:
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >> > Hi Steven,
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >> > I think you may not read what I
>>>     wrote. The
>>>     >>>>      discussion is
>>>     >>>>      >>>> about
>>>     >>>>      >>>>>>>  > "unstable
>>>     >>>>      >>>>>>>  > >> > build **capacity**", in another word
>>>     >>>>      "unstable / lack of
>>>     >>>>      >>>> build
>>>     >>>>      >>>>>>>  > >> resources",
>>>     >>>>      >>>>>>>  > >> > not "unstable build".
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM
>>>     Steven Wu
>>>     >>>>      >>>>>>>  <stevenz3wu@gmail.com
>>>     <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>     <ma...@gmail.com>>
>>>     >>>>      <mailto:stevenz3wu@gmail.com
>>>     <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>>>     <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  > wrote:
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable build is
>>>     >>>>      definitely a pain
>>>     >>>>      >>>>>> point.
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>>     >>>>      >> flink-connector-kafka
>>>     >>>>      >>>>>>>  is not
>>>     >>>>      >>>>>>>  > >> related
>>>     >>>>      >>>>>>>  > >> > to
>>>     >>>>      >>>>>>>  > >> > > my change. but there is no easy
>>>     re-run the
>>>     >>>>      build on
>>>     >>>>      >>>>>>>  travis UI.
>>>     >>>>      >>>>>>>  > Google
>>>     >>>>      >>>>>>>  > >> > > search showed a trick of
>>>     close-and-open the
>>>     >>>>      PR will
>>>     >>>>      >>>>>>>  trigger rebuild.
>>>     >>>>      >>>>>>>  > >> but
>>>     >>>>      >>>>>>>  > >> > > that could add noises to the PR
>>>     activities.
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo
>>>     often failed
>>>     >>>>      with
>>>     >>>>      >>>>>>>  exceeding time
>>>     >>>>      >>>>>>>  > limit
>>>     >>>>      >>>>>>>  > >> > after
>>>     >>>>      >>>>>>>  > >> > > 4+ hours.
>>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time
>>>     limit for
>>>     >>>>      jobs, and
>>>     >>>>      >> has
>>>     >>>>      >>>>>>>  been
>>>     >>>>      >>>>>>>  > >> > terminated.
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>>     Bowen Li
>>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>     <ma...@gmail.com>>
>>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  > wrote:
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
>>>     >>>>      >>>>>>>  This build
>>>     >>>>      >>>>>>>  > >> > request
>>>     >>>>      >>>>>>>  > >> > > > has
>>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>>     queue**
>>>     >>>>      since I first
>>>     >>>>      >> saw
>>>     >>>>      >>>>>>>  it at PST
>>>     >>>>      >>>>>>>  > >> > 10:30am
>>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>>     there before
>>>     >>>>      10:30am).
>>>     >>>>      >>>>>>>  It's PST
>>>     >>>>      >>>>>>>  > 4:12pm
>>>     >>>>      >>>>>>>  > >> now
>>>     >>>>      >>>>>>>  > >> > > and
>>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>>>     Bowen Li
>>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>>     <ma...@gmail.com>>
>>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>>     >>>>      >>>>>>>  > >> wrote:
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain
>>>     >>>>      resulting from lack
>>>     >>>>      >>>>>>>  of stable
>>>     >>>>      >>>>>>>  > >> build
>>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink
>>>     PRs [1].
>>>     >>>>      >> Specifically, I
>>>     >>>>      >>>>>>>  noticed
>>>     >>>>      >>>>>>>  > >> often
>>>     >>>>      >>>>>>>  > >> > > that
>>>     >>>>      >>>>>>>  > >> > > > no
>>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any
>>>     >>>>      progress for
>>>     >>>>      >> hours,
>>>     >>>>      >>>> and
>>>     >>>>      >>>>>>>  > suddenly
>>>     >>>>      >>>>>>>  > >> 5
>>>     >>>>      >>>>>>>  > >> > or
>>>     >>>>      >>>>>>>  > >> > > 6
>>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together
>>>     after the
>>>     >>>>      long pause.
>>>     >>>>      >>>>>>>  I'm at PST
>>>     >>>>      >>>>>>>  > >> > (UTC-08)
>>>     >>>>      >>>>>>>  > >> > > > time
>>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can
>>>     be as
>>>     >>>>      long as 6 hours
>>>     >>>>      >>>>>>>  from PST 9am
>>>     >>>>      >>>>>>>  > >> to
>>>     >>>>      >>>>>>>  > >> > 3pm
>>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to
>>>     drain the
>>>     >>>>      queue
>>>     >>>>      >>>>>>>  afterwards).
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly
>>>     impacted our
>>>     >>>>      productivity.
>>>     >>>>      >>>> I've
>>>     >>>>      >>>>>>>  > >> experienced
>>>     >>>>      >>>>>>>  > >> > > that
>>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early
>>>     morning of
>>>     >>>>      PST time zone
>>>     >>>>      >>>>>>>  won't finish
>>>     >>>>      >>>>>>>  > >> > their
>>>     >>>>      >>>>>>>  > >> > > > > build until late night of the
>>>     same day.
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced
>>>     the same
>>>     >>>>      problem or
>>>     >>>>      >>>>>>>  have similar
>>>     >>>>      >>>>>>>  > >> > > > observation
>>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it
>>>     has things
>>>     >>>>      to do with
>>>     >>>>      >> time
>>>     >>>>      >>>>>>>  zone)
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of
>>>     TravisCI is
>>>     >>>>      Flink currently
>>>     >>>>      >>>>>>>  using? Is it
>>>     >>>>      >>>>>>>  > >> the
>>>     >>>>      >>>>>>>  > >> > > free
>>>     >>>>      >>>>>>>  > >> > > > > plan for open source
>>>     projects? What
>>>     >>>> are the
>>>     >>>>      >>>>>>>  guaranteed build
>>>     >>>>      >>>>>>>  > >> capacity
>>>     >>>>      >>>>>>>  > >> > > of
>>>     >>>>      >>>>>>>  > >> > > > > the current plan?
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan
>>>     (either
>>>     >>>>      free or paid)
>>>     >>>>      >>>>>> can't
>>>     >>>>      >>>>>>>  > provide
>>>     >>>>      >>>>>>>  > >> > > stable
>>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we
>>>     upgrade to a
>>>     >>>>      higher priced
>>>     >>>>      >>>>>>>  plan with
>>>     >>>>      >>>>>>>  > larger
>>>     >>>>      >>>>>>>  > >> > and
>>>     >>>>      >>>>>>>  > >> > > > more
>>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that
>>>     contribute to
>>>     >>>> the
>>>     >>>>      >>>>>>>  productivity problem
>>>     >>>>      >>>>>>>  > is
>>>     >>>>      >>>>>>>  > >> > that
>>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run
>>>     full build
>>>     >>>>      for every PR
>>>     >>>>      >>>> and a
>>>     >>>>      >>>>>>>  > >> successful
>>>     >>>>      >>>>>>>  > >> > > full
>>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We
>>>     definitely have
>>>     >>>>      more options to
>>>     >>>>      >>>>>>>  solve it,
>>>     >>>>      >>>>>>>  > for
>>>     >>>>      >>>>>>>  > >> > > > instance,
>>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs
>>>     and reuse
>>>     >>>>      artifacts
>>>     >>>>      >> from
>>>     >>>>      >>>> the
>>>     >>>>      >>>>>>>  > previous
>>>     >>>>      >>>>>>>  > >> > > build.
>>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big
>>>     effort
>>>     >>>>      which is much
>>>     >>>>      >>>>>>>  harder to
>>>     >>>>      >>>>>>>  > >> > accomplish
>>>     >>>>      >>>>>>>  > >> > > > in
>>>     >>>>      >>>>>>>  > >> > > > > a short period of time and
>>>     may deserve
>>>     >>>>      its own
>>>     >>>>      >>>> separate
>>>     >>>>      >>>>>>>  > >> discussion.
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > > [1]
>>>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > > >
>>>     >>>>      >>>>>>>  > >> > > >
>>>     >>>>      >>>>>>>  > >> > >
>>>     >>>>      >>>>>>>  > >> >
>>>     >>>>      >>>>>>>  > >>
>>>     >>>>      >>>>>>>  > >
>>>     >>>>      >>>>>>>  >
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  --
>>>     >>>>      >>>>>>>  Best Regards
>>>     >>>>      >>>>>>>
>>>     >>>>      >>>>>>>  Jeff Zhang
>>>     >>>>      >>>>>>>
>>>     >>>>      >>
>>>     >>>>
>>>     >>>
>>>     >>
>>>
>>
>>
>
>


[RESULT][VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
The vote has passed unanimously in favor of migrating to a separate 
Travis account.

I will now set things up such that pull requests are no longer run on
the ASF servers.
This is a major step in reducing our usage of ASF resources.
For the time being we'll use the free Travis plan for flink-ci (i.e. 5
workers, which is the same as the ASF gives us). Over the course of the
next week we'll set up the Ververica subscription to increase this limit.

From now on, a bot will mirror all new and updated pull requests to a
mirror repository (https://github.com/flink-ci/flink-ci) and write an
update into the PR once the build is complete.
I have run the bot for the past 3 days in parallel to our existing
Travis setup and it has been working without major issues.
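
The bot's source has not been published yet, so the following is only a
rough sketch of the mirroring step described above, not the actual
implementation. It assumes a hypothetical branch naming scheme
(ci_<PR number>_<head SHA>) in the mirror repository and models pull
requests as plain records; the names PullRequest, mirror_branch and
plan_mirror_pushes are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PullRequest:
    number: int    # PR number in the main repository
    head_sha: str  # commit hash currently at the tip of the PR


def mirror_branch(pr: PullRequest) -> str:
    """Hypothetical branch name in the mirror repository for one PR revision."""
    return f"ci_{pr.number}_{pr.head_sha}"


def plan_mirror_pushes(open_prs: List[PullRequest],
                       mirrored: Dict[int, str]) -> List[str]:
    """Return the mirror branches that need to be pushed.

    `mirrored` maps a PR number to the head SHA that was mirrored last.
    A PR needs a push when it is new or when its head SHA has changed.
    """
    return [mirror_branch(pr) for pr in open_prs
            if mirrored.get(pr.number) != pr.head_sha]


if __name__ == "__main__":
    prs = [PullRequest(1, "abc123"), PullRequest(2, "def456")]
    # PR 1 is already mirrored at its current head, PR 2 is new.
    print(plan_mirror_pushes(prs, {1: "abc123"}))
```

Keeping the decision logic a pure function of the PR list and the
last-mirrored state means it can be checked without talking to GitHub or
Travis; the actual pushing and the comment on the PR would sit around it.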

The biggest change that contributors will see is that there's no longer
an icon next to each commit. We may revisit this in the future.

I'll set up a repo with the source of the bot later.

On 04/07/2019 10:46, Chesnay Schepler wrote:
> I've raised a JIRA
> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> inquire whether it would be possible to switch to a different Travis 
> account, and if so what steps would need to be taken.
> We need a proper confirmation from INFRA since we are not in full 
> control of the flink repository (for example, we cannot access the 
> settings page).
>
> If this is indeed possible, Ververica is willing to sponsor a Travis
> account for the Flink project.
> This would provide us with more than enough resources.
>
> Since this makes the project more reliant on resources provided by 
> external companies I would like to vote on this.
>
> Please vote on this proposal, as follows:
> [ ] +1, Approve the migration to a Ververica-sponsored Travis account, 
> provided that INFRA approves
> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> account
>
> The vote will be open for at least 24h, and until we have confirmation
> from INFRA. The voting period may be shorter than the usual 3 days
> since our current setup is effectively not working.
>
> On 04/07/2019 06:51, Bowen Li wrote:
>> Re: > Are they using their own Travis CI pool, or did they switch to
>> an entirely different CI service?
>>
>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are 
>> currently moving away from ASF's Travis to their own in-house metal 
>> machines at [1] with custom CI application at [2]. They've seen 
>> significant improvement w.r.t both much higher performance and 
>> basically no resource waiting time, "night-and-day" difference 
>> quoting Wes.
>>
>> Re: > If we can just switch to our own Travis pool, just for our 
>> project, then this might be something we can do fairly quickly?
>>
>> I believe so, according to [3] and [4]
>>
>>
>> [1] https://ci.ursalabs.org/
>> [2] https://github.com/ursa-labs/ursabot
>> [3] 
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>
>>
>>
>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org 
>> <ma...@apache.org>> wrote:
>>
>>     Are they using their own Travis CI pool, or did they switch to an
>>     entirely different CI service?
>>
>>     If we can just switch to our own Travis pool, just for our
>>     project, then
>>     this might be something we can do fairly quickly?
>>
>>     On 03/07/2019 05:55, Bowen Li wrote:
>>     > I responded in the INFRA ticket [1] that I believe they are
>>     using a wrong
>>     > metric against Flink and the total build time is a completely
>>     different
>>     > thing than guaranteed build capacity.
>>     >
>>     > My response:
>>     >
>>     > "As mentioned above, I'm in Seattle, and since I started paying
>>     > attention to Flink's build queue a few weeks ago I have seen no
>>     > build kick off for Flink during PST daytime on weekdays. Our
>>     > teammates in China and Europe have reported similar observations.
>>     > So we need to evaluate where the large total build time came from.
>>     > If 1) your number and 2) our observations from three locations that
>>     > cover pretty much a full day are all true, I **guess** one reason
>>     > can be that the extra build time most likely came from weekends,
>>     > when other Apache projects may be idle and Flink drains its
>>     > congested queue.
>>     >
>>     > Please be aware that we're not complaining about the lack of
>>     > resources in general; I'm complaining about the lack of
>>     > **stable, dedicated** resources. An example of the latter:
>>     > currently, even if no build is in Flink's queue and I submit a
>>     > request that becomes the queue head in the PST morning, my build
>>     > won't even start for 6-8+h. That is an absurd amount of waiting
>>     > time.
>>     >
>>     > That is to say, if ASF INFRA decides to adopt a quota system and
>>     > grants Flink five DEDICATED servers that run all the time only
>>     > for Flink, that'll be PERFECT and can totally solve our problem
>>     > now.
>>     >
>>     > I feel what's missing in the ASF INFRA's Travis resource pool is
>>     some level
>>     > of build capacity SLAs and certainty"
>>     >
>>     >
>>     > Again, I believe these two problems differ in nature: long
>>     > build time vs. lack of dedicated build resources. That is,
>>     > shortening the build time may or may not relieve the situation.
>>     > I'm slightly negative on disabling IT cases for PRs: the downside
>>     > is that we risk merging bugs that the UTs don't catch, which may
>>     > cost a lot more to fix later, especially if they slow down or
>>     > even block others. But I am open to others' opinions on it.
>>     >
>>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't
>>     > solve our problem, since INFRA's pool is fully shared and they
>>     > have neither control over nor fine-grained insight into resource
>>     > allocation for a specific Apache project. As mentioned in [1],
>>     > Apache Arrow is moving away from the ASF INFRA Travis pool (they
>>     > are actually surprised Flink hasn't planned to do so). I know
>>     > that Spark is on its own build infra. If we all agree on funding
>>     > our own build infra, I'd be glad to help investigate potential
>>     > options after releasing 1.9, since I'm super busy with 1.9 now.
>>     >
>>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>>     >
>>     >
>>     >
>>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>     <chesnay@apache.org <ma...@apache.org>> wrote:
>>     >
>>     >> As a short-term stopgap, since we can assume this issue to
>>     become much
>>     >> worse in the following days/weeks, we could disable IT cases in
>>     PRs and
>>     >> only run them on master.
>>     >>
>>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>     >>> People really have to stop thinking that just because
>>     something works
>>     >>> for us it is also a good solution.
>>     >>> Also, please remember that our builds run for 2h from start to
>>     finish,
>>     >>> and not the 14 _minutes_ it takes for zeppelin.
>>     >>> We are dealing with an entirely different scale here, both in
>>     terms of
>>     >>> build times and number of builds.
>>     >>>
>>     >>> In this very thread people have been complaining about long
>>     >>> queue times for their builds. Surprise, other Apache projects
>>     >>> have been suffering the very same thing due to us not
>>     >>> controlling our build times. While switching services (be it
>>     >>> Jenkins, CircleCI or whatever) will possibly work for us (and
>>     >>> these options are actually attractive, like CircleCI's proper
>>     >>> support for build artifacts), it will also result in us likely
>>     >>> negatively affecting other projects in significant ways.
>>     >>>
>>     >>> Sure, the Jenkins setup has a good user experience for us, at
>>     the cost
>>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>     have 25
>>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>     >>> resources, and the European contributors haven't even really
>>     started yet.
>>     >>>
>>     >>> FYI, the latest INFRA response from INFRA-18533:
>>     >>>
>>     >>> "Our rough metrics shows that Flink used over 5800 hours of
>>     build time
>>     >>> last month. That is equal to EIGHT servers running 24/7 for
>>     the ENTIRE
>>     >>> MONTH. EIGHT. nonstop.
>>     >>> When we discovered this last night, we discussed it some and
>>     are going
>>     >>> to tune down Flink to allow only five executors maximum. We 
>> cannot
>>     >>> allow Flink to consume so much of a Foundation shared resource."
>>     >>>
>>     >>> So yes, we either
>>     >>> a) have to heavily reduce our CI usage or
>>     >>> b) fund our own, either maintaining it ourselves or donating
>>     to Apache.
>>     >>>
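The figures quoted above can be sanity-checked with a few lines of Python. This is only a rough sketch: the 30-day month and the helper names are assumptions made for illustration, not part of INFRA's metrics.

```python
HOURS_PER_DAY = 24
DAYS_IN_MONTH = 30  # assumption: INFRA's "last month" is treated as 30 days


def server_equivalents(build_hours, days=DAYS_IN_MONTH):
    """Number of servers that would have to run 24/7 to supply build_hours."""
    return build_hours / (HOURS_PER_DAY * days)


def queue_cost_hours(queued_prs, hours_per_build):
    """Total build time needed to drain a queue of PR builds."""
    return queued_prs * hours_per_build


# 5800 build hours in one month, as reported in INFRA-18533
flink_servers = server_equivalents(5800)

# 25 queued PRs at roughly 2h per build, as stated in the email
queue_hours = queue_cost_hours(25, 2)

print(f"{flink_servers:.2f} servers running nonstop")
print(f"{queue_hours} build hours to drain the queue")
```

Under these assumptions the 5800 hours come out at roughly eight servers running nonstop, which matches INFRA's statement, and the 25-PR queue corresponds to the 50h mentioned above.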
>>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>>     >>>> Looking at the git history of the Jenkins script, its core
>>     >>>> part was finished in March 2017 (with only two minor updates
>>     >>>> in 2017/2018), so it's been running for over two years now,
>>     >>>> and it feels like the Zeppelin community has been quite happy
>>     >>>> with it. @Jeff Zhang <mailto:zjffdu@gmail.com>, can you share
>>     >>>> your insights and user experience with the Jenkins+Travis
>>     >>>> approach?
>>     >>>>
>>     >>>> Things like:
>>     >>>>
>>     >>>> - has the approach completely solved the resource capacity
>>     >>>> problem for the Zeppelin community? is the Zeppelin community
>>     >>>> happy with the result?
>>     >>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>     >>>> - how often do you need to maintain the Jenkins infra? how many
>>     >>>> people are usually involved in maintenance and bug-fixes?
>>     >>>>
>>     >>>> To me, the downside of this approach is mostly the
>>     >>>> maintenance: maintaining the script and the Jenkins infra.
>>     >>>>
>>     >>>> ** Having Our Own Travis-CI.com Account **
>>     >>>>
>>     >>>> Another alternative I've been thinking of is to have our own
>>     >>>> travis-ci.com account with paid dedicated resources. Note that
>>     >>>> travis-ci.org is the free version and travis-ci.com is the
>>     >>>> commercial version. We currently use a shared resource pool
>>     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
>>     >>>> control over it - we can't see how it's configured, how many
>>     >>>> resources are available, how resources are allocated among
>>     >>>> Apache projects, etc. The nice things about having an account
>>     >>>> on travis-ci.com are:
>>     >>>>
>>     >>>> - relatively low cost with much better resource guarantee
>>     >>>> than what we currently have [1]: $249/month with 5 dedicated
>>     >>>> concurrency, $489/month with 10 concurrency
>>     >>>> - low maintenance work compared to using Jenkins
>>     >>>> - (potentially) no migration cost according to Travis's doc [2]
>>     >>>> (pending verification)
>>     >>>> - full control over the build capacity/configuration compared
>>     >>>> to using ASF INFRA's pool
>>     >>>>
>>     >>>> I'd be surprised if we as such a vibrant community cannot
>>     >>>> find and fund $249*12=$2988 a year in exchange for a much
>>     >>>> better developer experience and much higher productivity.
>>     >>>>
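For comparison, the yearly cost of both plans mentioned above can be worked out as follows. This is a minimal sketch that takes only the two prices quoted in this email as input; the helper names are made up for illustration and the prices may not reflect Travis's current plans.

```python
# concurrency -> USD per month, as quoted in the email above
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_usd):
    """Yearly cost of a plan billed monthly."""
    return monthly_usd * 12


def monthly_cost_per_slot(concurrency, monthly_usd):
    """Monthly price of one concurrent build slot."""
    return monthly_usd / concurrency


for concurrency, monthly in PLANS.items():
    print(
        f"{concurrency} concurrent builds: ${annual_cost(monthly)}/year, "
        f"${monthly_cost_per_slot(concurrency, monthly):.2f} per slot per month"
    )
```

The 5-concurrency plan comes out at the $2988 a year mentioned above. The 10-concurrency plan costs about twice as much, so the price per build slot is nearly the same.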
>>     >>>> [1] https://travis-ci.com/plans
>>     >>>> [2]
>>     >>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>     >>>>
>>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>     <chesnay@apache.org <ma...@apache.org>
>>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>>     >>>>
>>     >>>>      So yes, the Jenkins job keeps polling the state from
>>     >>>>      Travis until it finishes.
>>     >>>>
>>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>>     >>>>      workers just to idle for several hours.
>>     >>>>
>>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>     >>>>      > Here's what the Zeppelin community did: we wrote a
>>     >>>>      > Python script to check the build status of a pull
>>     >>>>      > request.
>>     >>>>      > Here's the script:
>>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>     >>>>      >
>>     >>>>      > And this is the script we use in the Jenkins build job.
>>     >>>>      >
>>     >>>>      > if [ -f "travis_check.py" ]; then
>>     >>>>      >    git log -n 1
>>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>     >>>>      >    #if [ -z $COMMIT ]; then
>>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>     >>>>      >    #fi
>>     >>>>      >
>>     >>>>      >    # get commit hash from PR
>>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>     >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>>     >>>>      >    RET_CODE=0
>>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>     >>>>      >      RET_CODE=0
>>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>     >>>>      >    fi
>>     >>>>      >
>>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>>     >>>>      >      set +x
>>     >>>>      >      echo "-----------------------------------------------------"
>>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>>     >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>     >>>>      >      echo ""
>>     >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
>>     >>>>      >      echo "git commit --amend"
>>     >>>>      >      echo "git push your-remote HEAD --force"
>>     >>>>      >      echo ""
>>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>     >>>>      >    fi
>>     >>>>      >
>>     >>>>      >    exit $RET_CODE
>>     >>>>      > else
>>     >>>>      >    set +x
>>     >>>>      >    echo "travis_check.py does not exists"
>>     >>>>      >    exit 1
>>     >>>>      > fi
>>     >>>>      >
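The wait-until-finished behaviour that the Jenkins job relies on can be sketched roughly as below. This is not the actual travis_check.py: the state names, the `fetch_state` callable and the `wait_for_build` helper are all hypothetical, and a real check would query the Travis API where the fake is used here.

```python
import time

# Assumed names for states in which a build is finished (hypothetical).
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state, interval_s=60.0, max_polls=240, sleep=time.sleep):
    """Poll fetch_state() until it reports a terminal state or polls run out.

    fetch_state is any zero-argument callable returning the current build
    state as a string. The worker running this loop is blocked meanwhile.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval_s)
    return "timeout"


# Usage with a fake build that passes on the fourth poll. The sleeps are
# recorded instead of executed so the example finishes instantly.
states = iter(["created", "started", "started", "passed"])
slept = []
result = wait_for_build(lambda: next(states), interval_s=60.0,
                        max_polls=10, sleep=slept.append)
idle_seconds = sum(slept)
print(result, idle_seconds)
```

The recorded sleeps make the cost visible: the calling worker does nothing useful between polls, so a multi-hour Travis build keeps one Jenkins executor occupied for the same duration.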
>>     >>>>      > On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler
>>     >>>>      > <chesnay@apache.org> wrote:
>>     >>>>      >
>>     >>>>      >> Does this imply that a Jenkins job is active as long
>>     as the
>>     >>>>      Travis build
>>     >>>>      >> runs?
>>     >>>>      >>
>>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>     >>>>      >>> Hi,
>>     >>>>      >>>
>>     >>>>      >>> @Dawid, I think the "long test running" issue that I
>>     >>>>      >>> mentioned in the first email, and that you also
>>     >>>>      >>> raised, belongs to "a big effort which is much harder
>>     >>>>      >>> to accomplish in a short period of time and may
>>     >>>>      >>> deserve its own separate discussion". Thus I didn't
>>     >>>>      >>> include it in what we can do in the foreseeable short
>>     >>>>      >>> term.
>>     >>>>      >>>
>>     >>>>      >>> Besides, I don't think that's the ultimate reason for
>>     >>>>      >>> the lack of build resources. Even if the build is
>>     >>>>      >>> shortened to something like 2h, the problem I
>>     >>>>      >>> described, that no build machine works for 6 or more
>>     >>>>      >>> hours in PST daytime, will still happen, because no
>>     >>>>      >>> machine from ASF INFRA's pool is allocated to Flink.
>>     >>>>      >>> As I have paid close attention to the build queue over
>>     >>>>      >>> the past few weekdays, it's a pretty clear pattern now.
>>     >>>>      >>>
>>     >>>>      >>> **The ultimate root cause** for that is - we don't
>>     >>>>      >>> have any **dedicated** build resources that we can
>>     >>>>      >>> stably rely on. I'm actually ok to wait for a long
>>     >>>>      >>> time if there are build requests running; it means at
>>     >>>>      >>> least we are making progress. But I'm not ok with
>>     >>>>      >>> having no build resources. A better place I think we
>>     >>>>      >>> should aim for in the short term is to always have at
>>     >>>>      >>> least a central pool (can be 3 or 5) of machines
>>     >>>>      >>> dedicated to building Flink at any time, or maybe to
>>     >>>>      >>> use users' resources.
>>     >>>>      >>>
>>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the
>>     >>>>      >>> Zeppelin community is using a Jenkins job to
>>     >>>>      >>> automatically build on users' Travis accounts and link
>>     >>>>      >>> the result back to the GitHub PR. I guess the Jenkins
>>     >>>>      >>> job would fetch the latest upstream master and build
>>     >>>>      >>> the PR against it. Jeff has filed tickets to learn
>>     >>>>      >>> about and get access to the Jenkins infra. It'll be
>>     >>>>      >>> better to fully understand it first before judging
>>     >>>>      >>> this approach.
>>     >>>>      >>>
>>     >>>>      >>> I also heard good things about CircleCI, and ASF INFRA
>>     >>>>      >>> seems to have a pool of build capacity there too. It
>>     >>>>      >>> can be an alternative to consider.
>>     >>>>      >>>
>>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
>>     >>>>      >>> <dwysakowicz@apache.org> wrote:
>>     >>>>      >>>
>>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>>     >>>>      >>>> most important point from Chesnay's previous message
>>     >>>>      >>>> in the summary. The ultimate reason for all the
>>     >>>>      >>>> problems is that the tests take close to 2 hours to
>>     >>>>      >>>> run already. I fully support this claim: "Unless
>>     >>>>      >>>> people start caring about test times before adding
>>     >>>>      >>>> them, this issue cannot be solved"
>>     >>>>      >>>>
>>     >>>>      >>>> This is also another reason why using users' Travis
>>     >>>>      >>>> accounts won't help. Every few weeks we reach the
>>     >>>>      >>>> user's time limit for a single profile. This makes
>>     >>>>      >>>> the user's builds simply fail until we either
>>     >>>>      >>>> properly decrease the time the tests take (which I am
>>     >>>>      >>>> not sure we ever did) or postpone the problem by
>>     >>>>      >>>> splitting into more profiles. (Note that the ASF
>>     >>>>      >>>> Travis account has higher time limits.)
>>     >>>>      >>>>
>>     >>>>      >>>> Best,
>>     >>>>      >>>>
>>     >>>>      >>>> Dawid
>>     >>>>      >>>>
>>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>     >>>>      >>>>> Do we know if using "the best" available hardware
>>     >>>>      >>>>> would improve the build times?
>>     >>>>      >>>>> Imagine we ran the build on machines with plenty of
>>     >>>>      >>>>> main memory to mount everything to ramdisk + the
>>     >>>>      >>>>> latest CPU architecture?
>>     >>>>      >>>>>
>>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>>     >>>>      >>>>> the time of an individual build, and using our own
>>     >>>>      >>>>> infrastructure would remove our dependency on
>>     >>>>      >>>>> Apache's Travis account (with the obvious downside of
>>     >>>>      >>>>> having to maintain the infrastructure).
>>     >>>>      >>>>> We could use an open source Travis alternative, to
>>     >>>>      >>>>> have a similar experience and make the migration easy.
>>     >>>>      >>>>>
>>     >>>>      >>>>>
>>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>     >>>>      >>>>> <chesnay@apache.org> wrote:
>>     >>>>      >>>>>> From what I gathered, there's no special sauce that
>>     >>>>      >>>>>> the Zeppelin project uses which actually integrates
>>     >>>>      >>>>>> a user's Travis account into the PR.
>>     >>>>      >>>>>> They just disabled Travis for PRs. And that's kind
>>     >>>>      >>>>>> of it.
>>     kind of it.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a
>>     >>>>      >>>>>> fair amount of resources, but there are downsides:
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> The discoverability of the Travis check takes a
>>     >>>>      >>>>>> nose-dive. Either we require every contributor to
>>     >>>>      >>>>>> always, on every commit, also post a Travis build,
>>     >>>>      >>>>>> or we have the reviewer sift through the
>>     >>>>      >>>>>> contributor's account to find it.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's also
>>     >>>>      >>>>>> not equivalent to having a PR build.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> A normal branch build takes a branch as is and tests
>>     >>>>      >>>>>> it. A PR build merges the branch into master, and
>>     >>>>      >>>>>> then runs it. (Fun fact: This is why a PR with merge
>>     >>>>      >>>>>> conflicts is not being run on Travis.)
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> And ultimately, everyone can already make use of
>>     >>>>      >>>>>> this approach anyway.
>>     >>>>      >>>>>>
>>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>     >>>>      >>>>>>> Hi Jeff,
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think
>>     >>>>      >>>>>>> it's a good idea to leverage users' Travis accounts.
>>     >>>>      >>>>>>> In this way, we can have almost unlimited concurrent
>>     >>>>      >>>>>>> build jobs and developers can restart builds by
>>     >>>>      >>>>>>> themselves (currently only committers can restart a
>>     >>>>      >>>>>>> PR's build).
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> But I'm still not very clear on how to integrate a
>>     >>>>      >>>>>>> user's Travis build into the Flink pull request's
>>     >>>>      >>>>>>> build automatically. Can you explain in more detail?
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Another question: does Travis only build branches
>>     >>>>      >>>>>>> for user accounts?
>>     >>>>      >>>>>>> My concern is that builds for PRs rebase the user's
>>     >>>>      >>>>>>> commits against the current master branch.
>>     >>>>      >>>>>>> This helps us find problems before merging. Builds
>>     >>>>      >>>>>>> for branches will lose the impact of new commits in
>>     >>>>      >>>>>>> master.
>>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Thanks again for sharing the idea.
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> Regards,
>>     >>>>      >>>>>>> Jark
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>     >>>>      >>>>>>> <zjffdu@gmail.com> wrote:
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>       Hi Folks,
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  Zeppelin met this kind of issue before; we solved it
>>     >>>>      >>>>>>>  by delegating each one's PR build to their own
>>     >>>>      >>>>>>>  Travis account (everyone can have 5 free slots for
>>     >>>>      >>>>>>>  Travis builds).
>>     >>>>      >>>>>>>  The Apache account's Travis build is only triggered
>>     >>>>      >>>>>>>  when a PR is merged.
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>     >>>>      >>>>>>>  <ykt836@gmail.com> wrote:
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  > (Forgot to cc George)
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  > Best,
>>     >>>>      >>>>>>>  > Kurt
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>     >>>>      >>>>>>>  > <ykt836@gmail.com> wrote:
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>  > > Hi Bowen,
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > > Thanks for bringing this up. We have actually
>>     >>>>      >>>>>>>  > > discussed this, and I think Till and George
>>     >>>>      >>>>>>>  > > have already spent some time investigating it.
>>     >>>>      >>>>>>>  > > I have cc'ed both of them, and maybe they can
>>     >>>>      >>>>>>>  > > share their findings.
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > > Best,
>>     >>>>      >>>>>>>  > > Kurt
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>     >>>>      >>>>>>>  > > <imjark@gmail.com> wrote:
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  > >> Hi Bowen,
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> Thanks for bringing this up. We have also
>>     >>>>      >>>>>>>  > >> suffered from the long build time.
>>     >>>>      >>>>>>>  > >> I agree that we should focus on solving the
>>     >>>>      >>>>>>>  > >> build capacity problem in this thread.
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> My observation is that only one build is
>>     >>>>      >>>>>>>  > >> running while all the others (other PRs,
>>     >>>>      >>>>>>>  > >> master) are pending.
>>     >>>>      >>>>>>>  > >> The pricing plan [1] of Travis shows it can
>>     >>>>      >>>>>>>  > >> support concurrent build jobs.
>>     >>>>      >>>>>>>  > >> But I don't know which plan we are using; it
>>     >>>>      >>>>>>>  > >> might be the free plan for open source.
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> I cc-ed Chesnay who may have some
>>     experience on
>>     >>>>      Travis.
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> Regards,
>>     >>>>      >>>>>>>  > >> Jark
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> [1]: https://travis-ci.com/plans
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
>>     >>>>      >>>>>>>  > >> <bowenli86@gmail.com> wrote:
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >> > Hi Steven,
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >> > I think you may not have read what I wrote.
>>     >>>>      >>>>>>>  > >> > The discussion is about "unstable build
>>     >>>>      >>>>>>>  > >> > **capacity**", in other words "unstable /
>>     >>>>      >>>>>>>  > >> > lack of build resources", not "unstable
>>     >>>>      >>>>>>>  > >> > build".
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>     >>>>      >>>>>>>  > >> > <stevenz3wu@gmail.com> wrote:
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >> > > long and sometimes unstable builds are
>>     >>>>      >>>>>>>  > >> > > definitely a pain point.
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > I suspect the build failure here in
>>     >>>>      >>>>>>>  > >> > > flink-connector-kafka is not related to my
>>     >>>>      >>>>>>>  > >> > > change, but there is no easy way to re-run
>>     >>>>      >>>>>>>  > >> > > the build in the Travis UI. A Google
>>     >>>>      >>>>>>>  > >> > > search showed a trick: closing and
>>     >>>>      >>>>>>>  > >> > > reopening the PR will trigger a rebuild,
>>     >>>>      >>>>>>>  > >> > > but that could add noise to the PR
>>     >>>>      >>>>>>>  > >> > > activity.
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > travis-ci for my personal repo often
>>     >>>>      >>>>>>>  > >> > > failed with exceeding the time limit after
>>     >>>>      >>>>>>>  > >> > > 4+ hours.
>>     >>>>      >>>>>>>  > >> > > The job exceeded the maximum time limit
>>     >>>>      >>>>>>>  > >> > > for jobs, and has been terminated.
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>>     Bowen Li
>>     >>>>      >>>>>>>  <bowenli86@gmail.com
>>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>>     <ma...@gmail.com>>
>>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>>     >>>>      >>>>>>>  > wrote:
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>> https://travis-ci.org/apache/flink/builds/549681530
>>     >>>>      >>>>>>>  This build
>>     >>>>      >>>>>>>  > >> > request
>>     >>>>      >>>>>>>  > >> > > > has
>>     >>>>      >>>>>>>  > >> > > > been sitting at **HEAD of the
>>     queue**
>>     >>>>      since I first
>>     >>>>      >> saw
>>     >>>>      >>>>>>>       it at PST
>>     >>>>      >>>>>>>  > >> > 10:30am
>>     >>>>      >>>>>>>  > >> > > > (not sure how long it's been
>>     there before
>>     >>>>      10:30am).
>>     >>>>      >>>>>>>  It's PST
>>     >>>>      >>>>>>>  > 4:12pm
>>     >>>>      >>>>>>>  > >> now
>>     >>>>      >>>>>>>  > >> > > and
>>     >>>>      >>>>>>>  > >> > > > it hasn't started yet.
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>>      >>>>>>>  > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>>      >>>>>>>  > >> > > > > Hi devs,
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > I've been experiencing the pain resulting from lack of stable build
>>     >>>>      >>>>>>>  > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
>>     >>>>      >>>>>>>  > >> > > > > build in the queue is making any progress for hours, and suddenly 5 or 6
>>     >>>>      >>>>>>>  > >> > > > > builds kick off all together after the long pause. I'm at PST (UTC-08) time
>>     >>>>      >>>>>>>  > >> > > > > zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
>>     >>>>      >>>>>>>  > >> > > > > (let alone the time needed to drain the queue afterwards).
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > I think this has greatly impacted our productivity. I've experienced that
>>     >>>>      >>>>>>>  > >> > > > > PRs submitted in the early morning of PST time zone won't finish their
>>     >>>>      >>>>>>>  > >> > > > > build until late night of the same day.
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > So my questions are:
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > - Has anyone else experienced the same problem or have similar observation
>>     >>>>      >>>>>>>  > >> > > > > on TravisCI? (I suspect it has things to do with time zone)
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it the free
>>     >>>>      >>>>>>>  > >> > > > > plan for open source projects? What are the guaranteed build capacity of
>>     >>>>      >>>>>>>  > >> > > > > the current plan?
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > - If the current pricing plan (either free or paid) can't provide stable
>>     >>>>      >>>>>>>  > >> > > > > build capacity, can we upgrade to a higher priced plan with larger and more
>>     >>>>      >>>>>>>  > >> > > > > stable build capacity?
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > BTW, another factor that contribute to the productivity problem is that
>>     >>>>      >>>>>>>  > >> > > > > our build is slow - we run full build for every PR and a successful full
>>     >>>>      >>>>>>>  > >> > > > > build takes ~5h. We definitely have more options to solve it, for instance,
>>     >>>>      >>>>>>>  > >> > > > > modularize the build graphs and reuse artifacts from the previous build.
>>     >>>>      >>>>>>>  > >> > > > > But I think that can be a big effort which is much harder to accomplish in
>>     >>>>      >>>>>>>  > >> > > > > a short period of time and may deserve its own separate discussion.
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > > >
>>     >>>>      >>>>>>>  > >> > > >
>>     >>>>      >>>>>>>  > >> > >
>>     >>>>      >>>>>>>  > >> >
>>     >>>>      >>>>>>>  > >>
>>     >>>>      >>>>>>>  > >
>>     >>>>      >>>>>>>  >
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>       --
>>     >>>>      >>>>>>>  Best Regards
>>     >>>>      >>>>>>>
>>     >>>>      >>>>>>>  Jeff Zhang
>>     >>>>      >>>>>>>
>>     >>>>      >>
>>     >>>>
>>     >>>
>>     >>
>>
>
>


Re: [VOTE] Migrate to sponsored Travis account

Posted by Xu Forward <fo...@gmail.com>.
+1

On Thu, Jul 4, 2019 at 7:55 PM vino yang <ya...@gmail.com> wrote:

> +1
>
> On Thu, Jul 4, 2019 at 7:09 PM Dian Fu <di...@gmail.com> wrote:
>
> > +1. Thanks Chesnay and Bowen for pushing this forward.
> >
> > Regards,
> > Dian
> >
> > > On Jul 4, 2019, at 6:28 PM, zhijiang <wa...@aliyun.com.INVALID> wrote:
> > >
> > > > +1 and thanks for Chesnay's work on this.
> > >
> > > Best,
> > > Zhijiang
> > >
> > > ------------------------------------------------------------------
> > > From:Haibo Sun <su...@163.com>
> > > > Send Time: Jul 4, 2019 (Thursday) 18:21
> > > To:dev <de...@flink.apache.org>
> > > Cc:private@flink.apache.org <pr...@flink.apache.org>
> > > Subject:Re:Re: [VOTE] Migrate to sponsored Travis account
> > >
> > > > +1. Thanks, Chesnay, for pushing this forward.
> > >
> > > Best,
> > > Haibo
> > >
> > >
> > > At 2019-07-04 17:58:28, "Kurt Young" <yk...@gmail.com> wrote:
> > >> +1, and many thanks to Chesnay for pushing this.
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <al...@apache.org> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> Aljoscha
> > >>>
> > >>>> On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
> > >>>>
> > >>>> +1 to move to a private Travis account.
> > >>>>
> > >>>> I can confirm that Ververica will sponsor a Travis CI plan that is
> > >>>> equivalent to or a bit higher than the previous ASF quota (10
> > >>>> concurrent build queues)
> > >>>>
> > >>>> Best,
> > >>>> Stephan
> > >>>>
> > >>>> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>>
> > >>>>> I've raised a JIRA
> > >>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire
> > >>>>> whether it would be possible to switch to a different Travis account,
> > >>>>> and if so what steps would need to be taken.
> > >>>>> We need a proper confirmation from INFRA since we are not in full
> > >>>>> control of the flink repository (for example, we cannot access the
> > >>>>> settings page).
> > >>>>>
> > >>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
> > >>>>> account for the Flink project.
> > >>>>> This would provide us with more than enough resources.
> > >>>>>
> > >>>>> Since this makes the project more reliant on resources provided by
> > >>>>> external companies I would like to vote on this.
> > >>>>>
> > >>>>> Please vote on this proposal, as follows:
> > >>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> > >>>>> provided that INFRA approves
> > >>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> > >>>>> account
> > >>>>>
> > >>>>> The vote will be open for at least 24h, and until we have confirmation
> > >>>>> from INFRA. The voting period may be shorter than the usual 3 days since
> > >>>>> our current setup is effectively not working.
> > >>>>>
> > >>>>> On 04/07/2019 06:51, Bowen Li wrote:
> > >>>>>> Re: > Are they using their own Travis CI pool, or did they switch to an
> > >>>>>> entirely different CI service?
> > >>>>>>
> > >>>>>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
> > >>>>>> currently moving away from ASF's Travis to their own in-house metal
> > >>>>>> machines at [1] with a custom CI application at [2]. They've seen
> > >>>>>> significant improvement w.r.t. both much higher performance and
> > >>>>>> basically no resource waiting time, a "night-and-day" difference, quoting
> > >>>>>> Wes.
> > >>>>>>
> > >>>>>> Re: > If we can just switch to our own Travis pool, just for our
> > >>>>>> project, then this might be something we can do fairly quickly?
> > >>>>>>
> > >>>>>> I believe so, according to [3] and [4].
> > >>>>>>
> > >>>>>>
> > >>>>>> [1] https://ci.ursalabs.org/
> > >>>>>> [2] https://github.com/ursa-labs/ursabot
> > >>>>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > >>>>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>>>>
> > >>>>>>   Are they using their own Travis CI pool, or did they switch to an
> > >>>>>>   entirely different CI service?
> > >>>>>>
> > >>>>>>   If we can just switch to our own Travis pool, just for our
> > >>>>>>   project, then this might be something we can do fairly quickly?
> > >>>>>>
> > >>>>>>   On 03/07/2019 05:55, Bowen Li wrote:
> > >>>>>>> I responded in the INFRA ticket [1] that I believe they are using the
> > >>>>>>> wrong metric against Flink, and that total build time is a completely
> > >>>>>>> different thing from guaranteed build capacity.
> > >>>>>>>
> > >>>>>>> My response:
> > >>>>>>>
> > >>>>>>> "As mentioned above, since I started to pay attention to Flink's build
> > >>>>>>> queue a few tens of days ago, I'm in Seattle and I saw no build kicking
> > >>>>>>> off in PST daytime on weekdays for Flink. Our teammates in China and
> > >>>>>>> Europe have also reported similar observations. So we need to evaluate
> > >>>>>>> where the large total build time came from - if 1) your number and 2) our
> > >>>>>>> observations from three locations that cover pretty much a full day are
> > >>>>>>> all true, I **guess** one reason can be that the extra build time most
> > >>>>>>> likely came from weekends, when other Apache projects may be idle and
> > >>>>>>> Flink just drains its congested queue hard.
> > >>>>>>>
> > >>>>>>> Please be aware that we're not complaining about the lack of resources
> > >>>>>>> in general; I'm complaining about the lack of **stable, dedicated**
> > >>>>>>> resources. An example of the latter: currently, even if no build is
> > >>>>>>> in Flink's queue and I submit a request that becomes the queue head in
> > >>>>>>> the PST morning, my build won't even start for 6-8+h. That is an absurd
> > >>>>>>> amount of waiting time.
> > >>>>>>>
> > >>>>>>> That is to say, if ASF INFRA decides to adopt a quota system and grants
> > >>>>>>> Flink five DEDICATED servers that run all the time only for Flink,
> > >>>>>>> that'll be PERFECT and can totally solve our problem now.
> > >>>>>>>
> > >>>>>>> I feel what's missing in ASF INFRA's Travis resource pool is some level
> > >>>>>>> of build capacity SLAs and certainty."
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Again, I believe these two problems differ in nature: long build time
> > >>>>>>> vs. lack of dedicated build resources. That is to say, shortening build
> > >>>>>>> time may or may not relieve the situation. I'm slightly negative on
> > >>>>>>> disabling IT cases for PRs, since the downside is that we risk missing
> > >>>>>>> bugs in PRs that the UTs don't catch, which may cost a lot more to fix,
> > >>>>>>> especially if they slow down or even block others, but I'm open to
> > >>>>>>> others' opinions on it.
> > >>>>>>>
> > >>>>>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a
> > >>>>>>> feasible way to solve our problem, since INFRA's pool is fully shared
> > >>>>>>> and they have no control over, or finer insight into, resource
> > >>>>>>> allocation to a specific Apache project. As mentioned in [1], Apache
> > >>>>>>> Arrow is moving away from the ASF INFRA Travis pool (they are actually
> > >>>>>>> surprised Flink hasn't planned to do so). I know that Spark is on its
> > >>>>>>> own build infra. If we all agree on funding our own build infra, I'd be
> > >>>>>>> glad to help investigate potential options after releasing 1.9, since
> > >>>>>>> I'm super busy with 1.9 now.
> > >>>>>>>
> > >>>>>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> As a short-term stopgap, since we can assume this issue to become much
> > >>>>>>>> worse in the following days/weeks, we could disable IT cases in PRs and
> > >>>>>>>> only run them on master.
> > >>>>>>>>
> > >>>>>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > >>>>>>>>> People really have to stop thinking that just because something works
> > >>>>>>>>> for us it is also a good solution.
> > >>>>>>>>> Also, please remember that our builds run for 2h from start to finish,
> > >>>>>>>>> and not the 14 _minutes_ it takes for zeppelin.
> > >>>>>>>>> We are dealing with an entirely different scale here, both in terms of
> > >>>>>>>>> build times and number of builds.
> > >>>>>>>>>
> > >>>>>>>>> In this very thread people have been complaining about long queue
> > >>>>>>>>> times for their builds. Surprise, other Apache projects have been
> > >>>>>>>>> suffering the very same thing due to us not controlling our build
> > >>>>>>>>> times. While switching services (be it Jenkins, CircleCI or whatever)
> > >>>>>>>>> will possibly work for us (and these options are actually attractive,
> > >>>>>>>>> like CircleCI's proper support for build artifacts), it will also
> > >>>>>>>>> result in us likely negatively affecting other projects in significant
> > >>>>>>>>> ways.
> > >>>>>>>>>
> > >>>>>>>>> Sure, the Jenkins setup has a good user experience for us, at the cost
> > >>>>>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
> > >>>>>>>>> PRs in our queue; that's possibly 50h we'd consume of Jenkins
> > >>>>>>>>> resources, and the European contributors haven't even really started yet.
> > >>>>>>>>>
> > >>>>>>>>> FYI, the latest INFRA response from INFRA-18533:
> > >>>>>>>>>
> > >>>>>>>>> "Our rough metrics shows that Flink used over 5800 hours of build time
> > >>>>>>>>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
> > >>>>>>>>> MONTH. EIGHT. nonstop.
> > >>>>>>>>> When we discovered this last night, we discussed it some and are going
> > >>>>>>>>> to tune down Flink to allow only five executors maximum. We cannot
> > >>>>>>>>> allow Flink to consume so much of a Foundation shared resource."
> > >>>>>>>>>
> > >>>>>>>>> So yes, we either
> > >>>>>>>>> a) have to heavily reduce our CI usage or
> > >>>>>>>>> b) fund our own, either maintaining it ourselves or donating to Apache.
> > >>>>>>>>>
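The capacity figures quoted above can be sanity-checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the 30-day month is an assumption (the INFRA quote says "last month" without giving its length).

```python
HOURS_PER_DAY = 24


def equivalent_servers(build_hours: float, days_in_month: int = 30) -> float:
    """Machines running 24/7 that would be needed to supply build_hours in one month."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def backlog_hours(queued_builds: int, hours_per_build: float) -> float:
    """Total worker time needed to drain a queue, ignoring parallelism."""
    return queued_builds * hours_per_build


if __name__ == "__main__":
    # Figures taken from the thread: 5800 build hours in a month,
    # and 25 queued PRs at roughly 2h per build.
    print(f"{equivalent_servers(5800):.2f} servers running nonstop")
    print(f"{backlog_hours(25, 2):.0f} worker hours to drain the queue")
```

Under these assumptions the 5800 hours come out at roughly eight always-on machines, which is consistent with INFRA's statement, and the 25-PR queue matches the "possibly 50h" estimate.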
> > >>>>>>>>> On 02/07/2019 05:11, Bowen Li wrote:
> > >>>>>>>>>> By looking at the git history of the Jenkins script, its core part
> > >>>>>>>>>> was finished in March 2017 (with only two minor updates in 2017/2018),
> > >>>>>>>>>> so it's been running for over two years now and it feels like the
> > >>>>>>>>>> Zeppelin community has been quite happy with it. @Jeff Zhang
> > >>>>>>>>>> <zjffdu@gmail.com>, can you share your insights and user
> > >>>>>>>>>> experience with the Jenkins+Travis approach?
> > >>>>>>>>>>
> > >>>>>>>>>> Things like:
> > >>>>>>>>>>
> > >>>>>>>>>> - has the approach completely solved the resource capacity problem
> > >>>>>>>>>> for the Zeppelin community? is the Zeppelin community happy with the
> > >>>>>>>>>> result?
> > >>>>>>>>>> - is the whole configuration chain stable enough (e.g. uptime)?
> > >>>>>>>>>> - how often do you need to maintain the Jenkins infra? how many
> > >>>>>>>>>> people are usually involved in maintenance and bug-fixes?
> > >>>>>>>>>>
> > >>>>>>>>>> The downside of this approach seems to me to be mostly the
> > >>>>>>>>>> maintenance - maintaining the script and the Jenkins infra.
> > >>>>>>>>>>
> > >>>>>>>>>> ** Having Our Own Travis-CI.com Account **
> > >>>>>>>>>>
> > >>>>>>>>>> Another alternative I've been thinking of is to have our own
> > >>>>>>>>>> travis-ci.com account with paid dedicated resources. Note that
> > >>>>>>>>>> travis-ci.org is the free version and travis-ci.com is the commercial
> > >>>>>>>>>> version. We currently use a shared resource pool managed by the ASF
> > >>>>>>>>>> INFRA team on travis-ci.org, but we have no control over it - we
> > >>>>>>>>>> can't see how it's configured, how much resources are available, how
> > >>>>>>>>>> resources are allocated among Apache projects, etc.
> > >>>>>>>>>> The nice things about having an account on travis-ci.com are:
> > >>>>>>>>>>
> > >>>>>>>>>> - relatively low cost with a much better resource guarantee than what
> > >>>>>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> > >>>>>>>>>> $489/month with 10 concurrency
> > >>>>>>>>>> - low maintenance work compared to using Jenkins
> > >>>>>>>>>> - (potentially) no migration cost according to Travis's doc [2]
> > >>>>>>>>>> (pending verification)
> > >>>>>>>>>> - full control over the build capacity/configuration compared to
> > >>>>>>>>>> using ASF INFRA's pool
> > >>>>>>>>>>
> > >>>>>>>>>> I'd be surprised if we as such a vibrant community cannot find and
> > >>>>>>>>>> fund $249*12=$2988 a year in exchange for a much better developer
> > >>>>>>>>>> experience and much higher productivity.
> > >>>>>>>>>>
> > >>>>>>>>>> [1] https://travis-ci.com/plans
> > >>>>>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > >>>>>>>>>>
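The yearly cost mentioned above is simple to reproduce for both plans quoted in the thread. The sketch below only restates the two monthly prices from the message; the names `PLANS`, `annual_cost`, and `monthly_cost_per_slot` are illustrative, and actual Travis pricing may differ from these 2019 figures.

```python
# Monthly prices (USD) quoted in the thread, keyed by number of concurrent builds.
PLANS = {5: 249, 10: 489}


def annual_cost(concurrency: int) -> int:
    """Yearly price of the plan with the given concurrency."""
    return PLANS[concurrency] * 12


def monthly_cost_per_slot(concurrency: int) -> float:
    """Monthly price divided by the number of concurrent build slots."""
    return PLANS[concurrency] / concurrency


if __name__ == "__main__":
    for slots in sorted(PLANS):
        print(f"{slots} concurrent builds: ${annual_cost(slots)} per year, "
              f"${monthly_cost_per_slot(slots):.2f} per slot per month")
```

The 5-slot plan reproduces the $2988 per year from the message; the 10-slot plan is slightly cheaper per slot.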
> > >>>>>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>    So yes, the Jenkins job keeps pulling the state from Travis until it
> > >>>>>>>>>>    finishes.
> > >>>>>>>>>>
> > >>>>>>>>>>    Not sure I'm comfortable with the idea of using Jenkins workers
> > >>>>>>>>>>    just to idle for several hours.
> > >>>>>>>>>>
> > >>>>>>>>>>    On 29/06/2019 14:56, Jeff Zhang wrote:
> > >>>>>>>>>>> Here's what the Zeppelin community did: we wrote a Python script to
> > >>>>>>>>>>> check the build status of a pull request.
> > >>>>>>>>>>> Here's the script:
> > >>>>>>>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
> > >>>>>>>>>>>
> > >>>>>>>>>>> And this is the script we use in the Jenkins build job.
> > >>>>>>>>>>>
> > >>>>>>>>>>> if [ -f "travis_check.py" ]; then
> > >>>>>>>>>>>   git log -n 1
> > >>>>>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
> > >>>>>>>>>>>     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> > >>>>>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> > >>>>>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> > >>>>>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> > >>>>>>>>>>>   #if [ -z $COMMIT ]; then
> > >>>>>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > >>>>>>>>>>>   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> > >>>>>>>>>>>   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
> > >>>>>>>>>>>   #    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > >>>>>>>>>>>   #fi
> > >>>>>>>>>>>
> > >>>>>>>>>>>   # get commit hash from PR
> > >>>>>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > >>>>>>>>>>>     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> > >>>>>>>>>>>     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
> > >>>>>>>>>>>     sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> > >>>>>>>>>>>   sleep 30 # wait a moment for travis to start the build
> > >>>>>>>>>>>   RET_CODE=0
> > >>>>>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> > >>>>>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> > >>>>>>>>>>>     RET_CODE=0
> > >>>>>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> > >>>>>>>>>>>       grep '"full_name":' | grep -v "apache/zeppelin" |
> > >>>>>>>>>>>       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> > >>>>>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> > >>>>>>>>>>>   fi
> > >>>>>>>>>>>
> > >>>>>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
> > >>>>>>>>>>>     set +x
> > >>>>>>>>>>>     echo "-----------------------------------------------------"
> > >>>>>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
> > >>>>>>>>>>>     echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> > >>>>>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> > >>>>>>>>>>>     echo ""
> > >>>>>>>>>>>     echo "To trigger CI after setup, you will need to amend your last commit with"
> > >>>>>>>>>>>     echo "git commit --amend"
> > >>>>>>>>>>>     echo "git push your-remote HEAD --force"
> > >>>>>>>>>>>     echo ""
> > >>>>>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> > >>>>>>>>>>>   fi
> > >>>>>>>>>>>
> > >>>>>>>>>>>   exit $RET_CODE
> > >>>>>>>>>>> else
> > >>>>>>>>>>>   set +x
> > >>>>>>>>>>>   echo "travis_check.py does not exist"
> > >>>>>>>>>>>   exit 1
> > >>>>>>>>>>> fi
> > >>>>>>>>>>>
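The pattern discussed here, where a Jenkins job waits on a Travis build by polling its state, can be sketched in Python as follows. This is not the contents of travis_check.py, which the thread only links to. The state names, the `fetch_state` callable, the default intervals, and the exit-code convention are all assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed set of terminal build states; a real CI service may use other names.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], str],
                   poll_interval_s: float = 60.0,
                   timeout_s: float = 5 * 3600,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll fetch_state() until the build reaches a terminal state.

    Returns 0 if the build passed, 1 if it ended in any other terminal
    state or if timeout_s elapsed first.
    """
    waited = 0.0
    while waited <= timeout_s:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return 0 if state == "passed" else 1
        # The calling worker is blocked here for the whole build duration.
        sleep(poll_interval_s)
        waited += poll_interval_s
    return 1


if __name__ == "__main__":
    # Hypothetical stand-in for a real status lookup.
    fake_states = iter(["created", "started", "passed"])
    print(wait_for_build(lambda: next(fake_states), poll_interval_s=0.01))
```

The `sleep` call is where the cost raised in the replies shows up: the polling worker does nothing useful but stays occupied until the remote build finishes.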
> > >>>>>>>>>>> On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Does this imply that a Jenkins job is active as long as the Travis
> > >>>>>>>>>>>> build runs?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
> > >>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @Dawid, I think the "long test running" issue, as I mentioned in the
> > >>>>>>>>>>>>> first email and as you guys also said, belongs to "a big effort which
> > >>>>>>>>>>>>> is much harder to accomplish in a short period of time and may deserve
> > >>>>>>>>>>>>> its own separate discussion". Thus I didn't include it in what we can
> > >>>>>>>>>>>>> do in the foreseeable short term.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack of
> > >>>>>>>>>>>>> build resources. Even if the build is shortened to something like 2h,
> > >>>>>>>>>>>>> the problem I described, where no build machine works for 6 or more
> > >>>>>>>>>>>>> hours in PST daytime, will still happen, because no machine from ASF
> > >>>>>>>>>>>>> INFRA's pool is allocated to Flink. As I have paid close attention to
> > >>>>>>>>>>>>> the build queue over the past few weekdays, it's a pretty clear
> > >>>>>>>>>>>>> pattern now.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> **The ultimate root cause** for that is - we don't have any
> > >>>>>>>>>>>>> **dedicated** build resources that we can stably rely on. I'm actually
> > >>>>>>>>>>>>> ok to wait for a long time if there are build requests running; it
> > >>>>>>>>>>>>> means at least we are making progress. But I'm not ok with no build
> > >>>>>>>>>>>>> resource. A better place I think we should aim at in the short term is
> > >>>>>>>>>>>>> to always have at least a central pool (can be 3 or 5) of machines
> > >>>>>>>>>>>>> dedicated to building Flink at any time, or maybe to use users'
> > >>>>>>>>>>>>> resources.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @Chesnay @Robert I synced with Jeff offline and learned that the
> > >>>>>>>>>>>>> Zeppelin community is using a Jenkins job to automatically build on
> > >>>>>>>>>>>>> users' Travis accounts and link the result back to the GitHub PR. I
> > >>>>>>>>>>>>> guess the Jenkins job would fetch the latest upstream master and build
> > >>>>>>>>>>>>> the PR against it. Jeff has filed tickets to learn about and get
> > >>>>>>>>>>>>> access to the Jenkins infra. It'd be better to fully understand it
> > >>>>>>>>>>>>> first before judging this approach.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I also heard good things about CircleCI, and ASF INFRA seems to have a
> > >>>>>>>>>>>>> pool of build capacity there too. It can be an alternative to consider.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the most important
> > >>>>>>>>>>>>>> point from Chesnay's previous message in the summary. The ultimate
> > >>>>>>>>>>>>>> reason for all the problems is that the tests take close to 2 hours
> > >>>>>>>>>>>>>> to run already.
> > >>>>>>>>>>>>>> I fully support this claim: "Unless people start caring about test
> > >>>>>>>>>>>>>> times before adding them, this issue cannot be solved"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> This is also another reason why using user's Travis account won't
> > >>>>>>>>>>>>>> help. Every few weeks we reach the user's time limit for a single
> > >>>>>>>>>>>>>> profile. This makes the user's builds simply fail, until we either
> > >>>>>>>>>>>>>> properly decrease the time the tests take (which I am not sure we
> > >>>>>>>>>>>>>> ever did) or postpone the problem by splitting into more profiles.
> > >>>>>>>>>>>>>> (Note that the ASF Travis account has higher time limits)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Dawid
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
> > >>>>>>>>>>>>>>> Do we know if using "the best" available hardware would improve the
> > >>>>>>>>>>>>>>> build times?
> > >>>>>>>>>>>>>>> Imagine we would run the build on machines with plenty of main memory
> > >>>>>>>>>>>>>>> to mount everything to ramdisk + the latest CPU architecture?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Throwing hardware at the problem could help reduce the time of an
> > >>>>>>>>>>>>>>> individual build, and using our own infrastructure would remove our
> > >>>>>>>>>>>>>>> dependency on Apache's Travis account (with the obvious downside of
> > >>>>>>>>>>>>>>> having to maintain the infrastructure)
> > >>>>>>>>>>>>>>> We could use an open source travis alternative, to have a similar
> > >>>>>>>>>>>>>>> experience and make the migration easy.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > >>>>>>>>>>>>>>>> From what I gathered, there's no special sauce that the Zeppelin
> > >>>>>>>>>>>>>>>> project uses which actually integrates a user's Travis account into the
> > >>>>>>>>>>>>>>>> PR. They just disabled Travis for PRs. And that's kind of it.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
> > >>>>>>>>>>>>>>>> resources, but there are downsides:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> The discoverability of the Travis check takes a nose-dive. Either we
> > >>>>>>>>>>>>>>>> require every contributor to always, on every commit, also post a Travis
> > >>>>>>>>>>>>>>>> build, or we have the reviewer sift through the contributor's account to
> > >>>>>>>>>>>>>>>> find it.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
> > >>>>>>>>>>>>>>>> having a PR build.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> A normal branch build takes a branch as is and tests it. A PR build
> > >>>>>>>>>>>>>>>> merges the branch into master, and then runs it. (Fun fact: This is why
> > >>>>>>>>>>>>>>>> a PR with merge conflicts is not being run on Travis.)
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> And ultimately, everyone can already make use of this approach anyway.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> > >>>>>>>>>>>>>>>>> Hi Jeff,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
> > >>>>>>>>>>>>>>>>> leverage users' Travis accounts.
> > >>>>>>>>>>>>>>>>> In this way, we can have almost unlimited concurrent build jobs, and
> > >>>>>>>>>>>>>>>>> developers can restart builds by themselves (currently only committers
> > >>>>>>>>>>>>>>>>> can restart a PR's build).
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> But I'm still not very clear on how to integrate a user's Travis build
> > >>>>>>>>>>>>>>>>> into the Flink pull request's build automatically. Can you explain in
> > >>>>>>>>>>>>>>>>> more detail?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Another question: does Travis only build branches for user accounts?
> > >>>>>>>>>>>>>>>>> My concern is that builds for PRs rebase the user's commits against the
> > >>>>>>>>>>>>>>>>> current master branch.
> > >>>>>>>>>>>>>>>>> This helps us find problems before merging. Builds for branches
> > >>>>>>>>>>>>>>>>> will lose the impact of new commits in master.
> > >>>>>>>>>>>>>>>>> How does Zeppelin solve this problem?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks again for sharing the idea.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>> Jark
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>     Hi Folks,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>     Zeppelin met this kind of issue before; we solved it by delegating
> > >>>>>>>>>>>>>>>>>     each contributor's PR build to their own Travis account (everyone can
> > >>>>>>>>>>>>>>>>>     have 5 free slots for Travis builds).
> > >>>>>>>>>>>>>>>>>     The Apache account's Travis build is only triggered when a PR is merged.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>     Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> (Forgot to cc George)
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>> Kurt
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Hi Bowen,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed this, and I
> > >>>>>>>>>>>>>>>>>>> think Till and George have already spent some time investigating it. I
> > >>>>>>>>>>>>>>>>>>> have cc'ed both of them, and maybe they can share their findings.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>> Kurt
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Bowen,
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks for bringing this up. We also suffered from the long build time.
> > >>>>>>>>>>>>>>>>>>>> I agree that we should focus on solving the build capacity problem in
> > >>>>>>>>>>>>>>>>>>>> this thread.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> My observation is that only one build is running while all the others
> > >>>>>>>>>>>>>>>>>>>> (other PRs, master) are pending.
> > >>>>>>>>>>>>>>>>>>>> The pricing plan [1] of Travis shows it can support concurrent build
> > >>>>>>>>>>>>>>>>>>>> jobs.
> > >>>>>>>>>>>>>>>>>>>> But I don't know which plan we are using; it might be the free plan for
> > >>>>>>>>>>>>>>>>>>>> open source.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I cc-ed Chesnay, who may have some experience with Travis.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>>>> Jark
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Hi Steven,
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion is about
> > >>>>>>>>>>>>>>>>>>>>> "unstable build **capacity**", in other words "unstable / lack of build
> > >>>>>>>>>>>>>>>>>>>>> resources", not "unstable build".
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Long and sometimes unstable builds are definitely a pain point.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka is not related
> > >>>>>>>>>>>>>>>>>>>>>> to my change, but there is no easy way to re-run the build in the Travis
> > >>>>>>>>>>>>>>>>>>>>>> UI. A Google search showed a trick: closing and reopening the PR will
> > >>>>>>>>>>>>>>>>>>>>>> trigger a rebuild, but that could add noise to the PR activity.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed with exceeding the time
> > >>>>>>>>>>>>>>>>>>>>>> limit after 4+ hours.
> > >>>>>>>>>>>>>>>>>>>>>> The job exceeded the maximum time limit for jobs, and has been
> > >>>>>>>>>>>>>>>>>>>>>> terminated.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530 This build request has
> > >>>>>>>>>>>>>>>>>>>>>>> been sitting at **HEAD of the queue** since I first saw it at PST 10:30am
> > >>>>>>>>>>>>>>>>>>>>>>> (not sure how long it's been there before 10:30am). It's PST 4:12pm now
> > >>>>>>>>>>>>>>>>>>>>>>> and it hasn't started yet.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain resulting from lack of stable build
> > >>>>>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
> > >>>>>>>>>>>>>>>>>>>>>>>> build in the queue is making any progress for hours, and suddenly 5 or 6
> > >>>>>>>>>>>>>>>>>>>>>>>> builds kick off all together after the long pause. I'm at PST (UTC-08) time
> > >>>>>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
> > >>>>>>>>>>>>>>>>>>>>>>>> (let alone the time needed to drain the queue afterwards).
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've experienced that
> > >>>>>>>>>>>>>>>>>>>>>>>> PRs submitted in the early morning of PST time zone won't finish their
> > >>>>>>>>>>>>>>>>>>>>>>>> build until late night of the same day.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> So my questions are:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or have similar observation
> > >>>>>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it has things to do with time zone)
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently using? Is it the free
> > >>>>>>>>>>>>>>>>>>>>>>>> plan for open source projects? What are the guaranteed build capacity of
> > >>>>>>>>>>>>>>>>>>>>>>>> the current plan?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide stable
> > >>>>>>>>>>>>>>>>>>>>>>>> build capacity, can we upgrade to a higher priced plan with larger and more
> > >>>>>>>>>>>>>>>>>>>>>>>> stable build capacity?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> BTW, another factor that contribute to the productivity problem is that our
> > >>>>>>>>>>>>>>>>>>>>>>>> build is slow - we run full build for every PR and a successful full build
> > >>>>>>>>>>>>>>>>>>>>>>>> takes ~5h. We definitely have more options to solve it, for instance,
> > >>>>>>>>>>>>>>>>>>>>>>>> modularize the build graphs and reuse artifacts from the previous build.
> > >>>>>>>>>>>>>>>>>>>>>>>> But I think that can be a big effort which is much harder to accomplish in
> > >>>>>>>>>>>>>>>>>>>>>>>> a short period of time and may deserve its own separate discussion.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>     --
> > >>>>>>>>>>>>>>>>>     Best Regards
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>     Jeff Zhang
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >
> >
> >
> >
>

Re: [VOTE] Migrate to sponsored Travis account

Posted by vino yang <ya...@gmail.com>.
+1

Dian Fu <di...@gmail.com> wrote on Thu, Jul 4, 2019 at 7:09 PM:

> +1. Thanks Chesnay and Bowen for pushing this forward.
>
> Regards,
> Dian
>
> > On Jul 4, 2019, at 6:28 PM, zhijiang <wa...@aliyun.com.INVALID> wrote:
> >
> > +1 and thanks for Chesnay's work on this.
> >
> > Best,
> > Zhijiang
> >
> > ------------------------------------------------------------------
> > From:Haibo Sun <su...@163.com>
> > Send Time: Thu, Jul 4, 2019 18:21
> > To:dev <de...@flink.apache.org>
> > Cc:private@flink.apache.org <pr...@flink.apache.org>
> > Subject:Re:Re: [VOTE] Migrate to sponsored Travis account
> >
> > +1. Thanks, Chesnay, for pushing this forward.
> >
> > Best,
> > Haibo
> >
> >
> > At 2019-07-04 17:58:28, "Kurt Young" <yk...@gmail.com> wrote:
> >> +1 and great thanks Chesnay for pushing this.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> > >> On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <al...@apache.org> wrote:
> >>
> >>> +1
> >>>
> >>> Aljoscha
> >>>
> >>>> On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
> >>>>
> >>>> +1 to move to a private Travis account.
> >>>>
> > >>>> I can confirm that Ververica will sponsor a Travis CI plan that is
> > >>>> equivalent or a bit higher than the previous ASF quota (10 concurrent
> > >>>> build queues)
> >>>>
> >>>> Best,
> >>>> Stephan
> >>>>
> > >>>> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org> wrote:
> >>>>
> > >>>>> I've raised a JIRA
> > >>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire
> > >>>>> whether it would be possible to switch to a different Travis account,
> > >>>>> and if so what steps would need to be taken.
> >>>>> We need a proper confirmation from INFRA since we are not in full
> >>>>> control of the flink repository (for example, we cannot access the
> >>>>> settings page).
> >>>>>
> > >>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
> > >>>>> account for the Flink project.
> > >>>>> This would provide us with more than enough resources for our needs.
> >>>>>
> >>>>> Since this makes the project more reliant on resources provided by
> >>>>> external companies I would like to vote on this.
> >>>>>
> >>>>> Please vote on this proposal, as follows:
> > >>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> > >>>>> provided that INFRA approves
> > >>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> > >>>>> account
> >>>>>
> > >>>>> The vote will be open for at least 24h, and until we have confirmation
> > >>>>> from INFRA. The voting period may be shorter than the usual 3 days since
> > >>>>> our current setup is effectively not working.
> >>>>>
> >>>>> On 04/07/2019 06:51, Bowen Li wrote:
> > >>>>>> Re: > Are they using their own Travis CI pool, or did they switch to an
> > >>>>>> entirely different CI service?
> > >>>>>>
> > >>>>>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
> > >>>>>> currently moving away from ASF's Travis to their own in-house metal
> > >>>>>> machines at [1] with a custom CI application at [2]. They've seen
> > >>>>>> significant improvement w.r.t. both much higher performance and
> > >>>>>> basically no resource waiting time, a "night-and-day" difference, quoting
> > >>>>>> Wes.
> >>>>>>
> >>>>>> Re: > If we can just switch to our own Travis pool, just for our
> >>>>>> project, then this might be something we can do fairly quickly?
> >>>>>>
> >>>>>> I believe so, according to [3] and [4]
> >>>>>>
> >>>>>>
> > >>>>>> [1] https://ci.ursalabs.org/
> >>>>>> [2] https://github.com/ursa-labs/ursabot
> >>>>>> [3]
> >>>>>>
> >>>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>> [4]
> >>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>>>>
> >>>>>>
> >>>>>>
> > >>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>>>>
> > >>>>>>   Are they using their own Travis CI pool, or did they switch to an
> > >>>>>>   entirely different CI service?
> > >>>>>>
> > >>>>>>   If we can just switch to our own Travis pool, just for our project, then
> > >>>>>>   this might be something we can do fairly quickly?
> >>>>>>
> >>>>>>   On 03/07/2019 05:55, Bowen Li wrote:
> > >>>>>>> I responded in the INFRA ticket [1] that I believe they are using a wrong
> > >>>>>>> metric against Flink and the total build time is a completely different
> > >>>>>>> thing than guaranteed build capacity.
> > >>>>>>>
> > >>>>>>> My response:
> > >>>>>>>
> > >>>>>>> "As mentioned above, since I started to pay attention to Flink's build
> > >>>>>>> queue a few tens of days ago, I'm in Seattle and I saw no build was kicking
> > >>>>>>> off in PST daytime in weekdays for Flink. Our teammates in China and Europe
> > >>>>>>> have also reported similar observations. So we need to evaluate how the
> > >>>>>>> large total build time came from - if 1) your number and 2) our
> > >>>>>>> observations from three locations that cover pretty much a full day, are
> > >>>>>>> all true, I **guess** one reason can be that - highly likely the extra
> > >>>>>>> build time came from weekends when other Apache projects may be idle and
> > >>>>>>> Flink just drains hard its congested queue.
> > >>>>>>>
> > >>>>>>> Please be aware of that we're not complaining about the lack of resources
> > >>>>>>> in general, I'm complaining about the lack of **stable, dedicated**
> > >>>>>>> resources. An example for the latter one is, currently even if no build is
> > >>>>>>> in Flink's queue and I submit a request to be the queue head in PST
> > >>>>>>> morning, my build won't even start in 6-8+h. That is an absurd amount of
> > >>>>>>> waiting time.
> > >>>>>>>
> > >>>>>>> That's saying, if ASF INFRA decides to adopt a quota system and grants
> > >>>>>>> Flink five DEDICATED servers that runs all the time only for Flink, that'll
> > >>>>>>> be PERFECT and can totally solve our problem now.
> >>>>>>>
> > >>>>>>> I feel what's missing in the ASF INFRA's Travis resource pool is some level
> > >>>>>>> of build capacity SLAs and certainty"
> >>>>>>>
> >>>>>>>
> > >>>>>>> Again, I believe these two problems differ in nature: long build time vs.
> > >>>>>>> lack of dedicated build resources. That is to say, shortening build time
> > >>>>>>> may or may not relieve the situation. I'm slightly negative on disabling IT
> > >>>>>>> cases for PRs: the downside is that we risk merging bugs that UTs don't
> > >>>>>>> catch, which may cost a lot more to fix later, especially if they slow
> > >>>>>>> down or even block others. But I am open to others' opinions on it.
> >>>>>>>
> > >>>>>>> AFAICT from INFRA ticket [1], donating to ASF INFRA won't be feasible to
> > >>>>>>> solve our problem since INFRA's pool is fully shared and they have no
> > >>>>>>> control and finer insights over resource allocation to a specific Apache
> > >>>>>>> project. As mentioned in [1], Apache Arrow is moving away from the ASF INFRA
> > >>>>>>> Travis pool (they are actually surprised Flink hasn't planned to do so). I
> > >>>>>>> know that Spark is on its own build infra. If we all agree on funding our
> > >>>>>>> own build infra, I'd be glad to help investigate any potential options
> > >>>>>>> after releasing 1.9 since I'm super busy with 1.9 now.
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>>>>>
> >>>>>>>
> >>>>>>>
> > >>>>>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>>>>>
> > >>>>>>>> As a short-term stopgap, since we can assume this issue to become much
> > >>>>>>>> worse in the following days/weeks, we could disable IT cases in PRs and
> > >>>>>>>> only run them on master.
> >>>>>>>>
> >>>>>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > >>>>>>>>> People really have to stop thinking that just because something works
> > >>>>>>>>> for us it is also a good solution.
> > >>>>>>>>> Also, please remember that our builds run for 2h from start to finish,
> > >>>>>>>>> and not the 14 _minutes_ it takes for zeppelin.
> > >>>>>>>>> We are dealing with an entirely different scale here, both in terms of
> > >>>>>>>>> build times and number of builds.
> > >>>>>>>>>
> > >>>>>>>>> In this very thread people have been complaining about long queue
> > >>>>>>>>> times for their builds. Surprise, other Apache projects have been
> > >>>>>>>>> suffering the very same thing due to us not controlling our build
> > >>>>>>>>> times. While switching services (be it Jenkins, CircleCI or whatever)
> > >>>>>>>>> will possibly work for us (and these options are actually attractive,
> > >>>>>>>>> like CircleCI's proper support for build artifacts), it will also
> > >>>>>>>>> result in us likely negatively affecting other projects in significant
> > >>>>>>>>> ways.
> > >>>>>>>>>
> > >>>>>>>>> Sure, the Jenkins setup has a good user experience for us, at the cost
> > >>>>>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
> > >>>>>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> > >>>>>>>>> resources, and the European contributors haven't even really started yet.
> > >>>>>>>>>
> > >>>>>>>>> FYI, the latest INFRA response from INFRA-18533:
> > >>>>>>>>>
> > >>>>>>>>> "Our rough metrics shows that Flink used over 5800 hours of build time
> > >>>>>>>>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
> > >>>>>>>>> MONTH. EIGHT. nonstop.
> > >>>>>>>>> When we discovered this last night, we discussed it some and are going
> > >>>>>>>>> to tune down Flink to allow only five executors maximum. We cannot
> > >>>>>>>>> allow Flink to consume so much of a Foundation shared resource."
> > >>>>>>>>>
> > >>>>>>>>> So yes, we either
> > >>>>>>>>> a) have to heavily reduce our CI usage or
> > >>>>>>>>> b) fund our own, either maintaining it ourselves or donating to Apache.
> >>>>>>>>>
> >>>>>>>>> On 02/07/2019 05:11, Bowen Li wrote:
> > >>>>>>>>>> By looking at the git history of the Jenkins script, its core part
> > >>>>>>>>>> was finished in March 2017 (with only two minor updates in 2017/2018),
> > >>>>>>>>>> so it's been running for over two years now and it feels like the
> > >>>>>>>>>> Zeppelin community has been quite happy with it. @Jeff Zhang
> > >>>>>>>>>> <zjffdu@gmail.com> can you share your insights and user
> > >>>>>>>>>> experience with the Jenkins+Travis approach?
> > >>>>>>>>>>
> > >>>>>>>>>> Things like:
> > >>>>>>>>>>
> > >>>>>>>>>> - has the approach completely solved the resource capacity problem
> > >>>>>>>>>> for the Zeppelin community? is the Zeppelin community happy with the result?
> > >>>>>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
> > >>>>>>>>>> - how often do you need to maintain the Jenkins infra? how many
> > >>>>>>>>>> people are usually involved in maintenance and bug-fixes?
> > >>>>>>>>>>
> > >>>>>>>>>> The downside of this approach seems mostly to be on the maintenance
> > >>>>>>>>>> to me - maintain the script and Jenkins infra.
> >>>>>>>>>>
> >>>>>>>>>> ** Having Our Own Travis-CI.com Account **
> >>>>>>>>>>
> > >>>>>>>>>> Another alternative I've been thinking of is to have our own
> > >>>>>>>>>> travis-ci.com account with paid dedicated resources. Note travis-ci.org
> > >>>>>>>>>> is the free version and travis-ci.com is the commercial version. We
> > >>>>>>>>>> currently use a shared resource pool managed by the ASF INFRA team on
> > >>>>>>>>>> travis-ci.org, but we have no control over it - we can't see how it's
> > >>>>>>>>>> configured, how much resources are available, how resources are
> > >>>>>>>>>> allocated among Apache projects, etc.
> > >>>>>>>>>> The nice things about having an account on travis-ci.com are:
> >>>>>>>>>>
> > >>>>>>>>>> - relatively low cost with much better resource guarantee than what
> > >>>>>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> > >>>>>>>>>> $489/month with 10 concurrency
> > >>>>>>>>>> - low maintenance work compared to using Jenkins
> > >>>>>>>>>> - (potentially) no migration cost according to Travis's doc [2]
> > >>>>>>>>>> (pending verification)
> > >>>>>>>>>> - full control over the build capacity/configuration compared to
> > >>>>>>>>>> using ASF INFRA's pool
> >>>>>>>>>>
> > >>>>>>>>>> I'd be surprised if we as such a vibrant community cannot find and
> > >>>>>>>>>> fund $249*12=$2988 a year in exchange for a much better developer
> > >>>>>>>>>> experience and much higher productivity.
> >>>>>>>>>>
> > >>>>>>>>>> [1] https://travis-ci.com/plans
> > >>>>>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > >>>>>>>>>>
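To put the two plan prices mentioned above side by side, here is a small Python sketch of the yearly cost and the cost per concurrent build slot. The prices are the ones quoted in this thread ($249/month for 5 concurrent builds, $489/month for 10) and may not reflect current Travis pricing; the names `PLANS`, `annual_cost` and `monthly_cost_per_slot` are hypothetical.

```python
# Plan prices as quoted in this thread (USD per month), keyed by concurrency.
# These are not authoritative and may have changed since.
PLANS = {5: 249, 10: 489}
MONTHS_PER_YEAR = 12


def annual_cost(monthly_usd):
    """Yearly cost of a plan billed monthly."""
    return monthly_usd * MONTHS_PER_YEAR


def monthly_cost_per_slot(monthly_usd, concurrency):
    """Monthly price divided by the number of concurrent build slots."""
    return monthly_usd / concurrency


if __name__ == "__main__":
    for concurrency, monthly in sorted(PLANS.items()):
        print(concurrency, annual_cost(monthly),
              round(monthly_cost_per_slot(monthly, concurrency), 2))
```

The 5-slot plan works out to the $2988 per year given above; the 10-slot plan is $5868 per year and slightly cheaper per slot.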
> > >>>>>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
> >>>>>>>>>>
> > >>>>>>>>>>    So yes, the Jenkins job keeps pulling the state from Travis until it
> > >>>>>>>>>>    finishes.
> > >>>>>>>>>>
> > >>>>>>>>>>    Not sure I'm comfortable with the idea of using Jenkins workers just to
> > >>>>>>>>>>    idle for several hours.
> >>>>>>>>>>
> >>>>>>>>>>    On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>>>>>>>> Here's what zeppelin community did, we make a python
> >>>>>>   script to
> >>>>>>>>>>    check the
> >>>>>>>>>>> build status of pull request.
> >>>>>>>>>>> Here's script:
> >>>>>>>>>>>
> >>>>>>   https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>>>>>>>>
> >>>>>>>>>>> And this is the script we used in Jenkins build job.
> >>>>>>>>>>>
> >>>>>>>>>>> if [ -f "travis_check.py" ]; then
> >>>>>>>>>>>  git log -n 1
> >>>>>>>>>>>  STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
> >>>>>>>>>>    request.*from.*" | sed
> >>>>>>>>>>> 's/.*GitHub pull request <a
> >>>>>>>>>>> href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
> >>>>>>   \2/g')
> >>>>>>>>>>>  AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>>>>  PR=$(echo $STATUS | awk '{print $1}' | sed
> >>>>>>>>>> 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>>>>  #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
> >>>>>>   '{print $3}')
> >>>>>>>>>>>  #if [ -z $COMMIT ]; then
> >>>>>>>>>>>  #  COMMIT=$(curl -s
> >>>>>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> >>>>>>>>>>> | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
> >>>>>>   tr '\n' ' '
> >>>>>>>>>>    | sed
> >>>>>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>>>>>   grep -v
> >>>>>>>>>>    "apache:" |
> >>>>>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>>>>  #fi
> >>>>>>>>>>>
> >>>>>>>>>>>  # get commit hash from PR
> >>>>>>>>>>>  COMMIT=$(curl -s
> >>>>>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>>>>>>>> grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
> >>>>>>   '\n' ' '
> >>>>>>>>>> | sed
> >>>>>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>>>>>   grep -v
> >>>>>>>>>>    "apache:" |
> >>>>>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>>>>  sleep 30 # sleep few moment to wait travis starts
> >>>>>>   the build
> >>>>>>>>>>>  RET_CODE=0
> >>>>>>>>>>>  python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> >>>>>>   RET_CODE=$?
> >>>>>>>>>>>  if [ $RET_CODE -eq 2 ]; then # try with repository
> >>>>>>   name when
> >>>>>>>>>>    travis-ci is
> >>>>>>>>>>> not available in the account
> >>>>>>>>>>>    RET_CODE=0
> >>>>>>>>>>>    AUTHOR=$(curl -s
> >>>>>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> >>>>>>>>>>> | grep '"full_name":' | grep -v "apache/zeppelin" | sed
> >>>>>>>>>>> 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>>>>>>>>  python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> >>>>>>   RET_CODE=$?
> >>>>>>>>>>>  fi
> >>>>>>>>>>>
> >>>>>>>>>>>  if [ $RET_CODE -eq 2 ]; then # fail with can't find
> >>>>>>   build
> >>>>>>>>>>    information in
> >>>>>>>>>>> the travis
> >>>>>>>>>>>    set +x
> >>>>>>>>>>>    echo
> >>>>>>   "-----------------------------------------------------"
> >>>>>>>>>>>    echo "Looks like travis-ci is not configured for
> >>>>>>   your fork."
> >>>>>>>>>>>    echo "Please set up by switching on the 'zeppelin'
> >>>>>>   repository at
> >>>>>>>>>>> https://travis-ci.org/profile and travis-ci."
> >>>>>>>>>>>    echo "And then make sure 'Build branch updates'
> >>>>>>   option is
> >>>>>>>>>>    enabled in
> >>>>>>>>>>> the settings
> >>>>>>   https://travis-ci.org/${AUTHOR}/zeppelin/settings
> >>>>>>   <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
> >>>>>>>>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
> >>>>>>>>>>>    echo ""
> >>>>>>>>>>>    echo "To trigger CI after setup, you will need to
> >>>>>>   amend your
> >>>>>>>>>>    last commit
> >>>>>>>>>>> with"
> >>>>>>>>>>>    echo "git commit --amend"
> >>>>>>>>>>>    echo "git push your-remote HEAD --force"
> >>>>>>>>>>>    echo ""
> >>>>>>>>>>>    echo "See
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>
> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
> >>>>>>>>>>> ."
> >>>>>>>>>>>  fi
> >>>>>>>>>>>
> >>>>>>>>>>>  exit $RET_CODE
> >>>>>>>>>>> else
> >>>>>>>>>>>  set +x
> >>>>>>>>>>>  echo "travis_check.py does not exists"
> >>>>>>>>>>>  exit 1
> >>>>>>>>>>> fi
> >>>>>>>>>>>
> >>>>>>>>>>> Chesnay Schepler <chesnay@apache.org
> >>>>>>   <ma...@apache.org>
> >>>>>>>>>>    <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>>>>   wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>>>>>>>>>>
> >>>>>>>>>>>> Does this imply that a Jenkins job is active as long
> >>>>>>   as the
> >>>>>>>>>>    Travis build
> >>>>>>>>>>>> runs?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Dawid, I think the "long test running" as I
> >>>>>>   mentioned in the
> >>>>>>>>>>    first
> >>>>>>>>>>>> email,
> >>>>>>>>>>>>> also as you guys said, belongs to "a big effort
> >>>>>>   which is much
> >>>>>>>>>>    harder to
> >>>>>>>>>>>>> accomplish in a short period of time and may deserve
> >>>>>>   its own
> >>>>>>>>>>    separate
> >>>>>>>>>>>>> discussion". Thus I didn't include it in what we can
> >>>>>>   do in a
> >>>>>>>>>>    foreseeable
> >>>>>>>>>>>>> short term.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Besides, I don't think that's the ultimate reason
> >>>>>>   for lack of
> >>>>>>>>>>    build
> >>>>>>>>>>>>> resources. Even if the build is shortened to
> >>>>>>   something like
> >>>>>>>>>>    2h, the
> >>>>>>>>>>>>> problems of no build machine works about 6 or more
> >>>>>>   hours in
> >>>>>>>>>>    PST daytime
> >>>>>>>>>>>>> that I described will still happen, because no
> >>>>>>   machine from
> >>>>>>>>>>    ASF INFRA's
> >>>>>>>>>>>>> pool is allocated to Flink. As I have paid close
> >>>>>>   attention to
> >>>>>>>>>>    the build
> >>>>>>>>>>>>> queue in the past few weekdays, it's a pretty clear
> >>>>>>   pattern now.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> **The ultimate root cause** for that is - we don't
> >>>>>>   have any
> >>>>>>>>>>    **dedicated**
> >>>>>>>>>>>>> build resources that we can stably rely on. I'm
> >>>>>>   actually ok to
> >>>>>>>>>>    wait for a
> >>>>>>>>>>>>> long time if there are build requests running, it
> >>>>>>   means at
> >>>>>>>>>>    least we are
> >>>>>>>>>>>>> making progress. But I'm not ok with no build
> >>>>>>   resource. A
> >>>>>>>>>>    better place I
> >>>>>>>>>>>>> think we should aim at in short term is to always
> >>>>>>   have at
> >>>>>>>>>>    least a central
> >>>>>>>>>>>>> pool (can be 3 or 5) of machines dedicated to build
> >>>>>>   Flink at
> >>>>>>>>>>    any time, or
> >>>>>>>>>>>>> maybe use users resources.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Chesnay @Robert I synced with Jeff offline that
> >>>>>>   Zeppelin
> >>>>>>>>>>    community is
> >>>>>>>>>>>>> using a Jenkins job to automatically build on users'
> >>>>>>   travis
> >>>>>>>>>>    account and
> >>>>>>>>>>>>> link the result back to github PR. I guess the
> >>>>>>   Jenkins job
> >>>>>>>>>>    would fetch
> >>>>>>>>>>>>> latest upstream master and build the PR against it.
> >>>>>>   Jeff has
> >>>>>>>>>> filed
> >>>>>>>>>>>> tickets
> >>>>>>>>>>>>> to learn and get access to the Jenkins infra. It'll be
> >>>>>>   better to
> >>>>>>>>>>    fully
> >>>>>>>>>>>>> understand it first before judging this approach.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I also heard good things about CircleCI, and ASF
> >>>>>>   INFRA seems
> >>>>>>>>>>    to have a
> >>>>>>>>>>>> pool
> >>>>>>>>>>>>> of build capacity there too. Can be an alternative
> >>>>>>   to consider.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> >>>>>>>>>>>> dwysakowicz@apache.org
> >>>>>>   <ma...@apache.org> <mailto:dwysakowicz@apache.org
> >>>>>>   <ma...@apache.org>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the
> >>>>>>   most
> >>>>>>>>>>    important point
> >>>>>>>>>>>>>> from Chesnay's previous message in the summary. The
> >>>>>>   ultimate
> >>>>>>>>>>    reason for
> >>>>>>>>>>>>>> all the problems is that the tests take close to 2
> >>>>>>   hours to
> >>>>>>>>>>    run already.
> >>>>>>>>>>>>>> I fully support this claim: "Unless people start
> >>>>>>   caring about
> >>>>>>>>>>    test times
> >>>>>>>>>>>>>> before adding them, this issue cannot be solved"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is also another reason why using user's Travis
> >>>>>>   account
> >>>>>>>>>>    won't help.
> >>>>>>>>>>>>>> Every few weeks we reach the user's time limit for
> >>>>>>   a single
> >>>>>>>>>>    profile.
> >>>>>>>>>>>>>> This makes the user's builds simply fail, until we
> >>>>>>   either
> >>>>>>>>>>    properly
> >>>>>>>>>>>>>> decrease the time the tests take (which I am not
> >>>>>>   sure we ever
> >>>>>>>>>>    did) or
> >>>>>>>>>>>>>> postpone the problem by splitting into more
> >>>>>>   profiles. (Note
> >>>>>>>>>>    that the ASF
> >>>>>>>>>>>>>> Travis account has higher time limits)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Dawid
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>>>>>>>>>>>> Do we know if using "the best" available hardware
> >>>>>>   would
> >>>>>>>>>>    improve the
> >>>>>>>>>>>> build
> >>>>>>>>>>>>>>> times?
> >>>>>>>>>>>>>>> Imagine we would run the build on machines with
> >>>>>>   plenty of
> >>>>>>>>>>    main memory
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> mount everything to ramdisk + the latest CPU
> >>>>>>   architecture?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Throwing hardware at the problem could help reduce
> >>>>>>   the time
> >>>>>>>>>>    of an
> >>>>>>>>>>>>>>> individual build, and using our own infrastructure
> >>>>>>   would
> >>>>>>>>>>    remove our
> >>>>>>>>>>>>>>> dependency on Apache's Travis account (with the
> >>>>>>   obvious
> >>>>>>>>>>    downside of
> >>>>>>>>>>>>>> having
> >>>>>>>>>>>>>>> to maintain the infrastructure)
> >>>>>>>>>>>>>>> We could use an open source travis alternative, to
> >>>>>>   have a
> >>>>>>>>>>    similar
> >>>>>>>>>>>>>>> experience and make the migration easy.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>>>>>>>>>    <chesnay@apache.org <ma...@apache.org>
> >>>>>>   <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> From what I gathered, there's no special
> >>>>>>   sauce that the
> >>>>>>>>>>    Zeppelin
> >>>>>>>>>>>>>>>> project uses which actually integrates a users
> >>>>> Travis
> >>>>>>>>>>    account into the
> >>>>>>>>>>>>>> PR.
> >>>>>>>>>>>>>>>> They just disabled Travis for PRs. And that's
> >>>>>>   kind of it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a
> >>>>>>   fair
> >>>>>>>>>>    amount of
> >>>>>>>>>>>>>>>> resources, but there are downsides:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The discoverability of the Travis check takes a
> >>>>>>   nose-dive.
> >>>>>>>>>>    Either we
> >>>>>>>>>>>>>>>> require every contributor to always, on every
> >>>>>>   commit, also
> >>>>>>>>>>    post a
> >>>>>>>>>>>> Travis
> >>>>>>>>>>>>>>>> build, or we have the reviewer sift through the
> >>>>>>>>>>    contributor's account
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> find it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This is rather cumbersome. Additionally, it's
> >>>>>>   also not
> >>>>>>>>>>    equivalent to
> >>>>>>>>>>>>>>>> having a PR build.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> A normal branch build takes a branch as is and
> >>>>>>   tests it. A
> >>>>>>>>>>    PR build
> >>>>>>>>>>>>>>>> merges the branch into master, and then runs it.
> >>>>>>   (Fun fact:
> >>>>>>>>>>    This is
> >>>>>>>>>>>> why
> >>>>>>>>>>>>>>>> a PR with merge conflicts is not being run on
> >>>>>>   Travis.)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> And ultimately, everyone can already make use of
> >>>>> this
> >>>>>>>>>>    approach anyway.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>>>>>>>>>>>>>> Hi Jeff,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I
> >>>>>>   think it's a
> >>>>>>>>>>    good idea to
> >>>>>>>>>>>>>>>>> leverage user's travis account.
> >>>>>>>>>>>>>>>>> In this way, we can have almost unlimited
> >>>>>>   concurrent build
> >>>>>>>>>>    jobs and
> >>>>>>>>>>>>>>>>> developers can restart build by themselves
> >>>>>>   (currently only
> >>>>>>>>>>    committers
> >>>>>>>>>>>>>>>>> can restart PR's build).
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> But I'm still not very clear how to integrate
> >>>>> user's
> >>>>>>>>>>    travis build
> >>>>>>>>>>>> into
> >>>>>>>>>>>>>>>>> the Flink pull request's build automatically.
> >>>>>>   Can you
> >>>>>>>>>>    explain more in
> >>>>>>>>>>>>>>>>> detail?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Another question: does travis only build
> >>>>>>   branches for user
> >>>>>>>>>>    account?
> >>>>>>>>>>>>>>>>> My concern is that builds for PRs will rebase
> >>>>> user's
> >>>>>>>>>>    commits against
> >>>>>>>>>>>>>>>>> current master branch.
> >>>>>>>>>>>>>>>>> This will help us to find problems before
> >>>>>>   merge.  Builds
> >>>>>>>>>>    for branches
> >>>>>>>>>>>>>>>>> will lose the impact of new commits in master.
> >>>>>>>>>>>>>>>>> How does Zeppelin solve this problem?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks again for sharing the idea.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >>>>>>   <zjffdu@gmail.com <ma...@gmail.com>
> >>>>>>>>>>    <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>>>>> <mailto:zjffdu@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:zjffdu@gmail.com
> >>>>>>   <ma...@gmail.com>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>     Hi Folks,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Zeppelin met this kind of issue before; we solved
> >>>>>>>>>> it by
> >>>>>>>>>>>> delegating
> >>>>>>>>>>>>>>>>>     each
> >>>>>>>>>>>>>>>>>     one's PR build to his travis account
> >>>>>>   (Everyone can
> >>>>>>>>>>    have 5 free
> >>>>>>>>>>>>>>>>>     slot for
> >>>>>>>>>>>>>>>>> travis build).
> >>>>>>>>>>>>>>>>> Apache account travis build is only triggered when
> >>>>>>>>>>    PR is merged.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>     Kurt Young <ykt836@gmail.com
> >>>>>>   <ma...@gmail.com>
> >>>>>>>>>>    <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>>>>   <mailto:ykt836@gmail.com <ma...@gmail.com>
> >>>>>>>>>>    <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>>>> wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> (Forgot to cc George)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>>>>>>>>>    <ykt836@gmail.com <ma...@gmail.com>
> >>>>>>   <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>>>>> <mailto:ykt836@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:ykt836@gmail.com
> >>>>>>   <ma...@gmail.com>>>>
> >>>>>>>>>>    wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for bringing this up. We
> >>>>>>   actually have
> >>>>>>>>>>    discussed
> >>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>     this, and I
> >>>>>>>>>>>>>>>>>>> think Till and George have
> >>>>>>>>>>>>>>>>>>> already spent some time investigating
> >>>>>>   it. I have
> >>>>>>>>>>    cced both of
> >>>>>>>>>>>>>>>>>     them, and
> >>>>>>>>>>>>>>>>>>> maybe they can share
> >>>>>>>>>>>>>>>>>>> their findings.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>>>>>>>>>    <imjark@gmail.com <ma...@gmail.com>
> >>>>>>   <mailto:imjark@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>>>>> <mailto:imjark@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:imjark@gmail.com
> >>>>>>   <ma...@gmail.com>>>>
> >>>>>>>>>>    wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for bringing this. We also
> >>>>>>   suffered from
> >>>>>>>>>>    the long
> >>>>>>>>>>>>>>>>>     build time.
> >>>>>>>>>>>>>>>>>>>> I agree that we should focus on
> >>>>>>   solving build
> >>>>>>>>>>    capacity
> >>>>>>>>>>>>>>>>> problem in the
> >>>>>>>>>>>>>>>>>>>> thread.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> My observation is there is only one
> >>>>>>   build is
> >>>>>>>>>>    running, all
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> others
> >>>>>>>>>>>>>>>>>>>> (other
> >>>>>>>>>>>>>>>>>>>> PRs, master) are pending.
> >>>>>>>>>>>>>>>>>>>> The pricing plan[1] of travis shows
> >>>>>>   it can
> >>>>>>>>>> support
> >>>>>>>>>>>> concurrent
> >>>>>>>>>>>>>>>>>     build
> >>>>>>>>>>>>>>>>>> jobs.
> >>>>>>>>>>>>>>>>>>>> But I don't know which plan we are
> >>>>>>   using, might
> >>>>>>>>>>    be the free
> >>>>>>>>>>>>>>>>>     plan for
> >>>>>>>>>>>>>>>>>> open
> >>>>>>>>>>>>>>>>>>>> source.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
> >>>>>>   experience on
> >>>>>>>>>>    Travis.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> >>>>>>>>>>>> bowenli86@gmail.com <ma...@gmail.com>
> >>>>>>   <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>>>>> <mailto:bowenli86@gmail.com
> >>>>>>   <ma...@gmail.com>
> >>>>>>>>>>    <mailto:bowenli86@gmail.com
> >>>>>>   <ma...@gmail.com>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Steven,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think you may not have read what I
> >>>>>>   wrote. The
> >>>>>>>>>>    discussion is
> >>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>> "unstable
> >>>>>>>>>>>>>>>>>>>>> build **capacity**", in other words
> >>>>>>>>>>    "unstable / lack of
> >>>>>>>>>>>>>> build
> >>>>>>>>>>>>>>>>>>>> resources",
> >>>>>>>>>>>>>>>>>>>>> not "unstable build".
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM
> >>>>>>   Steven Wu
> >>>>>>>>>>>>>>>>>     <stevenz3wu@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>>>>>   <ma...@gmail.com>>
> >>>>>>>>>>    <mailto:stevenz3wu@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>>>>>   <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> long and sometimes unstable build is
> >>>>>>>>>>    definitely a pain
> >>>>>>>>>>>>>>>> point.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I suspect the build failure here in
> >>>>>>>>>>>> flink-connector-kafka
> >>>>>>>>>>>>>>>>>     is not
> >>>>>>>>>>>>>>>>>>>> related
> >>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> my change. but there is no easy way to
> >>>>>>   re-run the
> >>>>>>>>>>    build on
> >>>>>>>>>>>>>>>>> travis UI.
> >>>>>>>>>>>>>>>>>> Google
> >>>>>>>>>>>>>>>>>>>>>> search showed a trick of
> >>>>>>   close-and-open the
> >>>>>>>>>>    PR will
> >>>>>>>>>>>>>>>>> trigger rebuild.
> >>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>> that could add noises to the PR
> >>>>>>   activities.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> travis-ci for my personal repo
> >>>>>>   often failed
> >>>>>>>>>>    with
> >>>>>>>>>>>>>>>>> exceeding time
> >>>>>>>>>>>>>>>>>> limit
> >>>>>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>>>> 4+ hours.
> >>>>>>>>>>>>>>>>>>>>>> The job exceeded the maximum time
> >>>>>>   limit for
> >>>>>>>>>>    jobs, and
> >>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>     been
> >>>>>>>>>>>>>>>>>>>>> terminated.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM
> >>>>>>   Bowen Li
> >>>>>>>>>>>>>>>>>     <bowenli86@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>>>>>   <ma...@gmail.com>>
> >>>>>>>>>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>
> >>>>>>   <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530
> >>>>>>>>>>>>>>>>>     This build
> >>>>>>>>>>>>>>>>>>>>> request
> >>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>> been sitting at **HEAD of the
> >>>>>>   queue**
> >>>>>>>>>>    since I first
> >>>>>>>>>>>> saw
> >>>>>>>>>>>>>>>>>     it at PST
> >>>>>>>>>>>>>>>>>>>>> 10:30am
> >>>>>>>>>>>>>>>>>>>>>>> (not sure how long it's been
> >>>>>>   there before
> >>>>>>>>>>    10:30am).
> >>>>>>>>>>>>>>>>>     It's PST
> >>>>>>>>>>>>>>>>>> 4:12pm
> >>>>>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> it hasn't started yet.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM
> >>>>>>   Bowen Li
> >>>>>>>>>>>>>>>>>     <bowenli86@gmail.com
> >>>>>>   <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>>>>>   <ma...@gmail.com>>
> >>>>>>>>>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>
> >>>>>>   <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain
> >>>>>>>>>>    resulting from lack
> >>>>>>>>>>>>>>>>>     of stable
> >>>>>>>>>>>>>>>>>>>> build
> >>>>>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink
> >>>>>>   PRs [1].
> >>>>>>>>>>>> Specifically, I
> >>>>>>>>>>>>>>>>> noticed
> >>>>>>>>>>>>>>>>>>>> often
> >>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>>>>> build in the queue is making any
> >>>>>>>>>>    progress for
> >>>>>>>>>>>> hours,
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> suddenly
> >>>>>>>>>>>>>>>>>>>> 5
> >>>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>> 6
> >>>>>>>>>>>>>>>>>>>>>>>> builds kick off all together
> >>>>>>   after the
> >>>>>>>>>>    long pause.
> >>>>>>>>>>>>>>>>>     I'm at PST
> >>>>>>>>>>>>>>>>>>>>> (UTC-08)
> >>>>>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can
> >>>>>>   be as
> >>>>>>>>>>    long as 6 hours
> >>>>>>>>>>>>>>>>>     from PST 9am
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> 3pm
> >>>>>>>>>>>>>>>>>>>>>>>> (let alone the time needed to
> >>>>>>   drain the
> >>>>>>>>>>    queue
> >>>>>>>>>>>>>>>>> afterwards).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I think this has greatly
> >>>>>>   impacted our
> >>>>>>>>>>    productivity.
> >>>>>>>>>>>>>> I've
> >>>>>>>>>>>>>>>>>>>> experienced
> >>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> PRs submitted in the early
> >>>>>>   morning of
> >>>>>>>>>>    PST time zone
> >>>>>>>>>>>>>>>>>     won't finish
> >>>>>>>>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>>>>> build until late night of the
> >>>>>>   same day.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> So my questions are:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced
> >>>>>>   the same
> >>>>>>>>>>    problem or
> >>>>>>>>>>>>>>>>>     have similar
> >>>>>>>>>>>>>>>>>>>>>>> observation
> >>>>>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it
> >>>>>>   has things
> >>>>>>>>>>    to do with
> >>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>     zone)
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> - What pricing plan of
> >>>>>>   TravisCI is
> >>>>>>>>>>    Flink currently
> >>>>>>>>>>>>>>>>> using? Is it
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> free
> >>>>>>>>>>>>>>>>>>>>>>>> plan for open source
> >>>>>>   projects? What
> >>>>>>>>>> are the
> >>>>>>>>>>>>>>>>> guaranteed build
> >>>>>>>>>>>>>>>>>>>> capacity
> >>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>> the current plan?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> - If the current pricing plan
> >>>>>>   (either
> >>>>>>>>>>    free or paid)
> >>>>>>>>>>>>>>>> can't
> >>>>>>>>>>>>>>>>>> provide
> >>>>>>>>>>>>>>>>>>>>>> stable
> >>>>>>>>>>>>>>>>>>>>>>>> build capacity, can we
> >>>>>>   upgrade to a
> >>>>>>>>>>    higher priced
> >>>>>>>>>>>>>>>>>     plan with
> >>>>>>>>>>>>>>>>>> larger
> >>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>>>>> stable build capacity?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> BTW, another factor that
> >>>>>>   contribute to
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>> productivity problem
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> our build is slow - we run
> >>>>>>   full build
> >>>>>>>>>>    for every PR
> >>>>>>>>>>>>>> and a
> >>>>>>>>>>>>>>>>>>>> successful
> >>>>>>>>>>>>>>>>>>>>>> full
> >>>>>>>>>>>>>>>>>>>>>>>> build takes ~5h. We
> >>>>>>   definitely have
> >>>>>>>>>>    more options to
> >>>>>>>>>>>>>>>>>     solve it,
> >>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>> instance,
> >>>>>>>>>>>>>>>>>>>>>>>> modularize the build graphs
> >>>>>>   and reuse
> >>>>>>>>>>    artifacts
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> previous
> >>>>>>>>>>>>>>>>>>>>>> build.
> >>>>>>>>>>>>>>>>>>>>>>>> But I think that can be a big
> >>>>>>   effort
> >>>>>>>>>>    which is much
> >>>>>>>>>>>>>>>>> harder to
> >>>>>>>>>>>>>>>>>>>>> accomplish
> >>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> a short period of time and
> >>>>>>   may deserve
> >>>>>>>>>>    its own
> >>>>>>>>>>>>>> separate
> >>>>>>>>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>> https://travis-ci.org/apache/flink/pull_requests
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>     --
> >>>>>>>>>>>>>>>>>     Best Regards
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>     Jeff Zhang
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >
>
>
>
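
The quoted thread above describes a Jenkins job that "keeps pulling the state from Travis until it finishes". A minimal sketch of that polling pattern, written in Python, might look like the following. This is not the actual travis_check.py from Zeppelin: the function names, the set of state strings, and the default interval and timeout are all assumptions for illustration, and the status source is passed in as a callable so that no particular Travis API is implied.

```python
import time
from typing import Callable

# Assumed set of "finished" states; adjust to whatever the CI service reports.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], str],
                   poll_interval_s: float = 60.0,
                   timeout_s: float = 6 * 3600.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_state() until it returns a terminal state or the timeout expires.

    Returns the terminal state, or "timeout" if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        # The worker is idle here; this is the cost raised in the thread.
        sleep(poll_interval_s)


def exit_code_for(state: str) -> int:
    """Map the final build state to a process exit code for the calling job."""
    return 0 if state == "passed" else 1


if __name__ == "__main__":
    # Simulated status source standing in for a real CI status query.
    states = iter(["created", "started", "started", "passed"])
    result = wait_for_build(lambda: next(states), poll_interval_s=0.0)
    print(result, exit_code_for(result))
```

The sleep and clock parameters are injectable only so the loop can be exercised without waiting; a real job would leave them at their defaults. The sketch also makes the concern from the thread concrete: the calling worker stays occupied for the whole duration of the remote build.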

Re: [VOTE] Migrate to sponsored Travis account

Posted by Dian Fu <di...@gmail.com>.
+1. Thanks Chesnay and Bowen for pushing this forward.

Regards,
Dian

> On Jul 4, 2019, at 6:28 PM, zhijiang <wa...@aliyun.com.INVALID> wrote:
> 
> +1 and thanks for Chesnay's work on this.
> 
> Best,
> Zhijiang
> 
> ------------------------------------------------------------------
> From:Haibo Sun <su...@163.com>
> Send Time: 2019-07-04 (Thursday) 18:21
> To:dev <de...@flink.apache.org>
> Cc:private@flink.apache.org <pr...@flink.apache.org>
> Subject:Re:Re: [VOTE] Migrate to sponsored Travis account
> 
> +1. Thanks to Chesnay for pushing this forward.
> 
> Best,
> Haibo
> 
> 
> At 2019-07-04 17:58:28, "Kurt Young" <yk...@gmail.com> wrote:
>> +1 and great thanks Chesnay for pushing this.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <al...@apache.org> wrote:
>> 
>>> +1
>>> 
>>> Aljoscha
>>> 
>>>> On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
>>>> 
>>>> +1 to move to a private Travis account.
>>>> 
>>>> I can confirm that Ververica will sponsor a Travis CI plan that is
>>>> equivalent or a bit higher than the previous ASF quota (10 concurrent
>>> build
>>>> queues)
>>>> 
>>>> Best,
>>>> Stephan
>>>> 
>>>> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org>
>>> wrote:
>>>> 
>>>>> I've raised a JIRA
>>>>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>>> inquire
>>>>> whether it would be possible to switch to a different Travis account,
>>>>> and if so what steps would need to be taken.
>>>>> We need a proper confirmation from INFRA since we are not in full
>>>>> control of the flink repository (for example, we cannot access the
>>>>> settings page).
>>>>> 
>>>>> If this is indeed possible, Ververica is willing to sponsor a Travis
>>>>> account for the Flink project.
>>>>> This would provide us with more than enough resources.
>>>>> 
>>>>> Since this makes the project more reliant on resources provided by
>>>>> external companies I would like to vote on this.
>>>>> 
>>>>> Please vote on this proposal, as follows:
>>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>>>>> provided that INFRA approves
>>>>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>>>>> account
>>>>> 
>>>>> The vote will be open for at least 24h, and until we have confirmation
>>>>> from INFRA. The voting period may be shorter than the usual 3 days since
>>>>> our current setup is effectively not working.
>>>>> 
>>>>> On 04/07/2019 06:51, Bowen Li wrote:
>>>>>> Re: > Are they using their own Travis CI pool, or did they switch to an
>>>>>> entirely different CI service?
>>>>>> 
>>>>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>>>>>> currently moving away from ASF's Travis to their own in-house metal
>>>>>> machines at [1] with custom CI application at [2]. They've seen
>>>>>> significant improvement w.r.t both much higher performance and
>>>>>> basically no resource waiting time, "night-and-day" difference quoting
>>>>>> Wes.
>>>>>> 
>>>>>> Re: > If we can just switch to our own Travis pool, just for our
>>>>>> project, then this might be something we can do fairly quickly?
>>>>>> 
>>>>>> I believe so, according to [3] and [4]
>>>>>> 
>>>>>> 
>>>>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>>>>>> [2] https://github.com/ursa-labs/ursabot
>>>>>> [3]
>>>>>> 
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>> [4]
>>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
>>>>>> <ma...@apache.org>> wrote:
>>>>>> 
>>>>>>   Are they using their own Travis CI pool, or did they switch to an
>>>>>>   entirely different CI service?
>>>>>> 
>>>>>>   If we can just switch to our own Travis pool, just for our
>>>>>>   project, then
>>>>>>   this might be something we can do fairly quickly?
>>>>>> 
>>>>>>   On 03/07/2019 05:55, Bowen Li wrote:
>>>>>>> I responded in the INFRA ticket [1] that I believe they are
>>>>>>   using a wrong
>>>>>>> metric against Flink and the total build time is a completely
>>>>>>   different
>>>>>>> thing than guaranteed build capacity.
>>>>>>> 
>>>>>>> My response:
>>>>>>> 
>>>>>>> "As mentioned above, since I started to pay attention to Flink's
>>>>>>   build
>>>>>>> queue a few tens of days ago, I'm in Seattle and I saw no build
>>>>>>   was kicking
>>>>>>> off in PST daytime in weekdays for Flink. Our teammates in China
>>>>>>   and Europe
>>>>>>> have also reported similar observations. So we need to evaluate
>>>>>>   how the
>>>>>>> large total build time came from - if 1) your number and 2) our
>>>>>>> observations from three locations that cover pretty much a full
>>>>>>   day, are
>>>>>>> all true, I **guess** one reason can be that - highly likely the
>>>>>>   extra
>>>>>>> build time came from weekends when other Apache projects may be
>>>>>>   idle and
>>>>>>> Flink just drains hard its congested queue.
>>>>>>> 
>>>>>>> Please be aware that we're not complaining about the lack of
>>>>>>   resources
>>>>>>> in general, I'm complaining about the lack of **stable, dedicated**
>>>>>>> resources. An example for the latter one is, currently even if
>>>>>>   no build is
>>>>>>> in Flink's queue and I submit a request to be the queue head in PST
>>>>>>> morning, my build won't even start in 6-8+h. That is an absurd
>>>>>>   amount of
>>>>>>> waiting time.
>>>>>>> 
>>>>>>> That is to say, if ASF INFRA decides to adopt a quota system and
>>>>>>   grants
>>>>>>> Flink five DEDICATED servers that run all the time only for
>>>>>>   Flink, that'll
>>>>>>> be PERFECT and can totally solve our problem now.
>>>>>>> 
>>>>>>> I feel what's missing in the ASF INFRA's Travis resource pool is
>>>>>>   some level
>>>>>>> of build capacity SLAs and certainty"
>>>>>>> 
>>>>>>> 
>>>>>>> Again, I believe there are differences in nature of these two
>>>>>>   problems,
>>>>>>> long build time v.s. lack of dedicated build resource. That's
>>>>>>   saying,
>>>>>>> shortening build time may or may not relieve the situation.
>>>>>>   I'm slightly
>>>>>>> negative on disabling IT cases for PRs: the downside is
>>>>>>   that we
>>>>>>> risk missing bugs in a PR that UTs don't catch, which may cost
>>>>>>   a
>>>>>>> lot more to fix later if they slow down or even block
>>>>>>   others. But I'm
>>>>>>> open to others' opinions on it.
>>>>>>> 
>>>>>>> AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
>>>>>>   feasible to
>>>>>>> solve our problem since INFRA's pool is fully shared and they
>>>>>>   have no
>>>>>>> control and finer insights over resource allocation to a
>>>>>>   specific Apache
>>>>>>> project. As mentioned in [1], Apache Arrow is moving away from
>>>>>>   ASF INFRA
>>>>>>> Travis pool (they are actually surprised Flink hasn't planned to do
>>>>>>   so). I
>>>>>>> know that Spark is on its own build infra. If we all agree on
>>>>>>   funding our
>>>>>>> own build infra, I'd be glad to help investigate any potential
>>>>>>   options
>>>>>>> after releasing 1.9 since I'm super busy with 1.9 now.
>>>>>>> 
>>>>>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>>>>   <chesnay@apache.org> wrote:
>>>>>>> 
>>>>>>>> As a short-term stopgap, since we can assume this issue to
>>>>>>   become much
>>>>>>>> worse in the following days/weeks, we could disable IT cases in
>>>>>>   PRs and
>>>>>>>> only run them on master.
>>>>>>>> 
>>>>>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>>>>>> People really have to stop thinking that just because
>>>>>>   something works
>>>>>>>>> for us it is also a good solution.
>>>>>>>>> Also, please remember that our builds run for 2h from start to
>>>>>>   finish,
>>>>>>>>> and not the 14 _minutes_ it takes for zeppelin.
>>>>>>>>> We are dealing with an entirely different scale here, both in
>>>>>>   terms of
>>>>>>>>> build times and number of builds.
>>>>>>>>> 
>>>>>>>>> In this very thread people have been complaining about long queue
>>>>>>>>> times for their builds. Surprise, other Apache projects have been
>>>>>>>>> suffering the very same thing due to us not controlling our build
>>>>>>>>> times. While switching services (be it Jenkins, CircleCI or
>>>>>>   whatever)
>>>>>>>>> will possibly work for us (and these options are actually
>>>>>>   attractive,
>>>>>>>>> like CircleCI's proper support for build artifacts), it will also
>>>>>>>>> result in us likely negatively affecting other projects in
>>>>>>   significant
>>>>>>>>> ways.
>>>>>>>>> 
>>>>>>>>> Sure, the Jenkins setup has a good user experience for us, at
>>>>>>   the cost
>>>>>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>>>>>   have 25
>>>>>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>>>>>>> resources, and the European contributors haven't even really
>>>>>>   started yet.
>>>>>>>>> 
>>>>>>>>> FYI, the latest INFRA response from INFRA-18533:
>>>>>>>>> 
>>>>>>>>> "Our rough metrics shows that Flink used over 5800 hours of
>>>>>>   build time
>>>>>>>>> last month. That is equal to EIGHT servers running 24/7 for
>>>>>>   the ENTIRE
>>>>>>>>> MONTH. EIGHT. nonstop.
>>>>>>>>> When we discovered this last night, we discussed it some and
>>>>>>   are going
>>>>>>>>> to tune down Flink to allow only five executors maximum. We
>>>>> cannot
>>>>>>>>> allow Flink to consume so much of a Foundation shared resource."
>>>>>>>>> 
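The figures quoted above can be sanity-checked with a little arithmetic. This is a minimal sketch, not INFRA's actual accounting: it assumes a 30-day month and, for the queue estimate, that each build occupies exactly one executor. The function names are illustrative only.

```python
import math


def servers_running_nonstop(build_hours, days_in_month=30):
    """Always-on servers needed to accumulate `build_hours` in one month."""
    return build_hours / (24 * days_in_month)


def hours_to_drain(queued_builds, hours_per_build, executors):
    """Wall-clock hours to clear a queue, assuming one executor per build."""
    waves = math.ceil(queued_builds / executors)
    return waves * hours_per_build


# 5800 build hours in a month, as reported in INFRA-18533
print(round(servers_running_nonstop(5800), 2))
# 25 queued PRs, 2h per build, capped at five executors
print(hours_to_drain(25, 2, 5))
```

Under these assumptions 5800 hours corresponds to roughly eight servers running around the clock, and 25 queued two-hour builds would need about 10 hours of wall-clock time on five executors.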
>>>>>>>>> So yes, we either
>>>>>>>>> a) have to heavily reduce our CI usage or
>>>>>>>>> b) fund our own, either maintaining it ourselves or donating
>>>>>>   to Apache.
>>>>>>>>> 
>>>>>>>>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>>>>>>> By looking at the git history of the Jenkins script, its core
>>>>>>   part
>>>>>>>>>> was finished in March 2017 (with only two minor updates in
>>>>>>   2017/2018),
>>>>>>>>>> so it's been running for over two years now and it feels like the
>>>>>>   Zeppelin
>>>>>>>>>> community has been quite happy with it. @Jeff Zhang
>>>>>>>>>> <zjffdu@gmail.com> can you
>>>>>>   share your insights and user
>>>>>>>>>> experience with the Jenkins+Travis approach?
>>>>>>>>>> 
>>>>>>>>>> Things like:
>>>>>>>>>> 
>>>>>>>>>> - has the approach completely solved the resource capacity
>>>>>>   problem
>>>>>>>>>> for the Zeppelin community? is the Zeppelin community happy with the
>>>>>>   result?
>>>>>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>>>>>>>>> - how often do you need to maintain the Jenkins infra? how many
>>>>>>>>>> people are usually involved in maintenance and bug-fixes?
>>>>>>>>>> 
>>>>>>>>>> The downside of this approach seems mostly to be on the
>>>>>>   maintenance
>>>>>>>>>> to me - maintain the script and Jenkins infra.
>>>>>>>>>> 
>>>>>>>>>> ** Having Our Own Travis-CI.com Account **
>>>>>>>>>> 
>>>>>>>>>> Another alternative I've been thinking of is to have our own
>>>>>>>>>> travis-ci.com account with paid dedicated
>>>>>>>>>> resources. Note travis-ci.org is the free
>>>>>>>>>> version and travis-ci.com is the commercial
>>>>>>>>>> version. We currently use a shared resource pool managed by
>>>>>>   the ASF INFRA
>>>>>>>>>> team on travis-ci.org, but we have no control
>>>>>>>>>> over it - we can't see how it's configured, how much
>>>>>>   resources are
>>>>>>>>>> available, how resources are allocated among Apache projects,
>>>>>>   etc.
>>>>>>>>>> The nice things about having an account on travis-ci.com
>>>>>>>>>> are:
>>>>>>>>>> 
>>>>>>>>>> - relatively low cost with much better resource guarantee
>>>>>>   than what
>>>>>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>>>>>>>>>> $489/month with 10 concurrency
>>>>>>>>>> - low maintenance work compared to using Jenkins
>>>>>>>>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>>>>>>> (pending verification)
>>>>>>>>>> - full control over the build capacity/configuration compared to
>>>>>>>>>> using ASF INFRA's pool
>>>>>>>>>> 
>>>>>>>>>> I'd be surprised if we as such a vibrant community cannot
>>>>>>   find and
>>>>>>>>>> fund $249*12=$2988 a year in exchange for a much better
>>>>> developer
>>>>>>>>>> experience and much higher productivity.
>>>>>>>>>> 
>>>>>>>>>> [1] https://travis-ci.com/plans
>>>>>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>>>>>
>>>>>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>>>>>>>>> <chesnay@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>    So yes, the Jenkins job keeps pulling the state from
>>>>>>   Travis until it
>>>>>>>>>>    finishes.
>>>>>>>>>> 
>>>>>>>>>>    Not sure I'm comfortable with the idea of using Jenkins
>>>>>>   workers
>>>>>>>>>>    just to
>>>>>>>>>>    idle for several hours.
>>>>>>>>>> 
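The concern above is easier to see with the polling loop written out. This is a minimal sketch, not the Zeppelin script: `fetch_state` is a hypothetical callable standing in for whatever queries Travis, and the state names are assumed rather than taken from the thread. The worker running this loop stays occupied for the whole duration of the remote build.

```python
import time

# Assumed terminal states of a remote build; adjust to the CI service in use.
TERMINAL_STATES = frozenset({"passed", "failed", "errored", "canceled"})


def wait_for_build(fetch_state, poll_seconds=60.0, timeout_seconds=4 * 3600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_state()` until it reports a terminal state or time runs out.

    `fetch_state` is a hypothetical zero-argument callable returning the
    current build state as a string. `sleep` and `clock` are injectable so
    the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        # The calling worker is blocked here between polls.
        sleep(poll_seconds)
```

With builds that run for about two hours, each such job would hold a Jenkins worker for at least that long while doing no work of its own.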
>>>>>>>>>>    On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>>>>>>>> Here's what zeppelin community did, we make a python
>>>>>>   script to
>>>>>>>>>>    check the
>>>>>>>>>>> build status of pull request.
>>>>>>>>>>> Here's script:
>>>>>>>>>>> 
>>>>>>   https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>>>>>>>> 
>>>>>>>>>>> And this is the script we used in Jenkins build job.
>>>>>>>>>>> 
>>>>>>>>>>> if [ -f "travis_check.py" ]; then
>>>>>>>>>>>   git log -n 1
>>>>>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>>>>>>>>>>>     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>>>>>>>>   #if [ -z $COMMIT ]; then
>>>>>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>>>>   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>>>>   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>>>>   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>>>>   #fi
>>>>>>>>>>>
>>>>>>>>>>>   # get commit hash from PR
>>>>>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>>>>     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>>>>     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>>>>     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>>>>   sleep 30 # sleep a few moments to wait for travis to start the build
>>>>>>>>>>>   RET_CODE=0
>>>>>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>>>>>>>>>>     RET_CODE=0
>>>>>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>>>>       grep '"full_name":' | grep -v "apache/zeppelin" |
>>>>>>>>>>>       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>>>>   fi
>>>>>>>>>>>
>>>>>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
>>>>>>>>>>>     set +x
>>>>>>>>>>>     echo "-----------------------------------------------------"
>>>>>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
>>>>>>>>>>>     echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>>>>>>>>     echo ""
>>>>>>>>>>>     echo "To trigger CI after setup, you will need to amend your last commit with"
>>>>>>>>>>>     echo "git commit --amend"
>>>>>>>>>>>     echo "git push your-remote HEAD --force"
>>>>>>>>>>>     echo ""
>>>>>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>>>>>>>>>>   fi
>>>>>>>>>>>
>>>>>>>>>>>   exit $RET_CODE
>>>>>>>>>>> else
>>>>>>>>>>>   set +x
>>>>>>>>>>>   echo "travis_check.py does not exist"
>>>>>>>>>>>   exit 1
>>>>>>>>>>> fi
>>>>>>>>>>> 
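The grep/sed pipeline in the script above extracts the PR's head commit and the fork owner from the GitHub pull request JSON. A less fragile way to get the same two values is to parse the JSON directly. This is a minimal sketch and not part of the Zeppelin tooling: the function name is made up for illustration, and it only assumes the `head.sha` and `head.repo.full_name` fields of GitHub's pull request response.

```python
import json
from typing import Tuple


def head_sha_and_fork_owner(pull_request_json: str) -> Tuple[str, str]:
    """Return (head commit SHA, fork owner) from a GitHub pull request payload.

    Raises TypeError if `head.repo` is null (for example, a deleted fork).
    """
    pr = json.loads(pull_request_json)
    head = pr["head"]
    sha = head["sha"]
    # full_name has the form "<owner>/<repository>"
    owner = head["repo"]["full_name"].split("/", 1)[0]
    return sha, owner


# Trimmed-down, made-up payload with only the fields used above.
sample = json.dumps({
    "number": 1,
    "head": {
        "sha": "0123456789abcdef0123456789abcdef01234567",
        "repo": {"full_name": "some-contributor/zeppelin"},
    },
    "base": {"repo": {"full_name": "apache/zeppelin"}},
})
sha, owner = head_sha_and_fork_owner(sample)
```

The returned pair corresponds to the `COMMIT` and `AUTHOR` values that the shell script passes to `travis_check.py`.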
>>>>>>>>>>> Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>>>>>>>>> 
>>>>>>>>>>>> Does this imply that a Jenkins job is active as long
>>>>>>   as the
>>>>>>>>>>    Travis build
>>>>>>>>>>>> runs?
>>>>>>>>>>>> 
>>>>>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Dawid, I think the "long test running" as I
>>>>>>   mentioned in the
>>>>>>>>>>    first
>>>>>>>>>>>> email,
>>>>>>>>>>>>> also as you guys said, belongs to "a big effort
>>>>>>   which is much
>>>>>>>>>>    harder to
>>>>>>>>>>>>> accomplish in a short period of time and may deserve
>>>>>>   its own
>>>>>>>>>>    separate
>>>>>>>>>>>>> discussion". Thus I didn't include it in what we can
>>>>>>   do in a
>>>>>>>>>>    foreseeable
>>>>>>>>>>>>> short term.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Besides, I don't think that's the ultimate reason
>>>>>>   for lack of
>>>>>>>>>>    build
>>>>>>>>>>>>> resources. Even if the build is shortened to
>>>>>>   something like
>>>>>>>>>>    2h, the
>>>>>>>>>>>>> problems of no build machine works about 6 or more
>>>>>>   hours in
>>>>>>>>>>    PST daytime
>>>>>>>>>>>>> that I described will still happen, because no
>>>>>>   machine from
>>>>>>>>>>    ASF INFRA's
>>>>>>>>>>>>> pool is allocated to Flink. As I have paid close
>>>>>>   attention to
>>>>>>>>>>    the build
>>>>>>>>>>>>> queue in the past few weekdays, it's a pretty clear
>>>>>>   pattern now.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> **The ultimate root cause** for that is - we don't
>>>>>>   have any
>>>>>>>>>>    **dedicated**
>>>>>>>>>>>>> build resources that we can stably rely on. I'm
>>>>>>   actually ok to
>>>>>>>>>>    wait for a
>>>>>>>>>>>>> long time if there are build requests running, it
>>>>>>   means at
>>>>>>>>>>    least we are
>>>>>>>>>>>>> making progress. But I'm not ok with no build
>>>>>>   resource. A
>>>>>>>>>>    better place I
>>>>>>>>>>>>> think we should aim at in short term is to always
>>>>>>   have at
>>>>>>>>>>    least a central
>>>>>>>>>>>>> pool (can be 3 or 5) of machines dedicated to build
>>>>>>   Flink at
>>>>>>>>>>    any time, or
>>>>>>>>>>>>> maybe use users resources.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Chesnay @Robert I synced with Jeff offline that
>>>>>>   Zeppelin
>>>>>>>>>>    community is
>>>>>>>>>>>>> using a Jenkins job to automatically build on users'
>>>>>>   travis
>>>>>>>>>>    account and
>>>>>>>>>>>>> link the result back to github PR. I guess the
>>>>>>   Jenkins job
>>>>>>>>>>    would fetch
>>>>>>>>>>>>> latest upstream master and build the PR against it.
>>>>>>   Jeff has
>>>>>>>>>> filed
>>>>>>>>>>>> tickets
>>>>>>>>>>>>> to learn and get access to the Jenkins infra. It'll be
>>>>>>   better to
>>>>>>>>>>    fully
>>>>>>>>>>>>> understand it first before judging this approach.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also heard good things about CircleCI, and ASF
>>>>>>   INFRA seems
>>>>>>>>>>    to have a
>>>>>>>>>>>> pool
>>>>>>>>>>>>> of build capacity there too. Can be an alternative
>>>>>>   to consider.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
>>>>>>>>>>>>> <dwysakowicz@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the
>>>>>>   most
>>>>>>>>>>    important point
>>>>>>>>>>>>>> from Chesnay's previous message in the summary. The
>>>>>>   ultimate
>>>>>>>>>>    reason for
>>>>>>>>>>>>>> all the problems is that the tests take close to 2
>>>>>>   hours to
>>>>>>>>>>    run already.
>>>>>>>>>>>>>> I fully support this claim: "Unless people start
>>>>>>   caring about
>>>>>>>>>>    test times
>>>>>>>>>>>>>> before adding them, this issue cannot be solved"
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This is also another reason why using user's Travis
>>>>>>   account
>>>>>>>>>>    won't help.
>>>>>>>>>>>>>> Every few weeks we reach the user's time limit for
>>>>>>   a single
>>>>>>>>>>    profile.
>>>>>>>>>>>>>> This makes the user's builds simply fail, until we
>>>>>>   either
>>>>>>>>>>    properly
>>>>>>>>>>>>>> decrease the time the tests take (which I am not
>>>>>>   sure we ever
>>>>>>>>>>    did) or
>>>>>>>>>>>>>> postpone the problem by splitting into more
>>>>>>   profiles. (Note
>>>>>>>>>>    that the ASF
>>>>>>>>>>>>>> Travis account has higher time limits)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>>>>>>>>>>>> Do we know if using "the best" available hardware
>>>>>>   would
>>>>>>>>>>    improve the
>>>>>>>>>>>> build
>>>>>>>>>>>>>>> times?
>>>>>>>>>>>>>>> Imagine we would run the build on machines with
>>>>>>   plenty of
>>>>>>>>>>    main memory
>>>>>>>>>>>> to
>>>>>>>>>>>>>>> mount everything to ramdisk + the latest CPU
>>>>>>   architecture?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Throwing hardware at the problem could help reduce
>>>>>>   the time
>>>>>>>>>>    of an
>>>>>>>>>>>>>>> individual build, and using our own infrastructure
>>>>>>   would
>>>>>>>>>>    remove our
>>>>>>>>>>>>>>> dependency on Apache's Travis account (with the
>>>>>>   obvious
>>>>>>>>>>    downside of
>>>>>>>>>>>>>> having
>>>>>>>>>>>>>>> to maintain the infrastructure)
>>>>>>>>>>>>>>> We could use an open source travis alternative, to
>>>>>>   have a
>>>>>>>>>>    similar
>>>>>>>>>>>>>>> experience and make the migration easy.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>>>>>>>>>>>>> <chesnay@apache.org> wrote:
>>>>>>>>>>>>>>>> From what I gathered, there's no special
>>>>>>   sauce that the
>>>>>>>>>>    Zeppelin
>>>>>>>>>>>>>>>> project uses which actually integrates a user's
>>>>> Travis
>>>>>>>>>>    account into the
>>>>>>>>>>>>>> PR.
>>>>>>>>>>>>>>>> They just disabled Travis for PRs. And that's
>>>>>>   kind of it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a
>>>>>>   fair
>>>>>>>>>>    amount of
>>>>>>>>>>>>>>>> resources, but there are downsides:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The discoverability of the Travis check takes a
>>>>>>   nose-dive.
>>>>>>>>>>    Either we
>>>>>>>>>>>>>>>> require every contributor to always, on every
>>>>>>   commit, also
>>>>>>>>>>    post a
>>>>>>>>>>>> Travis
>>>>>>>>>>>>>>>> build, or we have the reviewer sift through the
>>>>>>>>>>    contributor's account
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> find it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This is rather cumbersome. Additionally, it's
>>>>>>   also not
>>>>>>>>>>    equivalent to
>>>>>>>>>>>>>>>> having a PR build.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> A normal branch build takes a branch as is and
>>>>>>   tests it. A
>>>>>>>>>>    PR build
>>>>>>>>>>>>>>>> merges the branch into master, and then runs it.
>>>>>>   (Fun fact:
>>>>>>>>>>    This is
>>>>>>>>>>>> why
>>>>>>>>>>>>>>>> a PR with merge conflicts is not being run on
>>>>>>   Travis.)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> And ultimately, everyone can already make use of
>>>>> this
>>>>>>>>>>    approach anyway.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I
>>>>>>   think it's a
>>>>>>>>>>    good idea to
>>>>>>>>>>>>>>>>> leverage user's travis account.
>>>>>>>>>>>>>>>>> In this way, we can have almost unlimited
>>>>>>   concurrent build
>>>>>>>>>>    jobs and
>>>>>>>>>>>>>>>>> developers can restart build by themselves
>>>>>>   (currently only
>>>>>>>>>>    committers
>>>>>>>>>>>>>>>>> can restart PR's build).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> But I'm still not very clear how to integrate
>>>>> user's
>>>>>>>>>>    travis build
>>>>>>>>>>>> into
>>>>>>>>>>>>>>>>> the Flink pull request's build automatically.
>>>>>>   Can you
>>>>>>>>>>    explain more in
>>>>>>>>>>>>>>>>> detail?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Another question: does travis only build
>>>>>>   branches for user
>>>>>>>>>>    account?
>>>>>>>>>>>>>>>>> My concern is that builds for PRs will rebase
>>>>> user's
>>>>>>>>>>    commits against
>>>>>>>>>>>>>>>>> current master branch.
>>>>>>>>>>>>>>>>> This will help us to find problems before
>>>>>>   merge.  Builds
>>>>>>>>>>    for branches
>>>>>>>>>>>>>>>>> will lose the impact of new commits in master.
>>>>>>>>>>>>>>>>> How does Zeppelin solve this problem?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks again for sharing the idea.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>>>>>>>>>>>>>>>>> <zjffdu@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     Hi Folks,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     Zeppelin met this kind of issue before; we solved it by
>>>>>>>>>>>>>>>>>     delegating each contributor's PR build to their own Travis
>>>>>>>>>>>>>>>>>     account (everyone can have 5 free slots for Travis builds).
>>>>>>>>>>>>>>>>>     The Apache account's Travis build is only triggered when a
>>>>>>>>>>>>>>>>>     PR is merged.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> (Forgot to cc George)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>>>>>>>>>>>>>>>>> <ykt836@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed
>>>>>>>>>>>>>>>>>>> this, and I think Till and George have
>>>>>>>>>>>>>>>>>>> already spent some time investigating it. I have cc'ed
>>>>>>>>>>>>>>>>>>> both of them, and maybe they can share
>>>>>>>>>>>>>>>>>>> their findings.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>>>>>>>>>>>>>>>>>> <imjark@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for bringing this up. We also suffered from the long
>>>>>>>>>>>>>>>>>>>> build time.
>>>>>>>>>>>>>>>>>>>> I agree that we should focus on solving the build capacity
>>>>>>>>>>>>>>>>>>>> problem in this thread.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My observation is that only one build is running while all
>>>>>>>>>>>>>>>>>>>> the others (other PRs, master) are pending.
>>>>>>>>>>>>>>>>>>>> The pricing plan [1] of Travis shows it can support
>>>>>>>>>>>>>>>>>>>> concurrent build jobs.
>>>>>>>>>>>>>>>>>>>> But I don't know which plan we are using; it might be the
>>>>>>>>>>>>>>>>>>>> free plan for open source.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
>>>>>>   experience on
>>>>>>>>>>    Travis.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li
>>>>>>>>>>>>>>>>>>>> <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Steven,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion
>>>>>>>>>>>>>>>>>>>>> is about "unstable build **capacity**", in other words
>>>>>>>>>>>>>>>>>>>>> "unstable / lack of build resources", not "unstable build".
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>>>>>>>>>>>>>>>>>>>> <stevenz3wu@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> long and sometimes unstable build is definitely a pain
>>>>>>>>>>>>>>>>>>>>>> point.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka
>>>>>>>>>>>>>>>>>>>>>> is not related to my change, but there is no easy way to
>>>>>>>>>>>>>>>>>>>>>> re-run the build on the Travis UI. A Google search showed
>>>>>>>>>>>>>>>>>>>>>> a trick: closing and reopening the PR will trigger a
>>>>>>>>>>>>>>>>>>>>>> rebuild, but that could add noise to the PR activity.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed with exceeding
>>>>>>>>>>>>>>>>>>>>>> time limit after 4+ hours.
>>>>>>>>>>>>>>>>>>>>>> The job exceeded the maximum time limit for jobs, and has
>>>>>>>>>>>>>>>>>>>>>> been terminated.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>>>>>>>>>>>>>>>>>>>>> <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530
>>>>>>>>>>>>>>>>>>>>>>> This build request has been sitting at **HEAD of the
>>>>>>>>>>>>>>>>>>>>>>> queue** since I first saw it at PST 10:30am (not sure how
>>>>>>>>>>>>>>>>>>>>>>> long it's been there before 10:30am). It's PST 4:12pm now
>>>>>>>>>>>>>>>>>>>>>>> and it hasn't started yet.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>>>>>>>>>>>>>>>>>>>>>> <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain resulting from lack of
>>>>>>>>>>>>>>>>>>>>>>>> stable build capacity on Travis for Flink PRs [1].
>>>>>>>>>>>>>>>>>>>>>>>> Specifically, I noticed often that no build in the queue
>>>>>>>>>>>>>>>>>>>>>>>> is making any progress for hours, and suddenly 5 or 6
>>>>>>>>>>>>>>>>>>>>>>>> builds kick off all together after the long pause. I'm at
>>>>>>>>>>>>>>>>>>>>>>>> PST (UTC-08) time zone, and I've seen pause can be as
>>>>>>>>>>>>>>>>>>>>>>>> long as 6 hours from PST 9am to 3pm (let alone the time
>>>>>>>>>>>>>>>>>>>>>>>> needed to drain the queue afterwards).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've
>>>>>>>>>>>>>>>>>>>>>>>> experienced that PRs submitted in the early morning of
>>>>>>>>>>>>>>>>>>>>>>>> PST time zone won't finish their build until late night
>>>>>>>>>>>>>>>>>>>>>>>> of the same day.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or have
>>>>>>>>>>>>>>>>>>>>>>>> similar observation on TravisCI? (I suspect it has
>>>>>>>>>>>>>>>>>>>>>>>> things to do with time zone)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently
>>>>>>>>>>>>>>>>>>>>>>>> using? Is it the free plan for open source projects?
>>>>>>>>>>>>>>>>>>>>>>>> What are the guaranteed build capacity of the current
>>>>>>>>>>>>>>>>>>>>>>>> plan?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid)
>>>>>>>>>>>>>>>>>>>>>>>> can't provide stable build capacity, can we upgrade to a
>>>>>>>>>>>>>>>>>>>>>>>> higher priced plan with larger and more stable build
>>>>>>>>>>>>>>>>>>>>>>>> capacity?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> BTW, another factor that contribute to the productivity
>>>>>>>>>>>>>>>>>>>>>>>> problem is that our build is slow - we run full build
>>>>>>>>>>>>>>>>>>>>>>>> for every PR and a successful full build takes ~5h. We
>>>>>>>>>>>>>>>>>>>>>>>> definitely have more options to solve it, for instance,
>>>>>>>>>>>>>>>>>>>>>>>> modularize the build graphs and reuse artifacts from the
>>>>>>>>>>>>>>>>>>>>>>>> previous build. But I think that can be a big effort
>>>>>>>>>>>>>>>>>>>>>>>> which is much harder to accomplish in a short period of
>>>>>>>>>>>>>>>>>>>>>>>> time and may deserve its own separate discussion.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     --
>>>>>>>>>>>>>>>>>     Best Regards
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     Jeff Zhang
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 



Re: Re: [VOTE] Migrate to sponsored Travis account

Posted by zhijiang <wa...@aliyun.com.INVALID>.
+1 and thanks for Chesnay's work on this.

Best,
Zhijiang
 
------------------------------------------------------------------
From:Haibo Sun <su...@163.com>
Send Time: Thursday, July 4, 2019, 18:21
To:dev <de...@flink.apache.org>
Cc:private@flink.apache.org <pr...@flink.apache.org>
Subject:Re:Re: [VOTE] Migrate to sponsored Travis account

+1. Thank Chesnay for pushing this forward.

Best,
Haibo


At 2019-07-04 17:58:28, "Kurt Young" <yk...@gmail.com> wrote:
>+1 and great thanks Chesnay for pushing this.
>
>Best,
>Kurt
>
>
>On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <al...@apache.org> wrote:
>
>> +1
>>
>> Aljoscha
>>
>> > On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
>> >
>> > +1 to move to a private Travis account.
>> >
>> > I can confirm that Ververica will sponsor a Travis CI plan that is
>> > equivalent or a bit higher than the previous ASF quota (10 concurrent
>> > build queues)
>> >
>> > Best,
>> > Stephan
>> >
>> > On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org>
>> wrote:
>> >
>> >> I've raised a JIRA
>> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>> >> inquire whether it would be possible to switch to a different Travis
>> >> account, and if so what steps would need to be taken.
>> >> We need a proper confirmation from INFRA since we are not in full
>> >> control of the flink repository (for example, we cannot access the
>> >> settings page).
>> >>
>> >> If this is indeed possible, Ververica is willing to sponsor a Travis
>> >> account for the Flink project.
>> >> This would provide us with more than enough resources.
>> >>
>> >> Since this makes the project more reliant on resources provided by
>> >> external companies I would like to vote on this.
>> >>
>> >> Please vote on this proposal, as follows:
>> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>> >> provided that INFRA approves
>> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> >> account
>> >>
>> >> The vote will be open for at least 24h, and until we have confirmation
>> >> from INFRA. The voting period may be shorter than the usual 3 days since
>> >> our current setup is effectively not working.
>> >>
>> >> On 04/07/2019 06:51, Bowen Li wrote:
>> >>> Re: > Are they using their own Travis CI pool, or did they switch to an
>> >>> entirely different CI service?
>> >>>
>> >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>> >>> currently moving away from ASF's Travis to their own in-house metal
>> >>> machines at [1] with custom CI application at [2]. They've seen
>> >>> significant improvement w.r.t both much higher performance and
>> >>> basically no resource waiting time, "night-and-day" difference quoting
>> >>> Wes.
>> >>>
>> >>> Re: > If we can just switch to our own Travis pool, just for our
>> >>> project, then this might be something we can do fairly quickly?
>> >>>
>> >>> I believe so, according to [3] and [4]
>> >>>
>> >>>
>> >>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>> >>> [2] https://github.com/ursa-labs/ursabot
>> >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
>> >>> <ma...@apache.org>> wrote:
>> >>>
>> >>>    Are they using their own Travis CI pool, or did they switch to an
>> >>>    entirely different CI service?
>> >>>
>> >>>    If we can just switch to our own Travis pool, just for our
>> >>>    project, then
>> >>>    this might be something we can do fairly quickly?
>> >>>
>> >>>    On 03/07/2019 05:55, Bowen Li wrote:
>> >>>> I responded in the INFRA ticket [1] that I believe they are
>> >>>    using a wrong
>> >>>> metric against Flink and the total build time is a completely
>> >>>    different
>> >>>> thing than guaranteed build capacity.
>> >>>>
>> >>>> My response:
>> >>>>
>> >>>> "As mentioned above, since I started to pay attention to Flink's
>> >>>    build
>> >>>> queue a few tens of days ago, I'm in Seattle and I saw no build
>> >>>    was kicking
>> >>>> off in PST daytime in weekdays for Flink. Our teammates in China
>> >>>    and Europe
>> >>>> have also reported similar observations. So we need to evaluate
>> >>>    how the
>> >>>> large total build time came from - if 1) your number and 2) our
>> >>>> observations from three locations that cover pretty much a full
>> >>>    day, are
>> >>>> all true, I **guess** one reason can be that - highly likely the
>> >>>    extra
>> >>>> build time came from weekends when other Apache projects may be
>> >>>    idle and
>> >>>> Flink just drains hard its congested queue.
>> >>>>
>> >>>> Please be aware that we're not complaining about the lack of
>> >>>    resources
>> >>>> in general, I'm complaining about the lack of **stable, dedicated**
>> >>>> resources. An example for the latter one is, currently even if
>> >>>    no build is
>> >>>> in Flink's queue and I submit a request to be the queue head in PST
>> >>>> morning, my build won't even start in 6-8+h. That is an absurd
>> >>>    amount of
>> >>>> waiting time.
>> >>>>
>> >>>> That is to say, if ASF INFRA decides to adopt a quota system and
>> >>>    grants
>> >>>> Flink five DEDICATED servers that run all the time only for
>> >>>    Flink, that'll
>> >>>> be PERFECT and can totally solve our problem now.
>> >>>>
>> >>>> I feel what's missing in the ASF INFRA's Travis resource pool is
>> >>>    some level
>> >>>> of build capacity SLAs and certainty"
>> >>>>
>> >>>>
>> >>>> Again, I believe these two problems differ in nature: long build
>> >>>> time vs. lack of dedicated build resources. That is to say,
>> >>>> shortening the build time may or may not relieve the situation. I'm
>> >>>> slightly negative on disabling IT cases for PRs. The downside is that
>> >>>> we risk bugs in PRs that the UTs don't catch; these may cost a lot
>> >>>> more to fix later and may slow down or even block others. But I'm
>> >>>> open to others' opinions on it.
>> >>>>
>> >>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't solve
>> >>>> our problem, since INFRA's pool is fully shared and they have no
>> >>>> control over, or finer insight into, resource allocation to a
>> >>>> specific Apache project. As mentioned in [1], Apache Arrow is moving
>> >>>> away from the ASF INFRA Travis pool (they are actually surprised
>> >>>> Flink hasn't planned to do so). I know that Spark is on its own build
>> >>>> infra. If we all agree on funding our own build infra, I'd be glad to
>> >>>> help investigate potential options after releasing 1.9, since I'm
>> >>>> super busy with 1.9 now.
>> >>>>
>> >>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>> >>>    <chesnay@apache.org <ma...@apache.org>> wrote:
>> >>>>
>> >>>>> As a short-term stopgap, since we can assume this issue to
>> >>>    become much
>> >>>>> worse in the following days/weeks, we could disable IT cases in
>> >>>    PRs and
>> >>>>> only run them on master.
>> >>>>>
>> >>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>> >>>>>> People really have to stop thinking that just because
>> >>>    something works
>> >>>>>> for us it is also a good solution.
>> >>>>>> Also, please remember that our builds run for 2h from start to
>> >>>    finish,
>> >>>>>> and not the 14 _minutes_ it takes for zeppelin.
>> >>>>>> We are dealing with an entirely different scale here, both in
>> >>>    terms of
>> >>>>>> build times and number of builds.
>> >>>>>>
>> >>>>>> In this very thread people have been complaining about long queue
>> >>>>>> times for their builds. Surprise, other Apache projects have been
>> >>>>>> suffering the very same thing due to us not controlling our build
>> >>>>>> times. While switching services (be it Jenkins, CircleCI or
>> >>>    whatever)
>> >>>>>> will possibly work for us (and these options are actually
>> >>>    attractive,
>> >>>>>> like CircleCI's proper support for build artifacts), it will also
>> >>>>>> result in us likely negatively affecting other projects in
>> >>>    significant
>> >>>>>> ways.
>> >>>>>>
>> >>>>>> Sure, the Jenkins setup has a good user experience for us, at
>> >>>    the cost
>> >>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we
>> >>>    have 25
>> >>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>> >>>>>> resources, and the European contributors haven't even really
>> >>>    started yet.
>> >>>>>>
>> >>>>>> FYI, the latest INFRA response from INFRA-18533:
>> >>>>>>
>> >>>>>> "Our rough metrics shows that Flink used over 5800 hours of
>> >>>    build time
>> >>>>>> last month. That is equal to EIGHT servers running 24/7 for
>> >>>    the ENTIRE
>> >>>>>> MONTH. EIGHT. nonstop.
>> >>>>>> When we discovered this last night, we discussed it some and
>> >>>    are going
>> >>>>>> to tune down Flink to allow only five executors maximum. We
>> >> cannot
>> >>>>>> allow Flink to consume so much of a Foundation shared resource."
>> >>>>>>
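The figures quoted above can be sanity-checked with a few lines of arithmetic. The following is only a rough sketch: it assumes a 30-day month, and the function names are made up for illustration. It checks INFRA's "EIGHT servers" claim, the monthly ceiling implied by a cap of five executors, and the "25 PRs at about 2h each" estimate from this thread.

```python
HOURS_PER_DAY = 24
DAYS_IN_MONTH = 30  # assumption: a 30-day month


def equivalent_servers(build_hours, days=DAYS_IN_MONTH):
    """Machines that would have to run nonstop to supply `build_hours`."""
    return build_hours / (HOURS_PER_DAY * days)


def monthly_capacity(executors, days=DAYS_IN_MONTH):
    """Upper bound on build hours that `executors` can supply in a month."""
    return executors * HOURS_PER_DAY * days


def queue_hours(pending_builds, hours_per_build):
    """Total executor time needed to drain a queue of equally long builds."""
    return pending_builds * hours_per_build


if __name__ == "__main__":
    print(round(equivalent_servers(5800), 2))  # about 8 servers
    print(monthly_capacity(5))                 # ceiling with 5 executors
    print(queue_hours(25, 2))                  # 25 queued PRs at ~2h each
```

Under these assumptions, a cap of five executors allows at most 3600 build hours per month, well below the 5800 hours INFRA measured, which is consistent with the expectation in this thread that queue times will get worse.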
>> >>>>>> So yes, we either
>> >>>>>> a) have to heavily reduce our CI usage or
>> >>>>>> b) fund our own, either maintaining it ourselves or donating
>> >>>    to Apache.
>> >>>>>>
>> >>>>>> On 02/07/2019 05:11, Bowen Li wrote:
>> >>>>>>> By looking at the git history of the Jenkins script, its core part
>> >>>>>>> was finished in March 2017 (with only two minor updates in
>> >>>>>>> 2017/2018), so it's been running for over two years now, and it
>> >>>>>>> feels like the Zeppelin community has been quite happy with it.
>> >>>>>>> @Jeff Zhang <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
>> >>>>>>> share your insights and user experience with the Jenkins+Travis
>> >>>>>>> approach?
>> >>>>>>>
>> >>>>>>> Things like:
>> >>>>>>>
>> >>>>>>> - has the approach completely solved the resource capacity problem
>> >>>>>>> for the Zeppelin community? is the Zeppelin community happy with the
>> >>>>>>> result?
>> >>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
>> >>>>>>> - how often do you need to maintain the Jenkins infra? how many
>> >>>>>>> people are usually involved in maintenance and bug-fixes?
>> >>>>>>>
>> >>>>>>> The downside of this approach seems mostly to be on the
>> >>>    maintenance
>> >>>>>>> to me - maintain the script and Jenkins infra.
>> >>>>>>>
>> >>>>>>> ** Having Our Own Travis-CI.com Account **
>> >>>>>>>
>> >>>>>>> Another alternative I've been thinking of is to have our own
>> >>>>>>> travis-ci.com account with paid dedicated resources. Note that
>> >>>>>>> travis-ci.org is the free version and travis-ci.com is the
>> >>>>>>> commercial version. We currently use a shared resource pool managed
>> >>>>>>> by the ASF INFRA team on travis-ci.org, but we have no control over
>> >>>>>>> it - we can't see how it's configured, how many resources are
>> >>>>>>> available, how resources are allocated among Apache projects, etc.
>> >>>>>>> The nice things about having an account on travis-ci.com are:
>> >>>>>>>
>> >>>>>>> - relatively low cost with much better resource guarantee
>> >>>    than what
>> >>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>> >>>>>>> $489/month with 10 concurrency
>> >>>>>>> - low maintenance work compared to using Jenkins
>> >>>>>>> - (potentially) no migration cost according to Travis's doc [2]
>> >>>>>>> (pending verification)
>> >>>>>>> - full control over the build capacity/configuration compared to
>> >>>>>>> using ASF INFRA's pool
>> >>>>>>>
>> >>>>>>> I'd be surprised if we as such a vibrant community cannot
>> >>>    find and
>> >>>>>>> fund $249*12=$2988 a year in exchange for a much better
>> >> developer
>> >>>>>>> experience and much higher productivity.
>> >>>>>>>
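To make the cost comparison above concrete, here is a small sketch that annualizes the two plan prices quoted in this thread. The `PLANS` table only restates those two quoted figures; the function names are hypothetical, and current Travis pricing may differ.

```python
# Concurrent jobs -> USD per month, as quoted in this thread.
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_usd):
    """Yearly price of a plan billed monthly."""
    return monthly_usd * 12


def monthly_cost_per_slot(concurrency):
    """Monthly price of one concurrent build slot on the given plan."""
    return PLANS[concurrency] / concurrency


if __name__ == "__main__":
    for slots, price in PLANS.items():
        print(slots, annual_cost(price), round(monthly_cost_per_slot(slots), 2))
```

The per-slot price is nearly the same on both plans, so the choice mostly comes down to how many concurrent builds the project needs.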
>> >>>>>>> [1] https://travis-ci.com/plans
>> >>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> >>>>>>>
>> >>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>> >>>    <chesnay@apache.org <ma...@apache.org>
>> >>>>>>> <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>> >>>>>>>
>> >>>>>>>     So yes, the Jenkins job keeps pulling the state from
>> >>>    Travis until it
>> >>>>>>>     finishes.
>> >>>>>>>
>> >>>>>>>     Not sure I'm comfortable with the idea of using Jenkins workers
>> >>>>>>>     just to idle for several hours.
>> >>>>>>>
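The concern above is that a Jenkins worker stays occupied while it waits for a Travis build to finish. A minimal sketch of such a wait loop follows. It is not the actual travis_check.py: `fetch_state` is a hypothetical callable standing in for whatever queries the build status, and the terminal state names are assumptions.

```python
import time

# Assumed names for states in which a build is finished.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state, poll_seconds=60.0, timeout_seconds=6 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_state()` until it reports a terminal state or we time out.

    The caller (for example a CI worker) is blocked for the whole wait,
    which is exactly the cost discussed in this thread.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated build that passes on the third poll; no real sleeping.
    states = iter(["created", "started", "passed"])
    print(wait_for_build(lambda: next(states), sleep=lambda s: None))
```

With a build of about 2h and a 60-second poll interval, one worker would sit in this loop for roughly 120 iterations per PR while doing no useful work.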
>> >>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
>> >>>>>>>> Here's what the Zeppelin community did: we made a Python script to
>> >>>>>>>> check the build status of a pull request.
>> >>>>>>>> Here's the script:
>> >>>>>>>>
>> >>>    https://github.com/apache/zeppelin/blob/master/travis_check.py
>> >>>>>>>>
>> >>>>>>>> And this is the script we used in Jenkins build job.
>> >>>>>>>>
>> >>>>>>>> if [ -f "travis_check.py" ]; then
>> >>>>>>>>   git log -n 1
>> >>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>> >>>>>>>     request.*from.*" | sed
>> >>>>>>>> 's/.*GitHub pull request <a
>> >>>>>>>> href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
>> >>>    \2/g')
>> >>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>> >>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed
>> >>>>>>> 's/.*[/]\(.*\)$/\1/g')
>> >>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
>> >>>    '{print $3}')
>> >>>>>>>>   #if [ -z $COMMIT ]; then
>> >>>>>>>>   #  COMMIT=$(curl -s
>> >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>> >>>>>>>> | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
>> >>>    tr '\n' ' '
>> >>>>>>>     | sed
>> >>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>> >>>    grep -v
>> >>>>>>>     "apache:" |
>> >>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>> >>>>>>>>   #fi
>> >>>>>>>>
>> >>>>>>>>   # get commit hash from PR
>> >>>>>>>>   COMMIT=$(curl -s
>> >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>> >>>>>>>> grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
>> >>>    '\n' ' '
>> >>>>>>> | sed
>> >>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>> >>>    grep -v
>> >>>>>>>     "apache:" |
>> >>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>> >>>>>>>>   sleep 30 # sleep few moment to wait travis starts
>> >>>    the build
>> >>>>>>>>   RET_CODE=0
>> >>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>> >>>    RET_CODE=$?
>> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository
>> >>>    name when
>> >>>>>>>     travis-ci is
>> >>>>>>>> not available in the account
>> >>>>>>>>     RET_CODE=0
>> >>>>>>>>     AUTHOR=$(curl -s
>> >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
>> >>>>>>>> | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>> >>>>>>>> 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>> >>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} ||
>> >>>    RET_CODE=$?
>> >>>>>>>>   fi
>> >>>>>>>>
>> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail with can't find
>> >>>    build
>> >>>>>>>     information in
>> >>>>>>>> the travis
>> >>>>>>>>     set +x
>> >>>>>>>>     echo
>> >>>    "-----------------------------------------------------"
>> >>>>>>>>     echo "Looks like travis-ci is not configured for
>> >>>    your fork."
>> >>>>>>>>     echo "Please setup by swich on 'zeppelin'
>> >>>    repository at
>> >>>>>>>> https://travis-ci.org/profile and travis-ci."
>> >>>>>>>>     echo "And then make sure 'Build branch updates'
>> >>>    option is
>> >>>>>>>     enabled in
>> >>>>>>>> the settings
>> >>>    https://travis-ci.org/${AUTHOR}/zeppelin/settings
>> >>>    <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
>> >>>>>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
>> >>>>>>>>     echo ""
>> >>>>>>>>     echo "To trigger CI after setup, you will need
>> >>>    ammend your
>> >>>>>>>     last commit
>> >>>>>>>> with"
>> >>>>>>>>     echo "git commit --amend"
>> >>>>>>>>     echo "git push your-remote HEAD --force"
>> >>>>>>>>     echo ""
>> >>>>>>>>     echo "See
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>
>> >>
>> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
>> >>>>>>>> ."
>> >>>>>>>>   fi
>> >>>>>>>>
>> >>>>>>>>   exit $RET_CODE
>> >>>>>>>> else
>> >>>>>>>>   set +x
>> >>>>>>>>   echo "travis_check.py does not exists"
>> >>>>>>>>   exit 1
>> >>>>>>>> fi
>> >>>>>>>>
>> >>>>>>>> Chesnay Schepler <chesnay@apache.org
>> >>>    <ma...@apache.org>
>> >>>>>>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
>> >>>    wrote on Sat, Jun 29, 2019 at 3:17 PM:
>> >>>>>>>>
>> >>>>>>>>> Does this imply that a Jenkins job is active as long
>> >>>    as the
>> >>>>>>>     Travis build
>> >>>>>>>>> runs?
>> >>>>>>>>>
>> >>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> @Dawid, I think the "long test running" as I
>> >>>    mentioned in the
>> >>>>>>>     first
>> >>>>>>>>> email,
>> >>>>>>>>>> also as you guys said, belongs to "a big effort
>> >>>    which is much
>> >>>>>>>     harder to
>> >>>>>>>>>> accomplish in a short period of time and may deserve
>> >>>    its own
>> >>>>>>>     separate
>> >>>>>>>>>> discussion". Thus I didn't include it in what we can
>> >>>    do in a
>> >>>>>>>     foreseeable
>> >>>>>>>>>> short term.
>> >>>>>>>>>>
>> >>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack of
>> >>>>>>>>>> build resources. Even if the build is shortened to something like
>> >>>>>>>>>> 2h, the problem I described, of no build machine working for 6 or
>> >>>>>>>>>> more hours in PST daytime, will still happen, because no machine
>> >>>>>>>>>> from ASF INFRA's pool is allocated to Flink. As I have paid close
>> >>>>>>>>>> attention to the build queue over the past few weekdays, it's a
>> >>>>>>>>>> pretty clear pattern now.
>> >>>>>>>>>>
>> >>>>>>>>>> **The ultimate root cause** for that is - we don't have any
>> >>>>>>>>>> **dedicated** build resources that we can stably rely on. I'm
>> >>>>>>>>>> actually ok to wait for a long time if there are build requests
>> >>>>>>>>>> running; it means at least we are making progress. But I'm not ok
>> >>>>>>>>>> with no build resources. A better state to aim at in the short
>> >>>>>>>>>> term is to always have at least a central pool (can be 3 or 5) of
>> >>>>>>>>>> machines dedicated to building Flink at any time, or maybe to use
>> >>>>>>>>>> users' resources.
>> >>>>>>>>>>
>> >>>>>>>>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
>> >>>>>>>>>> community is using a Jenkins job to automatically build on users'
>> >>>>>>>>>> Travis accounts and link the result back to the GitHub PR. I guess
>> >>>>>>>>>> the Jenkins job would fetch the latest upstream master and build
>> >>>>>>>>>> the PR against it. Jeff has filed tickets to learn about and get
>> >>>>>>>>>> access to the Jenkins infra. It'll be better to fully understand
>> >>>>>>>>>> it first before judging this approach.
>> >>>>>>>>>>
>> >>>>>>>>>> I also heard good things about CircleCI, and ASF
>> >>>    INFRA seems
>> >>>>>>>     to have a
>> >>>>>>>>> pool
>> >>>>>>>>>> of build capacity there too. Can be an alternative
>> >>>    to consider.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>> >>>>>>>>> dwysakowicz@apache.org
>> >>>    <ma...@apache.org> <mailto:dwysakowicz@apache.org
>> >>>    <ma...@apache.org>>>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the
>> >>>    most
>> >>>>>>>     important point
>> >>>>>>>>>>> from Chesnay's previous message in the summary. The
>> >>>    ultimate
>> >>>>>>>     reason for
>> >>>>>>>>>>> all the problems is that the tests take close to 2
>> >>>    hours to
>> >>>>>>>     run already.
>> >>>>>>>>>>> I fully support this claim: "Unless people start
>> >>>    caring about
>> >>>>>>>     test times
>> >>>>>>>>>>> before adding them, this issue cannot be solved"
>> >>>>>>>>>>>
>> >>>>>>>>>>> This is also another reason why using user's Travis
>> >>>    account
>> >>>>>>>     won't help.
>> >>>>>>>>>>> Every few weeks we reach the user's time limit for
>> >>>    a single
>> >>>>>>>     profile.
>> >>>>>>>>>>> This makes the user's builds simply fail, until we
>> >>>    either
>> >>>>>>>     properly
>> >>>>>>>>>>> decrease the time the tests take (which I am not
>> >>>    sure we ever
>> >>>>>>>     did) or
>> >>>>>>>>>>> postpone the problem by splitting into more
>> >>>    profiles. (Note
>> >>>>>>>     that the ASF
>> >>>>>>>>>>> Travis account has higher time limits)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Dawid
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>> >>>>>>>>>>>> Do we know if using "the best" available hardware
>> >>>    would
>> >>>>>>>     improve the
>> >>>>>>>>> build
>> >>>>>>>>>>>> times?
>> >>>>>>>>>>>> Imagine we would run the build on machines with
>> >>>    plenty of
>> >>>>>>>     main memory
>> >>>>>>>>> to
>> >>>>>>>>>>>> mount everything to ramdisk + the latest CPU
>> >>>    architecture?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Throwing hardware at the problem could help reduce
>> >>>    the time
>> >>>>>>>     of an
>> >>>>>>>>>>>> individual build, and using our own infrastructure
>> >>>    would
>> >>>>>>>     remove our
>> >>>>>>>>>>>> dependency on Apache's Travis account (with the
>> >>>    obvious
>> >>>>>>>     downside of
>> >>>>>>>>>>> having
>> >>>>>>>>>>>> to maintain the infrastructure)
>> >>>>>>>>>>>> We could use an open source travis alternative, to
>> >>>    have a
>> >>>>>>>     similar
>> >>>>>>>>>>>> experience and make the migration easy.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>> >>>>>>>     <chesnay@apache.org <ma...@apache.org>
>> >>>    <mailto:chesnay@apache.org <ma...@apache.org>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>> From what I gathered, there's no special sauce that the
>> >>>>>>>>>>>>> Zeppelin project uses which actually integrates a user's Travis
>> >>>>>>>>>>>>> account into the PR.
>> >>>>>>>>>>>>> They just disabled Travis for PRs. And that's kind of it.
>> >>>    kind of it.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>> >>>>>>>>>>>>> resources, but there are downsides:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The discoverability of the Travis check takes a nose-dive. Either
>> >>>>>>>>>>>>> we require every contributor to always, on every commit, also
>> >>>>>>>>>>>>> post a Travis build, or we have the reviewer sift through the
>> >>>>>>>>>>>>> contributor's account to find it.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> This is rather cumbersome. Additionally, it's
>> >>>    also not
>> >>>>>>>     equivalent to
>> >>>>>>>>>>>>> having a PR build.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> A normal branch build takes a branch as is and tests it. A PR
>> >>>>>>>>>>>>> build merges the branch into master, and then runs it. (Fun fact:
>> >>>>>>>>>>>>> This is why a PR with merge conflicts is not being run on Travis.)
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> And ultimately, everyone can already make use of this approach
>> >>>>>>>>>>>>> anyway.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>> >>>>>>>>>>>>>> Hi Jeff,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I
>> >>>    think it's a
>> >>>>>>>     good idea to
>> >>>>>>>>>>>>>> leverage user's travis account.
>> >>>>>>>>>>>>>> In this way, we can have almost unlimited
>> >>>    concurrent build
>> >>>>>>>     jobs and
>> >>>>>>>>>>>>>> developers can restart build by themselves
>> >>>    (currently only
>> >>>>>>>     committers
>> >>>>>>>>>>>>>> can restart PR's build).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> But I'm still not very clear how to integrate
>> >> user's
>> >>>>>>>     travis build
>> >>>>>>>>> into
>> >>>>>>>>>>>>>> the Flink pull request's build automatically.
>> >>>    Can you
>> >>>>>>>     explain more in
>> >>>>>>>>>>>>>> detail?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Another question: does travis only build
>> >>>    branches for user
>> >>>>>>>     account?
>> >>>>>>>>>>>>>> My concern is that builds for PRs will rebase
>> >> user's
>> >>>>>>>     commits against
>> >>>>>>>>>>>>>> current master branch.
>> >>>>>>>>>>>>>> This will help us to find problems before
>> >>>    merge.  Builds
>> >>>>>>>     for branches
>> >>>>>>>>>>>>>> will lose the impact of new commits in master.
>> >>>>>>>>>>>>>> How does Zeppelin solve this problem?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks again for sharing the idea.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>> Jark
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>> >>>    <zjffdu@gmail.com <ma...@gmail.com>
>> >>>>>>>     <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:zjffdu@gmail.com
>> >>>    <ma...@gmail.com> <mailto:zjffdu@gmail.com
>> >>>    <ma...@gmail.com>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Hi Folks,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Zeppelin met this kind of issue before. We solved it by
>> >>>>>>>>>>>>>>      delegating each contributor's PR build to their own Travis
>> >>>>>>>>>>>>>>      account (everyone can have 5 free slots for Travis builds).
>> >>>>>>>>>>>>>>      The Apache account's Travis build is only triggered when a
>> >>>>>>>>>>>>>>      PR is merged.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Kurt Young <ykt836@gmail.com
>> >>>    <ma...@gmail.com>
>> >>>>>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
>> >>>    <mailto:ykt836@gmail.com <ma...@gmail.com>
>> >>>>>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>> >>>>>>>>>>>>>> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> (Forgot to cc George)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>> Kurt
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>> >>>>>>>     <ykt836@gmail.com <ma...@gmail.com>
>> >>>    <mailto:ykt836@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:ykt836@gmail.com
>> >>>    <ma...@gmail.com> <mailto:ykt836@gmail.com
>> >>>    <ma...@gmail.com>>>>
>> >>>>>>>     wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Hi Bowen,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed this,
>> >>>>>>>>>>>>>>>> and I think Till and George have already spent some time
>> >>>>>>>>>>>>>>>> investigating it. I have cc'ed both of them, and maybe they
>> >>>>>>>>>>>>>>>> can share their findings.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>> Kurt
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>> >>>>>>>     <imjark@gmail.com <ma...@gmail.com>
>> >>>    <mailto:imjark@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:imjark@gmail.com
>> >>>    <ma...@gmail.com> <mailto:imjark@gmail.com
>> >>>    <ma...@gmail.com>>>>
>> >>>>>>>     wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hi Bowen,
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks for bringing this up. We also suffered from the long
>> >>>>>>>>>>>>>>>>> build time. I agree that we should focus on solving the build
>> >>>>>>>>>>>>>>>>> capacity problem in this thread.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> My observation is that only one build is running; all the
>> >>>>>>>>>>>>>>>>> others (other PRs, master) are pending.
>> >>>>>>>>>>>>>>>>> The pricing plans [1] of Travis show that it can support
>> >>>>>>>>>>>>>>>>> concurrent build jobs.
>> >>>>>>>>>>>>>>>>> But I don't know which plan we are using; it might be the free
>> >>>>>>>>>>>>>>>>> plan for open source.
>> >>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
>> >>>    experience on
>> >>>>>>>     Travis.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>>>>> Jark
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>> >>>>>>>>> bowenli86@gmail.com <ma...@gmail.com>
>> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:bowenli86@gmail.com
>> >>>    <ma...@gmail.com>
>> >>>>>>>     <mailto:bowenli86@gmail.com
>> >>>    <ma...@gmail.com>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Hi Steven,
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion is
>> >>>>>>>>>>>>>>>>>> about "unstable build **capacity**", in other words "unstable /
>> >>>>>>>>>>>>>>>>>> lack of build resources", not "unstable build".
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> long and sometimes unstable build is
>> >>>>>>>     definitely a pain
>> >>>>>>>>>>>>> point.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka is not
>> >>>>>>>>>>>>>>>>>>> related to my change, but there is no easy way to re-run the build
>> >>>>>>>>>>>>>>>>>>> in the Travis UI. A Google search showed a trick: closing and
>> >>>>>>>>>>>>>>>>>>> reopening the PR will trigger a rebuild, but that could add noise
>> >>>>>>>>>>>>>>>>>>> to the PR activity.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed by exceeding the time
>> >>>>>>>>>>>>>>>>>>> limit after 4+ hours:
>> >>>>>>>>>>>>>>>>>>> "The job exceeded the maximum time limit for jobs, and has been
>> >>>>>>>>>>>>>>>>>>> terminated."
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530 This build
>> >>>>>>>>>>>>>>>>>>>> request has been sitting at the **HEAD of the queue** since I first
>> >>>>>>>>>>>>>>>>>>>> saw it at PST 10:30am (not sure how long it had been there before
>> >>>>>>>>>>>>>>>>>>>> 10:30am). It's PST 4:12pm now and it hasn't started yet.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Hi devs,
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain resulting from lack of stable build
>> >>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
>> >>>>>>>>>>>>>>>>>>>>> build in the queue is making any progress for hours, and suddenly 5 or 6
>> >>>>>>>>>>>>>>>>>>>>> builds kick off all together after the long pause. I'm at PST (UTC-08) time
>> >>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
>> >>>>>>>>>>>>>>>>>>>>> (let alone the time needed to drain the queue afterwards).
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've experienced that
>> >>>>>>>>>>>>>>>>>>>>> PRs submitted in the early morning of PST time zone won't finish their
>> >>>>>>>>>>>>>>>>>>>>> build until late night of the same day.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> So my questions are:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or have similar observation
>> >>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it has things to do with time zone)
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently using? Is it the free
>> >>>>>>>>>>>>>>>>>>>>> plan for open source projects? What are the guaranteed build capacity of
>> >>>>>>>>>>>>>>>>>>>>> the current plan?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide stable
>> >>>>>>>>>>>>>>>>>>>>> build capacity, can we upgrade to a higher priced plan with larger and more
>> >>>>>>>>>>>>>>>>>>>>> stable build capacity?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> BTW, another factor that contribute to the productivity problem is that our
>> >>>>>>>>>>>>>>>>>>>>> build is slow - we run full build for every PR and a successful full build
>> >>>>>>>>>>>>>>>>>>>>> takes ~5h. We definitely have more options to solve it, for instance,
>> >>>>>>>>>>>>>>>>>>>>> modularize the build graphs and reuse artifacts from the previous build.
>> >>>>>>>>>>>>>>>>>>>>> But I think that can be a big effort which is much harder to accomplish in
>> >>>>>>>>>>>>>>>>>>>>> a short period of time and may deserve its own separate discussion.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      --
>> >>>>>>>>>>>>>>      Best Regards
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Jeff Zhang
>> >>>>>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>
>> >>
>>
>>


Re:Re: [VOTE] Migrate to sponsored Travis account

Posted by Haibo Sun <su...@163.com>.
+1. Thank Chesnay for pushing this forward.

Best,
Haibo


At 2019-07-04 17:58:28, "Kurt Young" <yk...@gmail.com> wrote:
>+1 and great thanks Chesnay for pushing this.
>
>Best,
>Kurt
>
>
>On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <al...@apache.org> wrote:
>
>> +1
>>
>> Aljoscha
>>
>> > On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
>> >
>> > +1 to move to a private Travis account.
>> >
>> > I can confirm that Ververica will sponsor a Travis CI plan that is
>> > equivalent or a bit higher than the previous ASF quota (10 concurrent
>> > build queues)
>> >
>> > Best,
>> > Stephan
>> >
>> > On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org>
>> wrote:
>> >
>> >> I've raised a JIRA
>> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>> >> inquire whether it would be possible to switch to a different Travis
>> >> account, and if so what steps would need to be taken.
>> >> We need a proper confirmation from INFRA since we are not in full
>> >> control of the flink repository (for example, we cannot access the
>> >> settings page).
>> >>
>> >> If this is indeed possible, Ververica is willing to sponsor a Travis
>> >> account for the Flink project.
>> >> This would provide us with more resources than we need.
>> >>
>> >> Since this makes the project more reliant on resources provided by
>> >> external companies I would like to vote on this.
>> >>
>> >> Please vote on this proposal, as follows:
>> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>> >> provided that INFRA approves
>> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> >> account
>> >>
>> >> The vote will be open for at least 24h, and until we have confirmation
>> >> from INFRA. The voting period may be shorter than the usual 3 days since
>> >> our current setup is effectively not working.
>> >>
>> >> On 04/07/2019 06:51, Bowen Li wrote:
>> >>> Re: > Are they using their own Travis CI pool, or did they switch to an
>> >>> entirely different CI service?
>> >>>
>> >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>> >>> currently moving away from ASF's Travis to their own in-house metal
>> >>> machines at [1] with custom CI application at [2]. They've seen
>> >>> significant improvement w.r.t both much higher performance and
>> >>> basically no resource waiting time, "night-and-day" difference quoting
>> >>> Wes.
>> >>>
>> >>> Re: > If we can just switch to our own Travis pool, just for our
>> >>> project, then this might be something we can do fairly quickly?
>> >>>
>> >>> I believe so, according to [3] and [4]
>> >>>
>> >>>
>> >>> [1] https://ci.ursalabs.org/
>> >>> [2] https://github.com/ursa-labs/ursabot
>> >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
>> >>> <ma...@apache.org>> wrote:
>> >>>
>> >>>    Are they using their own Travis CI pool, or did they switch to an
>> >>>    entirely different CI service?
>> >>>
>> >>>    If we can just switch to our own Travis pool, just for our
>> >>>    project, then
>> >>>    this might be something we can do fairly quickly?
>> >>>
>> >>>    On 03/07/2019 05:55, Bowen Li wrote:
>> >>>> I responded in the INFRA ticket [1] that I believe they are
>> >>>    using a wrong
>> >>>> metric against Flink and the total build time is a completely
>> >>>    different
>> >>>> thing than guaranteed build capacity.
>> >>>>
>> >>>> My response:
>> >>>>
>> >>>> "As mentioned above, I'm in Seattle, and since I started paying
>> >>>> attention to Flink's build queue a few tens of days ago, I have seen
>> >>>> no builds kicking off for Flink during PST daytime on weekdays. Our
>> >>>> teammates in China and Europe have reported similar observations. So
>> >>>> we need to evaluate where the large total build time came from - if
>> >>>> 1) your number and 2) our observations from three locations that
>> >>>> cover pretty much a full day are all true, I **guess** one reason is
>> >>>> that the extra build time most likely came from weekends, when other
>> >>>> Apache projects may be idle and Flink just drains its congested
>> >>>> queue hard.
>> >>>>
>> >>>> Please be aware that we're not complaining about the lack of
>> >>>> resources in general; I'm complaining about the lack of **stable,
>> >>>> dedicated** resources. An example of the latter is that, currently,
>> >>>> even if no build is in Flink's queue and I submit a request that
>> >>>> becomes the queue head in the PST morning, my build won't even start
>> >>>> for 6-8+h. That is an absurd amount of waiting time.
>> >>>>
>> >>>> That is to say, if ASF INFRA decides to adopt a quota system and
>> >>>> grants Flink five DEDICATED servers that run all the time only for
>> >>>> Flink, that'll be PERFECT and can totally solve our problem now.
>> >>>>
>> >>>> I feel what's missing in the ASF INFRA's Travis resource pool is
>> >>>    some level
>> >>>> of build capacity SLAs and certainty"
>> >>>>
>> >>>>
>> >>>> Again, I believe these two problems differ in nature: long build
>> >>>> time vs. lack of dedicated build resources. That is to say,
>> >>>> shortening the build time may or may not relieve the situation. I'm
>> >>>> slightly negative on disabling IT cases for PRs, because the downside
>> >>>> is that we are at risk of missing potential bugs in a PR that UTs
>> >>>> don't catch, which may cost a lot more to fix and may slow down or
>> >>>> even block others, but I am open to others' opinions on it.
>> >>>>
>> >>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a
>> >>>> feasible way to solve our problem, since INFRA's pool is fully shared
>> >>>> and they have no control over, or finer insight into, resource
>> >>>> allocation to a specific Apache project. As mentioned in [1], Apache
>> >>>> Arrow is moving away from the ASF INFRA Travis pool (they are actually
>> >>>> surprised Flink hasn't planned to do so). I know that Spark is on its
>> >>>> own build infra. If we all agree on funding our own build infra, I'd
>> >>>> be glad to help investigate potential options after releasing 1.9,
>> >>>> since I'm super busy with 1.9 now.
>> >>>>
>> >>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>> >>>    <chesnay@apache.org <ma...@apache.org>> wrote:
>> >>>>
>> >>>>> As a short-term stopgap, since we can assume this issue to
>> >>>    become much
>> >>>>> worse in the following days/weeks, we could disable IT cases in
>> >>>    PRs and
>> >>>>> only run them on master.
>> >>>>>
>> >>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>> >>>>>> People really have to stop thinking that just because something
>> >>>>>> works for us it is also a good solution.
>> >>>>>> Also, please remember that our builds run for 2h from start to
>> >>>>>> finish, and not the 14 _minutes_ it takes for zeppelin.
>> >>>>>> We are dealing with an entirely different scale here, both in terms
>> >>>>>> of build times and number of builds.
>> >>>>>>
>> >>>>>> In this very thread people have been complaining about long queue
>> >>>>>> times for their builds. Surprise, other Apache projects have been
>> >>>>>> suffering the very same thing due to us not controlling our build
>> >>>>>> times. While switching services (be it Jenkins, CircleCI or
>> >>>    whatever)
>> >>>>>> will possibly work for us (and these options are actually
>> >>>    attractive,
>> >>>>>> like CircleCI's proper support for build artifacts), it will also
>> >>>>>> result in us likely negatively affecting other projects in
>> >>>    significant
>> >>>>>> ways.
>> >>>>>>
>> >>>>>> Sure, the Jenkins setup has a good user experience for us, at
>> >>>    the cost
>> >>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we
>> >>>    have 25
>> >>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>> >>>>>> resources, and the European contributors haven't even really
>> >>>    started yet.
>> >>>>>>
>> >>>>>> FYI, the latest INFRA response from INFRA-18533:
>> >>>>>>
>> >>>>>> "Our rough metrics shows that Flink used over 5800 hours of build
>> >>>>>> time last month. That is equal to EIGHT servers running 24/7 for
>> >>>>>> the ENTIRE MONTH. EIGHT. nonstop.
>> >>>>>> When we discovered this last night, we discussed it some and are
>> >>>>>> going to tune down Flink to allow only five executors maximum. We
>> >>>>>> cannot allow Flink to consume so much of a Foundation shared
>> >>>>>> resource."
>> >>>>>>
>> >>>>>> So yes, we either
>> >>>>>> a) have to heavily reduce our CI usage or
>> >>>>>> b) fund our own, either maintaining it ourselves or donating
>> >>>    to Apache.
>> >>>>>>
>> >>>>>> On 02/07/2019 05:11, Bowen Li wrote:
>> >>>>>>> Looking at the git history of the Jenkins script, its core part
>> >>>>>>> was finished in March 2017 (with only two minor updates in
>> >>>>>>> 2017/2018), so it's been running for over two years now, and it
>> >>>>>>> feels like the Zeppelin community has been quite happy with it.
>> >>>>>>> @Jeff Zhang <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
>> >>>>>>> share your insights and user experience with the Jenkins+Travis
>> >>>>>>> approach?
>> >>>>>>>
>> >>>>>>> Things like:
>> >>>>>>>
>> >>>>>>> - has the approach completely solved the resource capacity problem
>> >>>>>>> for the Zeppelin community? is the Zeppelin community happy with
>> >>>>>>> the result?
>> >>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
>> >>>>>>> - how often do you need to maintain the Jenkins infra? how many
>> >>>>>>> people are usually involved in maintenance and bug-fixes?
>> >>>>>>>
>> >>>>>>> The downside of this approach seems mostly to be on the
>> >>>    maintenance
>> >>>>>>> to me - maintain the script and Jenkins infra.
>> >>>>>>>
>> >>>>>>> ** Having Our Own Travis-CI.com Account **
>> >>>>>>>
>> >>>>>>> Another alternative I've been thinking of is to have our own
>> >>>>>>> travis-ci.com account with paid dedicated resources. Note that
>> >>>>>>> travis-ci.org is the free version and travis-ci.com is the
>> >>>>>>> commercial version. We currently use a shared resource pool managed
>> >>>>>>> by the ASF INFRA team on travis-ci.org, but we have no control over
>> >>>>>>> it - we can't see how it's configured, how many resources are
>> >>>>>>> available, how resources are allocated among Apache projects, etc.
>> >>>>>>> The nice things about having an account on travis-ci.com are:
>> >>>>>>>
>> >>>>>>> - relatively low cost with a much better resource guarantee than
>> >>>>>>> what we currently have [1]: $249/month with 5 dedicated concurrency,
>> >>>>>>> $489/month with 10 concurrency
>> >>>>>>> - low maintenance work compared to using Jenkins
>> >>>>>>> - (potentially) no migration cost according to Travis's doc [2]
>> >>>>>>> (pending verification)
>> >>>>>>> - full control over the build capacity/configuration compared to
>> >>>>>>> using ASF INFRA's pool
>> >>>>>>>
>> >>>>>>> I'd be surprised if we as such a vibrant community cannot find and
>> >>>>>>> fund $249*12=$2988 a year in exchange for a much better developer
>> >>>>>>> experience and much higher productivity.
>> >>>>>>>
>> >>>>>>> [1] https://travis-ci.com/plans
>> >>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> >>>>>>>
>> >>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org> wrote:
>> >>>>>>>
>> >>>>>>>     So yes, the Jenkins job keeps pulling the state from
>> >>>    Travis until it
>> >>>>>>>     finishes.
>> >>>>>>>
>> >>>>>>>     Not sure I'm comfortable with the idea of using Jenkins workers
>> >>>>>>>     just to idle for several hours.
>> >>>>>>>
>> >>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
>> >>>>>>>> Here's what the Zeppelin community did: we wrote a Python script
>> >>>>>>>> to check the build status of a pull request.
>> >>>>>>>> Here's the script:
>> >>>>>>>>
>> >>>>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>> >>>>>>>>
>> >>>>>>>> And this is the script we use in the Jenkins build job:
>> >>>>>>>>
>> >>>>>>>> if [ -f "travis_check.py" ]; then
>> >>>>>>>>   git log -n 1
>> >>>>>>>>   STATUS=$(curl -s $BUILD_URL |
>> >>>>>>>>     grep -e "GitHub pull request.*from.*" |
>> >>>>>>>>     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>> >>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>> >>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>> >>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>> >>>>>>>>   #if [ -z $COMMIT ]; then
>> >>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>> >>>>>>>>   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>> >>>>>>>>   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>> >>>>>>>>   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>> >>>>>>>>   #fi
>> >>>>>>>>
>> >>>>>>>>   # get commit hash from PR
>> >>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>> >>>>>>>>     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>> >>>>>>>>     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>> >>>>>>>>     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>> >>>>>>>>   sleep 30 # wait a few moments for Travis to start the build
>> >>>>>>>>   RET_CODE=0
>> >>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>> >>>>>>>>     RET_CODE=0
>> >>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>> >>>>>>>>       grep '"full_name":' | grep -v "apache/zeppelin" |
>> >>>>>>>>       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>> >>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>> >>>>>>>>   fi
>> >>>>>>>>
>> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in Travis
>> >>>>>>>>     set +x
>> >>>>>>>>     echo "-----------------------------------------------------"
>> >>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
>> >>>>>>>>     echo "Please set up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>> >>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>> >>>>>>>>     echo ""
>> >>>>>>>>     echo "To trigger CI after setup, you will need to amend your last commit with"
>> >>>>>>>>     echo "git commit --amend"
>> >>>>>>>>     echo "git push your-remote HEAD --force"
>> >>>>>>>>     echo ""
>> >>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>> >>>>>>>>   fi
>> >>>>>>>>
>> >>>>>>>>   exit $RET_CODE
>> >>>>>>>> else
>> >>>>>>>>   set +x
>> >>>>>>>>   echo "travis_check.py does not exist"
>> >>>>>>>>   exit 1
>> >>>>>>>> fi
>> >>>>>>>>
>> >>>>>>>> Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>> >>>>>>>>
>> >>>>>>>>> Does this imply that a Jenkins job is active as long
>> >>>    as the
>> >>>>>>>     Travis build
>> >>>>>>>>> runs?
>> >>>>>>>>>
>> >>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> @Dawid, I think the "long test running" as I
>> >>>    mentioned in the
>> >>>>>>>     first
>> >>>>>>>>> email,
>> >>>>>>>>>> also as you guys said, belongs to "a big effort
>> >>>    which is much
>> >>>>>>>     harder to
>> >>>>>>>>>> accomplish in a short period of time and may deserve
>> >>>    its own
>> >>>>>>>     separate
>> >>>>>>>>>> discussion". Thus I didn't include it in what we can
>> >>>    do in a
>> >>>>>>>     foreseeable
>> >>>>>>>>>> short term.
>> >>>>>>>>>>
>> >>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack
>> >>>>>>>>>> of build resources. Even if the build is shortened to something
>> >>>>>>>>>> like 2h, the problem I described, where no build machine works
>> >>>>>>>>>> for 6 or more hours in PST daytime, will still happen, because
>> >>>>>>>>>> no machine from ASF INFRA's pool is allocated to Flink. I have
>> >>>>>>>>>> paid close attention to the build queue over the past few
>> >>>>>>>>>> weekdays, and it's a pretty clear pattern now.
>> >>>>>>>>>>
>> >>>>>>>>>> **The ultimate root cause** for that is - we don't have any
>> >>>>>>>>>> **dedicated** build resources that we can stably rely on. I'm
>> >>>>>>>>>> actually ok to wait for a long time if there are build requests
>> >>>>>>>>>> running; it means at least we are making progress. But I'm not
>> >>>>>>>>>> ok with no build resources. A better state I think we should aim
>> >>>>>>>>>> for in the short term is to always have at least a central pool
>> >>>>>>>>>> (can be 3 or 5) of machines dedicated to building Flink at any
>> >>>>>>>>>> time, or maybe use users' resources.
>> >>>>>>>>>>
>> >>>>>>>>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
>> >>>>>>>>>> community is using a Jenkins job to automatically build on
>> >>>>>>>>>> users' Travis accounts and link the result back to the GitHub
>> >>>>>>>>>> PR. I guess the Jenkins job would fetch the latest upstream
>> >>>>>>>>>> master and build the PR against it. Jeff has filed tickets to
>> >>>>>>>>>> learn about and get access to the Jenkins infra. It'll be better
>> >>>>>>>>>> to fully understand it first before judging this approach.
>> >>>>>>>>>>
>> >>>>>>>>>> I also heard good things about CircleCI, and ASF
>> >>>    INFRA seems
>> >>>>>>>     to have a
>> >>>>>>>>> pool
>> >>>>>>>>>> of build capacity there too. Can be an alternative
>> >>>    to consider.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>> >>>>>>>>> dwysakowicz@apache.org
>> >>>    <ma...@apache.org> <mailto:dwysakowicz@apache.org
>> >>>    <ma...@apache.org>>>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the
>> >>>    most
>> >>>>>>>     important point
>> >>>>>>>>>>> from Chesnay's previous message in the summary. The
>> >>>    ultimate
>> >>>>>>>     reason for
>> >>>>>>>>>>> all the problems is that the tests take close to 2
>> >>>    hours to
>> >>>>>>>     run already.
>> >>>>>>>>>>> I fully support this claim: "Unless people start
>> >>>    caring about
>> >>>>>>>     test times
>> >>>>>>>>>>> before adding them, this issue cannot be solved"
>> >>>>>>>>>>>
>> >>>>>>>>>>> This is also another reason why using user's Travis
>> >>>    account
>> >>>>>>>     won't help.
>> >>>>>>>>>>> Every few weeks we reach the user's time limit for
>> >>>    a single
>> >>>>>>>     profile.
>> >>>>>>>>>>> This makes the user's builds simply fail, until we
>> >>>    either
>> >>>>>>>     properly
>> >>>>>>>>>>> decrease the time the tests take (which I am not
>> >>>    sure we ever
>> >>>>>>>     did) or
>> >>>>>>>>>>> postpone the problem by splitting into more
>> >>>    profiles. (Note
>> >>>>>>>     that the ASF
>> >>>>>>>>>>> Travis account has higher time limits)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Dawid
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>> >>>>>>>>>>>> Do we know if using "the best" available hardware
>> >>>    would
>> >>>>>>>     improve the
>> >>>>>>>>> build
>> >>>>>>>>>>>> times?
>> >>>>>>>>>>>> Imagine we would run the build on machines with
>> >>>    plenty of
>> >>>>>>>     main memory
>> >>>>>>>>> to
>> >>>>>>>>>>>> mount everything to ramdisk + the latest CPU
>> >>>    architecture?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Throwing hardware at the problem could help reduce
>> >>>    the time
>> >>>>>>>     of an
>> >>>>>>>>>>>> individual build, and using our own infrastructure
>> >>>    would
>> >>>>>>>     remove our
>> >>>>>>>>>>>> dependency on Apache's Travis account (with the
>> >>>    obvious
>> >>>>>>>     downside of
>> >>>>>>>>>>> having
>> >>>>>>>>>>>> to maintain the infrastructure)
>> >>>>>>>>>>>> We could use an open source travis alternative, to
>> >>>    have a
>> >>>>>>>     similar
>> >>>>>>>>>>>> experience and make the migration easy.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>> >>>>>>>     <chesnay@apache.org <ma...@apache.org>
>> >>>    <mailto:chesnay@apache.org <ma...@apache.org>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>> From what I gathered, there's no special sauce that the
>> >>>>>>>>>>>>> Zeppelin project uses which actually integrates a user's
>> >>>>>>>>>>>>> Travis account into the PR.
>> >>>>>>>>>>>>> They just disabled Travis for PRs. And that's
>> >>>    kind of it.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount
>> >>>>>>>>>>>>> of resources, but there are downsides:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The discoverability of the Travis check takes a
>> >>>    nose-dive.
>> >>>>>>>     Either we
>> >>>>>>>>>>>>>> require every contributor to always, on every
>> >>>    commit, also
>> >>>>>>>     post a
>> >>>>>>>>> Travis
>> >>>>>>>>>>>>> build, or we have the reviewer sift through the
>> >>>>>>>     contributors account
>> >>>>>>>>> to
>> >>>>>>>>>>>>> find it.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> This is rather cumbersome. Additionally, it's
>> >>>    also not
>> >>>>>>>     equivalent to
>> >>>>>>>>>>>>> having a PR build.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> A normal branch build takes a branch as is and
>> >>>    tests it. A
>> >>>>>>>     PR build
>> >>>>>>>>>>>>> merges the branch into master, and then runs it.
>> >>>    (Fun fact:
>> >>>>>>>     This is
>> >>>>>>>>> why
>> >>>>>>>>>>>>>> a PR with merge conflicts is not being run on
>> >>>    Travis.)
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> And ultimately, everyone can already make use of
>> >> this
>> >>>>>>>     approach anyway.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>> >>>>>>>>>>>>>> Hi Jeff,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I
>> >>>    think it's a
>> >>>>>>>     good idea to
>> >>>>>>>>>>>>>> leverage user's travis account.
>> >>>>>>>>>>>>>> In this way, we can have almost unlimited
>> >>>    concurrent build
>> >>>>>>>     jobs and
>> >>>>>>>>>>>>>> developers can restart build by themselves
>> >>>    (currently only
>> >>>>>>>     committers
>> >>>>>>>>>>>>>> can restart PR's build).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> But I'm still not very clear how to integrate
>> >> user's
>> >>>>>>>     travis build
>> >>>>>>>>> into
>> >>>>>>>>>>>>>> the Flink pull request's build automatically.
>> >>>    Can you
>> >>>>>>>     explain more in
>> >>>>>>>>>>>>>> detail?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Another question: does travis only build
>> >>>    branches for user
>> >>>>>>>     account?
>> >>>>>>>>>>>>>> My concern is that builds for PRs will rebase
>> >> user's
>> >>>>>>>     commits against
>> >>>>>>>>>>>>>> current master branch.
>> >>>>>>>>>>>>>> This will help us to find problems before
>> >>>    merge.  Builds
>> >>>>>>>     for branches
>> >>>>>>>>>>>>>> will lose the impact of new commits in master.
>> >>>>>>>>>>>>>> How does Zeppelin solve this problem?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks again for sharing the idea.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>> Jark
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>> >>>    <zjffdu@gmail.com <ma...@gmail.com>
>> >>>>>>>     <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:zjffdu@gmail.com
>> >>>    <ma...@gmail.com> <mailto:zjffdu@gmail.com
>> >>>    <ma...@gmail.com>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Hi Folks,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Zeppelin met this kind of issue before; we solved
>> >>>>>>> it by
>> >>>>>>>>> delegating
>> >>>>>>>>>>>>>>      each
>> >>>>>>>>>>>>>>      one's PR build to his travis account
>> >>>    (Everyone can
>> >>>>>>>     have 5 free
>> >>>>>>>>>>>>>>      slot for
>> >>>>>>>>>>>>>> travis build).
>> >>>>>>>>>>>>>> Apache account travis build is only triggered when
>> >>>>>>>     PR is merged.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Kurt Young <ykt836@gmail.com
>> >>>    <ma...@gmail.com>
>> >>>>>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
>> >>>    <mailto:ykt836@gmail.com <ma...@gmail.com>
>> >>>>>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>> >>>>>>>>>>>>>>> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> (Forgot to cc George)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>> Kurt
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>> >>>>>>>     <ykt836@gmail.com <ma...@gmail.com>
>> >>>    <mailto:ykt836@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:ykt836@gmail.com
>> >>>    <ma...@gmail.com> <mailto:ykt836@gmail.com
>> >>>    <ma...@gmail.com>>>>
>> >>>>>>>     wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Hi Bowen,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks for bringing this up. We
>> >>>    actually have
>> >>>>>>>     discussed
>> >>>>>>>>> about
>> >>>>>>>>>>>>>>      this, and I
>> >>>>>>>>>>>>>>>> think Till and George have
>> >>>>>>>>>>>>>>>>> already spent some time investigating
>> >>>    it. I have
>> >>>>>>>     cced both of
>> >>>>>>>>>>>>>>      them, and
>> >>>>>>>>>>>>>>>> maybe they can share
>> >>>>>>>>>>>>>>>> their findings.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>> Kurt
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>> >>>>>>>     <imjark@gmail.com <ma...@gmail.com>
>> >>>    <mailto:imjark@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:imjark@gmail.com
>> >>>    <ma...@gmail.com> <mailto:imjark@gmail.com
>> >>>    <ma...@gmail.com>>>>
>> >>>>>>>     wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hi Bowen,
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks for bringing this. We also
>> >>>    suffered from
>> >>>>>>>     the long
>> >>>>>>>>>>>>>>      build time.
>> >>>>>>>>>>>>>>>>> I agree that we should focus on
>> >>>    solving build
>> >>>>>>>     capacity
>> >>>>>>>>>>>>>> problem in the
>> >>>>>>>>>>>>>>>>> thread.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> My observation is that only one
>> >>>    build is
>> >>>>>>>     running, all
>> >>>>>>>>> the
>> >>>>>>>>>>>>>> others
>> >>>>>>>>>>>>>>>>> (other
>> >>>>>>>>>>>>>>>>> PRs, master) are pending.
>> >>>>>>>>>>>>>>>>> The pricing plan[1] of travis shows
>> >>>    it can
>> >>>>>>> support
>> >>>>>>>>> concurrent
>> >>>>>>>>>>>>>>      build
>> >>>>>>>>>>>>>>> jobs.
>> >>>>>>>>>>>>>>>>> But I don't know which plan we are
>> >>>    using, might
>> >>>>>>>     be the free
>> >>>>>>>>>>>>>>      plan for
>> >>>>>>>>>>>>>>> open
>> >>>>>>>>>>>>>>>>> source.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
>> >>>    experience on
>> >>>>>>>     Travis.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>>>>> Jark
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>> >>>>>>>>> bowenli86@gmail.com <ma...@gmail.com>
>> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>>>>>> <mailto:bowenli86@gmail.com
>> >>>    <ma...@gmail.com>
>> >>>>>>>     <mailto:bowenli86@gmail.com
>> >>>    <ma...@gmail.com>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Hi Steven,
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I think you may not have read what I
>> >>>    wrote. The
>> >>>>>>>     discussion is
>> >>>>>>>>>>> about
>> >>>>>>>>>>>>>>> "unstable
>> >>>>>>>>>>>>>>>>>>> build **capacity**", in other words
>> >>>>>>>     "unstable / lack of
>> >>>>>>>>>>> build
>> >>>>>>>>>>>>>>>>> resources",
>> >>>>>>>>>>>>>>>>>> not "unstable build".
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM
>> >>>    Steven Wu
>> >>>>>>>>>>>>>>      <stevenz3wu@gmail.com
>> >>>    <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>> >>>    <ma...@gmail.com>>
>> >>>>>>>     <mailto:stevenz3wu@gmail.com
>> >>>    <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>> >>>    <ma...@gmail.com>>>>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> long and sometimes unstable build is
>> >>>>>>>     definitely a pain
>> >>>>>>>>>>>>> point.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I suspect the build failure here in
>> >>>>>>>>> flink-connector-kafka
>> >>>>>>>>>>>>>>      is not
>> >>>>>>>>>>>>>>>>> related
>> >>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>> my change, but there is no easy way to
>> >>>    re-run the
>> >>>>>>>     build on
>> >>>>>>>>>>>>>> travis UI.
>> >>>>>>>>>>>>>>> Google
>> >>>>>>>>>>>>>>>>>>> search showed a trick of
>> >>>    close-and-open the
>> >>>>>>>     PR will
>> >>>>>>>>>>>>>> trigger rebuild.
>> >>>>>>>>>>>>>>>>> but
>> >>>>>>>>>>>>>>>>>>>> that could add noise to the PR
>> >>>    activities.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> travis-ci for my personal repo
>> >>>    often failed
>> >>>>>>>     with
>> >>>>>>>>>>>>>> exceeding time
>> >>>>>>>>>>>>>>> limit
>> >>>>>>>>>>>>>>>>>> after
>> >>>>>>>>>>>>>>>>>>> 4+ hours.
>> >>>>>>>>>>>>>>>>>>> The job exceeded the maximum time
>> >>>    limit for
>> >>>>>>>     jobs, and
>> >>>>>>>>> has
>> >>>>>>>>>>>>>>      been
>> >>>>>>>>>>>>>>>>>> terminated.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM
>> >>>    Bowen Li
>> >>>>>>>>>>>>>>      <bowenli86@gmail.com
>> >>>    <ma...@gmail.com> <mailto:bowenli86@gmail.com
>> >>>    <ma...@gmail.com>>
>> >>>>>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>
>> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>> https://travis-ci.org/apache/flink/builds/549681530
>> >>>>>>>>>>>>>>      This build
>> >>>>>>>>>>>>>>>>>> request
>> >>>>>>>>>>>>>>>>>>>> has
>> >>>>>>>>>>>>>>>>>>>> been sitting at **HEAD of the
>> >>>    queue**
>> >>>>>>>     since I first
>> >>>>>>>>> saw
>> >>>>>>>>>>>>>>      it at PST
>> >>>>>>>>>>>>>>>>>> 10:30am
>> >>>>>>>>>>>>>>>>>>>> (not sure how long it's been
>> >>>    there before
>> >>>>>>>     10:30am).
>> >>>>>>>>>>>>>>      It's PST
>> >>>>>>>>>>>>>>> 4:12pm
>> >>>>>>>>>>>>>>>>> now
>> >>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> it hasn't started yet.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM
>> >>>    Bowen Li
>> >>>>>>>>>>>>>>      <bowenli86@gmail.com
>> >>>    <ma...@gmail.com> <mailto:bowenli86@gmail.com
>> >>>    <ma...@gmail.com>>
>> >>>>>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>
>> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>> >>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Hi devs,
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain
>> >>>>>>>     resulting from lack
>> >>>>>>>>>>>>>>      of stable
>> >>>>>>>>>>>>>>>>> build
>> >>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink
>> >>>    PRs [1].
>> >>>>>>>>> Specifically, I
>> >>>>>>>>>>>>>> noticed
>> >>>>>>>>>>>>>>>>> often
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>> no
>> >>>>>>>>>>>>>>>>>>>>> build in the queue is making any
>> >>>>>>>     progress for
>> >>>>>>>>> hours,
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>> suddenly
>> >>>>>>>>>>>>>>>>> 5
>> >>>>>>>>>>>>>>>>>> or
>> >>>>>>>>>>>>>>>>>>> 6
>> >>>>>>>>>>>>>>>>>>>>> builds kick off all together
>> >>>    after the
>> >>>>>>>     long pause.
>> >>>>>>>>>>>>>>      I'm at PST
>> >>>>>>>>>>>>>>>>>> (UTC-08)
>> >>>>>>>>>>>>>>>>>>>> time
>> >>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can
>> >>>    be as
>> >>>>>>>     long as 6 hours
>> >>>>>>>>>>>>>>      from PST 9am
>> >>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>> 3pm
>> >>>>>>>>>>>>>>>>>>>>> (let alone the time needed to
>> >>>    drain the
>> >>>>>>>     queue
>> >>>>>>>>>>>>>> afterwards).
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I think this has greatly
>> >>>    impacted our
>> >>>>>>>     productivity.
>> >>>>>>>>>>> I've
>> >>>>>>>>>>>>>>>>> experienced
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> PRs submitted in the early
>> >>>    morning of
>> >>>>>>>     PST time zone
>> >>>>>>>>>>>>>>      won't finish
>> >>>>>>>>>>>>>>>>>> their
>> >>>>>>>>>>>>>>>>>>>>> build until late night of the
>> >>>    same day.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> So my questions are:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced
>> >>>    the same
>> >>>>>>>     problem or
>> >>>>>>>>>>>>>>      have similar
>> >>>>>>>>>>>>>>>>>>>> observation
>> >>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it
>> >>>    has things
>> >>>>>>>     to do with
>> >>>>>>>>> time
>> >>>>>>>>>>>>>>      zone)
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> - What pricing plan of
>> >>>    TravisCI is
>> >>>>>>>     Flink currently
>> >>>>>>>>>>>>>> using? Is it
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> free
>> >>>>>>>>>>>>>>>>>>>>> plan for open source
>> >>>    projects? What
>> >>>>>>> are the
>> >>>>>>>>>>>>>> guaranteed build
>> >>>>>>>>>>>>>>>>> capacity
>> >>>>>>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>>> the current plan?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> - If the current pricing plan
>> >>>    (either
>> >>>>>>>     free or paid)
>> >>>>>>>>>>>>> can't
>> >>>>>>>>>>>>>>> provide
>> >>>>>>>>>>>>>>>>>>> stable
>> >>>>>>>>>>>>>>>>>>>>> build capacity, can we
>> >>>    upgrade to a
>> >>>>>>>     higher priced
>> >>>>>>>>>>>>>>      plan with
>> >>>>>>>>>>>>>>> larger
>> >>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> more
>> >>>>>>>>>>>>>>>>>>>>> stable build capacity?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> BTW, another factor that
>> >>>    contribute to
>> >>>>>>> the
>> >>>>>>>>>>>>>> productivity problem
>> >>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> our build is slow - we run
>> >>>    full build
>> >>>>>>>     for every PR
>> >>>>>>>>>>> and a
>> >>>>>>>>>>>>>>>>> successful
>> >>>>>>>>>>>>>>>>>>> full
>> >>>>>>>>>>>>>>>>>>>>> build takes ~5h. We
>> >>>    definitely have
>> >>>>>>>     more options to
>> >>>>>>>>>>>>>>      solve it,
>> >>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>> instance,
>> >>>>>>>>>>>>>>>>>>>>> modularize the build graphs
>> >>>    and reuse
>> >>>>>>>     artifacts
>> >>>>>>>>> from
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>>>> previous
>> >>>>>>>>>>>>>>>>>>> build.
>> >>>>>>>>>>>>>>>>>>>>> But I think that can be a big
>> >>>    effort
>> >>>>>>>     which is much
>> >>>>>>>>>>>>>> harder to
>> >>>>>>>>>>>>>>>>>> accomplish
>> >>>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>> a short period of time and
>> >>>    may deserve
>> >>>>>>>     its own
>> >>>>>>>>>>> separate
>> >>>>>>>>>>>>>>>>> discussion.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>> https://travis-ci.org/apache/flink/pull_requests
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      --
>> >>>>>>>>>>>>>>      Best Regards
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>      Jeff Zhang
>> >>>>>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>
>> >>
>>
>>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Kurt Young <yk...@gmail.com>.
+1 and great thanks Chesnay for pushing this.

Best,
Kurt


On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <al...@apache.org> wrote:

> +1
>
> Aljoscha
>
> > On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
> >
> > +1 to move to a private Travis account.
> >
> > I can confirm that Ververica will sponsor a Travis CI plan that is
> > equivalent or a bit higher than the previous ASF quota (10 concurrent
> build
> > queues)
> >
> > Best,
> > Stephan
> >
> > On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org>
> wrote:
> >
> >> I've raised a JIRA
> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> inquire
> >> whether it would be possible to switch to a different Travis account,
> >> and if so what steps would need to be taken.
> >> We need a proper confirmation from INFRA since we are not in full
> >> control of the flink repository (for example, we cannot access the
> >> settings page).
> >>
> >> If this is indeed possible, Ververica is willing to sponsor a Travis
> >> account for the Flink project.
> >> This would provide us with more than enough resources.
> >>
> >> Since this makes the project more reliant on resources provided by
> >> external companies I would like to vote on this.
> >>
> >> Please vote on this proposal, as follows:
> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> >> provided that INFRA approves
> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> >> account
> >>
> >> The vote will be open for at least 24h, and until we have confirmation
> >> from INFRA. The voting period may be shorter than the usual 3 days since
> >> our current setup is effectively not working.
> >>
> >> On 04/07/2019 06:51, Bowen Li wrote:
> >>> Re: > Are they using their own Travis CI pool, or did they switch to an
> >>> entirely different CI service?
> >>>
> >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >>> currently moving away from ASF's Travis to their own in-house metal
> >>> machines at [1] with custom CI application at [2]. They've seen
> >>> significant improvement w.r.t both much higher performance and
> >>> basically no resource waiting time, "night-and-day" difference quoting
> >>> Wes.
> >>>
> >>> Re: > If we can just switch to our own Travis pool, just for our
> >>> project, then this might be something we can do fairly quickly?
> >>>
> >>> I believe so, according to [3] and [4]
> >>>
> >>>
> >>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> >>> [2] https://github.com/ursa-labs/ursabot
> >>> [3]
> >>>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>> [4]
> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>
> >>>
> >>>
> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
> >>> <ma...@apache.org>> wrote:
> >>>
> >>>    Are they using their own Travis CI pool, or did they switch to an
> >>>    entirely different CI service?
> >>>
> >>>    If we can just switch to our own Travis pool, just for our
> >>>    project, then
> >>>    this might be something we can do fairly quickly?
> >>>
> >>>    On 03/07/2019 05:55, Bowen Li wrote:
> >>>> I responded in the INFRA ticket [1] that I believe they are
> >>>    using a wrong
> >>>> metric against Flink and the total build time is a completely
> >>>    different
> >>>> thing than guaranteed build capacity.
> >>>>
> >>>> My response:
> >>>>
> >>>> "As mentioned above, since I started to pay attention to Flink's
> >>>    build
> >>>> queue a few tens of days ago, I'm in Seattle and I saw no build
> >>>    was kicking
> >>>> off in PST daytime in weekdays for Flink. Our teammates in China
> >>>    and Europe
> >>>> have also reported similar observations. So we need to evaluate
> >>>    how the
> >>>> large total build time came from - if 1) your number and 2) our
> >>>> observations from three locations that cover pretty much a full
> >>>    day, are
> >>>> all true, I **guess** one reason can be that - highly likely the
> >>>    extra
> >>>> build time came from weekends when other Apache projects may be
> >>>    idle and
> >>>> Flink just drains hard its congested queue.
> >>>>
> >>>> Please be aware of that we're not complaining about the lack of
> >>>    resources
> >>>> in general, I'm complaining about the lack of **stable, dedicated**
> >>>> resources. An example for the latter one is, currently even if
> >>>    no build is
> >>>> in Flink's queue and I submit a request to be the queue head in PST
> >>>> morning, my build won't even start in 6-8+h. That is an absurd
> >>>    amount of
> >>>> waiting time.
> >>>>
> >>>> That's saying, if ASF INFRA decides to adopt a quota system and
> >>>    grants
> >>>> Flink five DEDICATED servers that runs all the time only for
> >>>    Flink, that'll
> >>>> be PERFECT and can totally solve our problem now.
> >>>>
> >>>> I feel what's missing in the ASF INFRA's Travis resource pool is
> >>>    some level
> >>>> of build capacity SLAs and certainty"
> >>>>
> >>>>
> >>>> Again, I believe these two problems differ in nature: long build time
> >>>> vs. lack of dedicated build resources. That is to say, shortening the
> >>>> build time may or may not relieve the situation. I'm slightly negative
> >>>> on disabling IT cases for PRs: the downside is that we risk merging
> >>>> bugs that UTs don't catch, which may cost a lot more to fix later and
> >>>> may slow down or even block others. But I'm open to others' opinions
> >>>> on it.
> >>>>
> >>>> AFAICT from INFRA ticket[1], donating to ASF INFRA won't be
> >>>    feasible to
> >>>> solve our problem since INFRA's pool is fully shared and they
> >>>    have no
> >>>> control and finer insights over resource allocation to a
> >>>    specific Apache
> >>>> project. As mentioned in [1], Apache Arrow is moving away from the
> >>>> ASF INFRA Travis pool (they are actually surprised Flink hasn't
> >>>> planned to do so). I know that Spark is on its own build infra. If we
> >>>> all agree on funding our own build infra, I'd be glad to help
> >>>> investigate potential options after releasing 1.9, since I'm super
> >>>> busy with 1.9 now.
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
> >>>    <chesnay@apache.org <ma...@apache.org>> wrote:
> >>>>
> >>>>> As a short-term stopgap, since we can assume this issue to
> >>>    become much
> >>>>> worse in the following days/weeks, we could disable IT cases in
> >>>    PRs and
> >>>>> only run them on master.
> >>>>>
> >>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>>>>> People really have to stop thinking that just because
> >>>    something works
> >>>>>> for us it is also a good solution.
> >>>>>> Also, please remember that our builds run for 2h from start to
> >>>    finish,
> >>>>>> and not the 14 _minutes_ it takes for zeppelin.
> >>>>>> We are dealing with an entirely different scale here, both in
> >>>    terms of
> >>>>>> build times and number of builds.
> >>>>>>
> >>>>>> In this very thread people have been complaining about long queue
> >>>>>> times for their builds. Surprise, other Apache projects have been
> >>>>>> suffering the very same thing due to us not controlling our build
> >>>>>> times. While switching services (be it Jenkins, CircleCI or
> >>>    whatever)
> >>>>>> will possibly work for us (and these options are actually
> >>>    attractive,
> >>>>>> like CircleCI's proper support for build artifacts), it will also
> >>>>>> result in us likely negatively affecting other projects in
> >>>    significant
> >>>>>> ways.
> >>>>>>
> >>>>>> Sure, the Jenkins setup has a good user experience for us, at
> >>>    the cost
> >>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we
> >>>    have 25
> >>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> >>>>>> resources, and the European contributors haven't even really
> >>>    started yet.
> >>>>>>
> >>>>>> FYI, the latest INFRA response from INFRA-18533:
> >>>>>>
> >>>>>> "Our rough metrics shows that Flink used over 5800 hours of
> >>>    build time
> >>>>>> last month. That is equal to EIGHT servers running 24/7 for
> >>>    the ENTIRE
> >>>>>> MONTH. EIGHT. nonstop.
> >>>>>> When we discovered this last night, we discussed it some and
> >>>    are going
> >>>>>> to tune down Flink to allow only five executors maximum. We
> >> cannot
> >>>>>> allow Flink to consume so much of a Foundation shared resource."
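
The "eight servers" figure in the INFRA response above can be checked with a short calculation. This is only a sketch: the 30-day month is an assumption, and `equivalent_servers` is a name made up here for illustration.

```python
HOURS_PER_DAY = 24
DAYS_IN_MONTH = 30  # assumption: a 30-day month


def equivalent_servers(build_hours: float, days: int = DAYS_IN_MONTH) -> float:
    """Executors that would have to run nonstop for `days` days to supply `build_hours`."""
    return build_hours / (HOURS_PER_DAY * days)


if __name__ == "__main__":
    # 5800 build hours is the number quoted from INFRA-18533.
    print(f"{equivalent_servers(5800):.2f}")  # 8.06
```

So 5800 hours in a month is roughly eight executors busy around the clock, which matches INFRA's statement.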
> >>>>>>
> >>>>>> So yes, we either
> >>>>>> a) have to heavily reduce our CI usage or
> >>>>>> b) fund our own, either maintaining it ourselves or donating
> >>>    to Apache.
> >>>>>>
> >>>>>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>>>>> By looking at the git history of the Jenkins script, its core part
> >>>>>>> was finished in March 2017 (with only two minor updates in
> >>>>>>> 2017/2018), so it's been running for over two years now and it
> >>>>>>> feels like the Zeppelin community has been quite happy with it.
> >>>>>>> @Jeff Zhang <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
> >>>>>>> share your insights and user experience with the Jenkins+Travis
> >>>>>>> approach?
> >>>>>>>
> >>>>>>> Things like:
> >>>>>>>
> >>>>>>> - has the approach completely solved the resource capacity problem
> >>>>>>> for the Zeppelin community? Is the Zeppelin community happy with
> >>>>>>> the result?
> >>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
> >>>>>>> - how often do you need to maintain the Jenkins infra? how many
> >>>>>>> people are usually involved in maintenance and bug-fixes?
> >>>>>>>
> >>>>>>> The downside of this approach seems mostly to be on the
> >>>    maintenance
> >>>>>>> to me - maintain the script and Jenkins infra.
> >>>>>>>
> >>>>>>> ** Having Our Own Travis-CI.com Account **
> >>>>>>>
> >>>>>>> Another alternative I've been thinking of is to have our own
> >>>>>>> travis-ci.com account with paid dedicated resources. Note that
> >>>>>>> travis-ci.org is the free version and travis-ci.com is the
> >>>>>>> commercial version. We currently use a shared resource pool managed
> >>>>>>> by the ASF INFRA team on travis-ci.org, but we have no control
> >>>>>>> over it - we can't see how it's configured, how much resources are
> >>>>>>> available, how resources are allocated among Apache projects, etc.
> >>>>>>> The nice things about having an account on travis-ci.com are:
> >>>>>>>
> >>>>>>> - relatively low cost with much better resource guarantee
> >>>    than what
> >>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> >>>>>>> $489/month with 10 concurrency
> >>>>>>> - low maintenance work compared to using Jenkins
> >>>>>>> - (potentially) no migration cost according to Travis's doc [2]
> >>>>>>> (pending verification)
> >>>>>>> - full control over the build capacity/configuration compared to
> >>>>>>> using ASF INFRA's pool
> >>>>>>>
> >>>>>>> I'd be surprised if we as such a vibrant community cannot
> >>>    find and
> >>>>>>> fund $249*12=$2988 a year in exchange for a much better
> >> developer
> >>>>>>> experience and much higher productivity.
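
The yearly figure above can be reproduced for both plans mentioned in the list. A minimal sketch: the prices are the ones quoted in the message, and `annual_cost` is a hypothetical helper, not anything provided by Travis.

```python
# Monthly prices in USD as quoted in the message, keyed by concurrent jobs.
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_usd: int) -> int:
    """Yearly cost of a plan billed monthly."""
    return monthly_usd * 12


if __name__ == "__main__":
    for concurrency, monthly in PLANS.items():
        print(f"{concurrency} concurrent jobs: ${annual_cost(monthly)} per year")
```

For the 5-job plan this gives the $2988 stated above; the 10-job plan comes to $5868 per year.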
> >>>>>>>
> >>>>>>> [1] https://travis-ci.com/plans
> >>>>>>> [2]
> >>>>>>>
> >>>>>
> >>>
> >>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> >>>    <chesnay@apache.org <ma...@apache.org>
> >>>>>>> <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
> >>>>>>>
> >>>>>>>     So yes, the Jenkins job keeps pulling the state from
> >>>    Travis until it
> >>>>>>>     finishes.
> >>>>>>>
> >>>>>>>     Not sure I'm comfortable with the idea of using Jenkins
> >>>>>>>     workers just to idle for several hours.
> >>>>>>>
> >>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>>>>> Here's what the Zeppelin community did: we wrote a Python script
> >>>>>>>> to check the build status of a pull request.
> >>>>>>>> Here's the script:
> >>>>>>>>
> >>>    https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>>>>>
> >>>>>>>> And this is the script we used in Jenkins build job.
> >>>>>>>>
> >>>>>>>> if [ -f "travis_check.py" ]; then
> >>>>>>>>   git log -n 1
> >>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
> >>>>>>>     request.*from.*" | sed
> >>>>>>>> 's/.*GitHub pull request <a
> >>>>>>>> href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1
> >>>    \2/g')
> >>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed
> >>>>>>> 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk
> >>>    '{print $3}')
> >>>>>>>>   #if [ -z $COMMIT ]; then
> >>>>>>>>   #  COMMIT=$(curl -s
> >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> >>>>>>>> | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" |
> >>>    tr '\n' ' '
> >>>>>>>     | sed
> >>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>>    grep -v
> >>>>>>>     "apache:" |
> >>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>   #fi
> >>>>>>>>
> >>>>>>>>   # get commit hash from PR
> >>>>>>>>   COMMIT=$(curl -s
> >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>>>>> grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr
> >>>    '\n' ' '
> >>>>>>> | sed
> >>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>>    grep -v
> >>>>>>>     "apache:" |
> >>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>   sleep 30 # sleep few moment to wait travis starts
> >>>    the build
> >>>>>>>>   RET_CODE=0
> >>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> >>>    RET_CODE=$?
> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository
> >>>    name when
> >>>>>>>     travis-ci is
> >>>>>>>> not available in the account
> >>>>>>>>     RET_CODE=0
> >>>>>>>>     AUTHOR=$(curl -s
> >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR
> >>>>>>>> | grep '"full_name":' | grep -v "apache/zeppelin" | sed
> >>>>>>>> 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} ||
> >>>    RET_CODE=$?
> >>>>>>>>   fi
> >>>>>>>>
> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail with can't find
> >>>    build
> >>>>>>>     information in
> >>>>>>>> the travis
> >>>>>>>>     set +x
> >>>>>>>>     echo
> >>>    "-----------------------------------------------------"
> >>>>>>>>     echo "Looks like travis-ci is not configured for
> >>>    your fork."
> >>>>>>>>     echo "Please setup by swich on 'zeppelin'
> >>>    repository at
> >>>>>>>> https://travis-ci.org/profile and travis-ci."
> >>>>>>>>     echo "And then make sure 'Build branch updates'
> >>>    option is
> >>>>>>>     enabled in
> >>>>>>>> the settings
> >>>    https://travis-ci.org/${AUTHOR}/zeppelin/settings
> >>>    <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>
> >>>>>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
> >>>>>>>>     echo ""
> >>>>>>>>     echo "To trigger CI after setup, you will need
> >>>    ammend your
> >>>>>>>     last commit
> >>>>>>>> with"
> >>>>>>>>     echo "git commit --amend"
> >>>>>>>>     echo "git push your-remote HEAD --force"
> >>>>>>>>     echo ""
> >>>>>>>>     echo "See
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
> >>>>>>>> ."
> >>>>>>>>   fi
> >>>>>>>>
> >>>>>>>>   exit $RET_CODE
> >>>>>>>> else
> >>>>>>>>   set +x
> >>>>>>>>   echo "travis_check.py does not exists"
> >>>>>>>>   exit 1
> >>>>>>>> fi
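
The Jenkins job above delegates the waiting to travis_check.py. The sketch below is not that script; it only illustrates the general polling pattern, and why a Jenkins worker stays occupied for as long as the Travis build runs. The state names, the `fetch_state` callable, and the default intervals are all assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed names for "finished" build states; a real CI API may differ.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(
    fetch_state: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `fetch_state` until it reports a terminal state or the timeout expires.

    `fetch_state` is a hypothetical callable that returns the current state of
    the contributor's build, e.g. by querying the CI service over HTTP.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        # The calling worker does nothing useful here; it just waits.
        sleep(poll_seconds)


if __name__ == "__main__":
    states = iter(["created", "started", "passed"])
    print(wait_for_build(lambda: next(states), poll_seconds=0))
```

The worker running this loop is blocked for the whole duration of the remote build, which is the cost raised in the replies to this message.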
> >>>>>>>>
> >>>>>>>> Chesnay Schepler <chesnay@apache.org
> >>>    <ma...@apache.org>
> >>>>>>>     <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>    wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>>>>>>>
> >>>>>>>>> Does this imply that a Jenkins job is active as long
> >>>    as the
> >>>>>>>     Travis build
> >>>>>>>>> runs?
> >>>>>>>>>
> >>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> @Dawid, I think the "long test running" as I
> >>>    mentioned in the
> >>>>>>>     first
> >>>>>>>>> email,
> >>>>>>>>>> also as you guys said, belongs to "a big effort
> >>>    which is much
> >>>>>>>     harder to
> >>>>>>>>>> accomplish in a short period of time and may deserve
> >>>    its own
> >>>>>>>     separate
> >>>>>>>>>> discussion". Thus I didn't include it in what we can
> >>>    do in a
> >>>>>>>     foreseeable
> >>>>>>>>>> short term.
> >>>>>>>>>>
> >>>>>>>>>> Besides, I don't think that's the ultimate reason
> >>>    for lack of
> >>>>>>>     build
> >>>>>>>>>> resources. Even if the build is shortened to
> >>>    something like
> >>>>>>>     2h, the
> >>>>>>>>>> problem of no build machine working for 6 or more
> >>>    hours in
> >>>>>>>     PST daytime
> >>>>>>>>>> that I described will still happen, because no
> >>>    machine from
> >>>>>>>     ASF INFRA's
> >>>>>>>>>> pool is allocated to Flink. As I have paid close
> >>>    attention to
> >>>>>>>     the build
> >>>>>>>>>> queue in the past few weekdays, it's a pretty clear
> >>>    pattern now.
> >>>>>>>>>>
> >>>>>>>>>> **The ultimate root cause** for that is - we don't
> >>>    have any
> >>>>>>>     **dedicated**
> >>>>>>>>>> build resources that we can stably rely on. I'm
> >>>    actually ok to
> >>>>>>>     wait for a
> >>>>>>>>>> long time if there are build requests running, it
> >>>    means at
> >>>>>>>     least we are
> >>>>>>>>>> making progress. But I'm not ok with no build
> >>>    resource. A
> >>>>>>>     better place I
> >>>>>>>>>> think we should aim at in short term is to always
> >>>    have at
> >>>>>>>     least a central
> >>>>>>>>>> pool (can be 3 or 5) of machines dedicated to build
> >>>    Flink at
> >>>>>>>     any time, or
> >>>>>>>>>> maybe use users' resources.
> >>>>>>>>>>
> >>>>>>>>>> @Chesnay @Robert I synced with Jeff offline that
> >>>    Zeppelin
> >>>>>>>     community is
> >>>>>>>>>> using a Jenkins job to automatically build on users'
> >>>    travis
> >>>>>>>     account and
> >>>>>>>>>> link the result back to github PR. I guess the
> >>>    Jenkins job
> >>>>>>>     would fetch
> >>>>>>>>>> latest upstream master and build the PR against it.
> >>>    Jeff has
> >>>>>>> filed
> >>>>>>>>> tickets
> >>>>>>>>>> to learn and get access to the Jenkins infra. It'll
> >>>    be better to
> >>>>>>>     fully
> >>>>>>>>>> understand it first before judging this approach.
> >>>>>>>>>>
> >>>>>>>>>> I also heard good things about CircleCI, and ASF
> >>>    INFRA seems
> >>>>>>>     to have a
> >>>>>>>>> pool
> >>>>>>>>>> of build capacity there too. Can be an alternative
> >>>    to consider.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> >>>>>>>>> dwysakowicz@apache.org
> >>>    <ma...@apache.org> <mailto:dwysakowicz@apache.org
> >>>    <ma...@apache.org>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the
> >>>    most
> >>>>>>>     important point
> >>>>>>>>>>> from Chesnay's previous message in the summary. The
> >>>    ultimate
> >>>>>>>     reason for
> >>>>>>>>>>> all the problems is that the tests take close to 2
> >>>    hours to
> >>>>>>>     run already.
> >>>>>>>>>>> I fully support this claim: "Unless people start
> >>>    caring about
> >>>>>>>     test times
> >>>>>>>>>>> before adding them, this issue cannot be solved"
> >>>>>>>>>>>
> >>>>>>>>>>> This is also another reason why using user's Travis
> >>>    account
> >>>>>>>     won't help.
> >>>>>>>>>>> Every few weeks we reach the user's time limit for
> >>>    a single
> >>>>>>>     profile.
> >>>>>>>>>>> This makes the user's builds simply fail, until we
> >>>    either
> >>>>>>>     properly
> >>>>>>>>>>> decrease the time the tests take (which I am not
> >>>    sure we ever
> >>>>>>>     did) or
> >>>>>>>>>>> postpone the problem by splitting into more
> >>>    profiles. (Note
> >>>>>>>     that the ASF
> >>>>>>>>>>> Travis account has higher time limits)
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Dawid
> >>>>>>>>>>>
> >>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>>>>>>>>> Do we know if using "the best" available hardware
> >>>    would
> >>>>>>>     improve the
> >>>>>>>>> build
> >>>>>>>>>>>> times?
> >>>>>>>>>>>> Imagine we would run the build on machines with
> >>>    plenty of
> >>>>>>>     main memory
> >>>>>>>>> to
> >>>>>>>>>>>> mount everything to ramdisk + the latest CPU
> >>>    architecture?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Throwing hardware at the problem could help reduce
> >>>    the time
> >>>>>>>     of an
> >>>>>>>>>>>> individual build, and using our own infrastructure
> >>>    would
> >>>>>>>     remove our
> >>>>>>>>>>>> dependency on Apache's Travis account (with the
> >>>    obvious
> >>>>>>>     downside of
> >>>>>>>>>>> having
> >>>>>>>>>>>> to maintain the infrastructure)
> >>>>>>>>>>>> We could use an open source travis alternative, to
> >>>    have a
> >>>>>>>     similar
> >>>>>>>>>>>> experience and make the migration easy.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>>>>>>     <chesnay@apache.org <ma...@apache.org>
> >>>    <mailto:chesnay@apache.org <ma...@apache.org>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> From what I gathered, there's no special
> >>>    sauce that the
> >>>>>>>     Zeppelin
> >>>>>>>>>>>>> project uses which actually integrates a user's
> >> Travis
> >>>>>>>     account into the
> >>>>>>>>>>> PR.
> >>>>>>>>>>>>> They just disabled Travis for PRs. And that's
> >>>    kind of it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a
> >>>    fair
> >>>>>>>     amount of
> >>>>>>>>>>>>> resources, but there are downsides:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The discoverability of the Travis check takes a
> >>>    nose-dive.
> >>>>>>>     Either we
> >>>>>>>>>>>>> require every contributor to always, on every
> >>>    commit, also
> >>>>>>>     post a
> >>>>>>>>> Travis
> >>>>>>>>>>>>> build, or we have the reviewer sift through the
> >>>>>>>     contributors account
> >>>>>>>>> to
> >>>>>>>>>>>>> find it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is rather cumbersome. Additionally, it's
> >>>    also not
> >>>>>>>     equivalent to
> >>>>>>>>>>>>> having a PR build.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> A normal branch build takes a branch as is and
> >>>    tests it. A
> >>>>>>>     PR build
> >>>>>>>>>>>>> merges the branch into master, and then runs it.
> >>>    (Fun fact:
> >>>>>>>     This is
> >>>>>>>>> why
> >>>>>>>>>>>>> a PR with merge conflicts is not being run on
> >>>    Travis.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And ultimately, everyone can already make use of
> >> this
> >>>>>>>     approach anyway.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>>>>>>>>>>> Hi Jeff,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I
> >>>    think it's a
> >>>>>>>     good idea to
> >>>>>>>>>>>>>> leverage user's travis account.
> >>>>>>>>>>>>>> In this way, we can have almost unlimited
> >>>    concurrent build
> >>>>>>>     jobs and
> >>>>>>>>>>>>>> developers can restart build by themselves
> >>>    (currently only
> >>>>>>>     committers
> >>>>>>>>>>>>>> can restart PR's build).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> But I'm still not very clear how to integrate
> >> user's
> >>>>>>>     travis build
> >>>>>>>>> into
> >>>>>>>>>>>>>> the Flink pull request's build automatically.
> >>>    Can you
> >>>>>>>     explain more in
> >>>>>>>>>>>>>> detail?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another question: does travis only build
> >>>    branches for user
> >>>>>>>     account?
> >>>>>>>>>>>>>> My concern is that builds for PRs will rebase
> >> user's
> >>>>>>>     commits against
> >>>>>>>>>>>>>> current master branch.
> >>>>>>>>>>>>>> This will help us to find problems before
> >>>    merge.  Builds
> >>>>>>>     for branches
> >>>>>>>>>>>>>> will lose the impact of new commits in master.
> >>>>>>>>>>>>>> How does Zeppelin solve this problem?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks again for sharing the idea.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >>>    <zjffdu@gmail.com <ma...@gmail.com>
> >>>>>>>     <mailto:zjffdu@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>> <mailto:zjffdu@gmail.com
> >>>    <ma...@gmail.com> <mailto:zjffdu@gmail.com
> >>>    <ma...@gmail.com>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Hi Folks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Zeppelin met this kind of issue before; we solved
> >>>>>>> it by
> >>>>>>>>> delegating
> >>>>>>>>>>>>>>      each
> >>>>>>>>>>>>>>      one's PR build to his travis account
> >>>    (Everyone can
> >>>>>>>     have 5 free
> >>>>>>>>>>>>>>      slots for
> >>>>>>>>>>>>>> travis build).
> >>>>>>>>>>>>>> Apache account travis build is only triggered when
> >>>>>>>     PR is merged.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Kurt Young <ykt836@gmail.com
> >>>    <ma...@gmail.com>
> >>>>>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>    <mailto:ykt836@gmail.com <ma...@gmail.com>
> >>>>>>>     <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>>>>>>> wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> (Forgot to cc George)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>>>>>>     <ykt836@gmail.com <ma...@gmail.com>
> >>>    <mailto:ykt836@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>> <mailto:ykt836@gmail.com
> >>>    <ma...@gmail.com> <mailto:ykt836@gmail.com
> >>>    <ma...@gmail.com>>>>
> >>>>>>>     wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for bringing this up. We
> >>>    actually have
> >>>>>>>     discussed
> >>>>>>>>> about
> >>>>>>>>>>>>>>      this, and I
> >>>>>>>>>>>>>>>> think Till and George have
> >>>>>>>>>>>>>>>> already spent some time investigating
> >>>    it. I have
> >>>>>>>     cced both of
> >>>>>>>>>>>>>>      them, and
> >>>>>>>>>>>>>>>> maybe they can share
> >>>>>>>>>>>>>>>> their findings.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>>>>>>     <imjark@gmail.com <ma...@gmail.com>
> >>>    <mailto:imjark@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>> <mailto:imjark@gmail.com
> >>>    <ma...@gmail.com> <mailto:imjark@gmail.com
> >>>    <ma...@gmail.com>>>>
> >>>>>>>     wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for bringing this. We also
> >>>    suffered from
> >>>>>>>     the long
> >>>>>>>>>>>>>>      build time.
> >>>>>>>>>>>>>>>>> I agree that we should focus on
> >>>    solving build
> >>>>>>>     capacity
> >>>>>>>>>>>>>> problem in the
> >>>>>>>>>>>>>>>>> thread.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> My observation is that only one
> >>>    build is
> >>>>>>>     running, all
> >>>>>>>>> the
> >>>>>>>>>>>>>> others
> >>>>>>>>>>>>>>>>> (other
> >>>>>>>>>>>>>>>>> PRs, master) are pending.
> >>>>>>>>>>>>>>>>> The pricing plan[1] of travis shows
> >>>    it can
> >>>>>>> support
> >>>>>>>>> concurrent
> >>>>>>>>>>>>>>      build
> >>>>>>>>>>>>>>> jobs.
> >>>>>>>>>>>>>>>>> But I don't know which plan we are
> >>>    using, might
> >>>>>>>     be the free
> >>>>>>>>>>>>>>      plan for
> >>>>>>>>>>>>>>> open
> >>>>>>>>>>>>>>>>> source.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
> >>>    experience on
> >>>>>>>     Travis.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> >>>>>>>>> bowenli86@gmail.com <ma...@gmail.com>
> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>>>>>>>>>> <mailto:bowenli86@gmail.com
> >>>    <ma...@gmail.com>
> >>>>>>>     <mailto:bowenli86@gmail.com
> >>>    <ma...@gmail.com>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Steven,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I think you may not have read what I
> >>>    wrote. The
> >>>>>>>     discussion is
> >>>>>>>>>>> about
> >>>>>>>>>>>>>>> "unstable
> >>>>>>>>>>>>>>>>>> build **capacity**", in other words
> >>>>>>>     "unstable / lack of
> >>>>>>>>>>> build
> >>>>>>>>>>>>>>>>> resources",
> >>>>>>>>>>>>>>>>>> not "unstable build".
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM
> >>>    Steven Wu
> >>>>>>>>>>>>>>      <stevenz3wu@gmail.com
> >>>    <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>>    <ma...@gmail.com>>
> >>>>>>>     <mailto:stevenz3wu@gmail.com
> >>>    <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
> >>>    <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> long and sometimes unstable build is
> >>>>>>>     definitely a pain
> >>>>>>>>>>>>> point.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I suspect the build failure here in
> >>>>>>>>> flink-connector-kafka
> >>>>>>>>>>>>>>      is not
> >>>>>>>>>>>>>>>>> related
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> my change. but there is no easy way to
> >>>    re-run the
> >>>>>>>     build on
> >>>>>>>>>>>>>> travis UI.
> >>>>>>>>>>>>>>> Google
> >>>>>>>>>>>>>>>>>>> search showed a trick of
> >>>    close-and-open the
> >>>>>>>     PR will
> >>>>>>>>>>>>>> trigger rebuild.
> >>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>> that could add noise to the PR
> >>>    activities.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> travis-ci for my personal repo
> >>>    often failed
> >>>>>>>     with
> >>>>>>>>>>>>>> exceeding time
> >>>>>>>>>>>>>>> limit
> >>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>> 4+ hours.
> >>>>>>>>>>>>>>>>>>> The job exceeded the maximum time
> >>>    limit for
> >>>>>>>     jobs, and
> >>>>>>>>> has
> >>>>>>>>>>>>>>      been
> >>>>>>>>>>>>>>>>>> terminated.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM
> >>>    Bowen Li
> >>>>>>>>>>>>>>      <bowenli86@gmail.com
> >>>    <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>>    <ma...@gmail.com>>
> >>>>>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>
> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>> https://travis-ci.org/apache/flink/builds/549681530
> >>>>>>>>>>>>>>      This build
> >>>>>>>>>>>>>>>>>> request
> >>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>> been sitting at **HEAD of the
> >>>    queue**
> >>>>>>>     since I first
> >>>>>>>>> saw
> >>>>>>>>>>>>>>      it at PST
> >>>>>>>>>>>>>>>>>> 10:30am
> >>>>>>>>>>>>>>>>>>>> (not sure how long it's been
> >>>    there before
> >>>>>>>     10:30am).
> >>>>>>>>>>>>>>      It's PST
> >>>>>>>>>>>>>>> 4:12pm
> >>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> it hasn't started yet.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM
> >>>    Bowen Li
> >>>>>>>>>>>>>>      <bowenli86@gmail.com
> >>>    <ma...@gmail.com> <mailto:bowenli86@gmail.com
> >>>    <ma...@gmail.com>>
> >>>>>>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>
> >>>    <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain
> >>>>>>>     resulting from lack
> >>>>>>>>>>>>>>      of stable
> >>>>>>>>>>>>>>>>> build
> >>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink
> >>>    PRs [1].
> >>>>>>>>> Specifically, I
> >>>>>>>>>>>>>> noticed
> >>>>>>>>>>>>>>>>> often
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>> build in the queue is making any
> >>>>>>>     progress for
> >>>>>>>>> hours,
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> suddenly
> >>>>>>>>>>>>>>>>> 5
> >>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>> 6
> >>>>>>>>>>>>>>>>>>>>> builds kick off all together
> >>>    after the
> >>>>>>>     long pause.
> >>>>>>>>>>>>>>      I'm at PST
> >>>>>>>>>>>>>>>>>> (UTC-08)
> >>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can
> >>>    be as
> >>>>>>>     long as 6 hours
> >>>>>>>>>>>>>>      from PST 9am
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> 3pm
> >>>>>>>>>>>>>>>>>>>>> (let alone the time needed to
> >>>    drain the
> >>>>>>>     queue
> >>>>>>>>>>>>>> afterwards).
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think this has greatly
> >>>    impacted our
> >>>>>>>     productivity.
> >>>>>>>>>>> I've
> >>>>>>>>>>>>>>>>> experienced
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> PRs submitted in the early
> >>>    morning of
> >>>>>>>     PST time zone
> >>>>>>>>>>>>>>      won't finish
> >>>>>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>> build until late night of the
> >>>    same day.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So my questions are:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced
> >>>    the same
> >>>>>>>     problem or
> >>>>>>>>>>>>>>      have similar
> >>>>>>>>>>>>>>>>>>>> observation
> >>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it
> >>>    has things
> >>>>>>>     to do with
> >>>>>>>>> time
> >>>>>>>>>>>>>>      zone)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - What pricing plan of
> >>>    TravisCI is
> >>>>>>>     Flink currently
> >>>>>>>>>>>>>> using? Is it
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> free
> >>>>>>>>>>>>>>>>>>>>> plan for open source
> >>>    projects? What
> >>>>>>> are the
> >>>>>>>>>>>>>> guaranteed build
> >>>>>>>>>>>>>>>>> capacity
> >>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> the current plan?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - If the current pricing plan
> >>>    (either
> >>>>>>>     free or paid)
> >>>>>>>>>>>>> can't
> >>>>>>>>>>>>>>> provide
> >>>>>>>>>>>>>>>>>>> stable
> >>>>>>>>>>>>>>>>>>>>> build capacity, can we
> >>>    upgrade to a
> >>>>>>>     higher priced
> >>>>>>>>>>>>>>      plan with
> >>>>>>>>>>>>>>> larger
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>> stable build capacity?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> BTW, another factor that
> >>>    contributes to
> >>>>>>> the
> >>>>>>>>>>>>>> productivity problem
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> our build is slow - we run
> >>>    full build
> >>>>>>>     for every PR
> >>>>>>>>>>> and a
> >>>>>>>>>>>>>>>>> successful
> >>>>>>>>>>>>>>>>>>> full
> >>>>>>>>>>>>>>>>>>>>> build takes ~5h. We
> >>>    definitely have
> >>>>>>>     more options to
> >>>>>>>>>>>>>>      solve it,
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> instance,
> >>>>>>>>>>>>>>>>>>>>> modularize the build graphs
> >>>    and reuse
> >>>>>>>     artifacts
> >>>>>>>>> from
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> previous
> >>>>>>>>>>>>>>>>>>> build.
> >>>>>>>>>>>>>>>>>>>>> But I think that can be a big
> >>>    effort
> >>>>>>>     which is much
> >>>>>>>>>>>>>> harder to
> >>>>>>>>>>>>>>>>>> accomplish
> >>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>> a short period of time and
> >>>    may deserve
> >>>>>>>     its own
> >>>>>>>>>>> separate
> >>>>>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>> https://travis-ci.org/apache/flink/pull_requests
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      --
> >>>>>>>>>>>>>>      Best Regards
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Jeff Zhang
> >>>>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >>
>
>

Re: [VOTE] Migrate to sponsored Travis account

Posted by Aljoscha Krettek <al...@apache.org>.
+1

Aljoscha

> On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
> 
> +1 to move to a private Travis account.
> 
> I can confirm that Ververica will sponsor a Travis CI plan that is
> equivalent or a bit higher than the previous ASF quota (10 concurrent build
> queues)
> 
> Best,
> Stephan
> 
> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org> wrote:
> 
>> I've raised a JIRA
>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire
>> whether it would be possible to switch to a different Travis account,
>> and if so what steps would need to be taken.
>> We need a proper confirmation from INFRA since we are not in full
>> control of the flink repository (for example, we cannot access the
>> settings page).
>> 
>> If this is indeed possible, Ververica is willing to sponsor a Travis
>> account for the Flink project.
>> This would provide us with more resources than we need.
>> 
>> Since this makes the project more reliant on resources provided by
>> external companies I would like to vote on this.
>> 
>> Please vote on this proposal, as follows:
>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>> provided that INFRA approves
>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> account
>> 
>> The vote will be open for at least 24h, and until we have confirmation
>> from INFRA. The voting period may be shorter than the usual 3 days since
>> our current setup is effectively not working.
>> 
>> On 04/07/2019 06:51, Bowen Li wrote:
>>> Re: > Are they using their own Travis CI pool, or did they switch to an
>>> entirely different CI service?
>>> 
>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>>> currently moving away from ASF's Travis to their own in-house metal
>>> machines at [1] with custom CI application at [2]. They've seen
>>> significant improvement w.r.t both much higher performance and
>>> basically no resource waiting time, "night-and-day" difference quoting
>>> Wes.
>>> 
>>> Re: > If we can just switch to our own Travis pool, just for our
>>> project, then this might be something we can do fairly quickly?
>>> 
>>> I believe so, according to [3] and [4]
>>> 
>>> 
>>> [1] https://ci.ursalabs.org/
>>> [2] https://github.com/ursa-labs/ursabot
>>> [3]
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>> 
>>> 
>>> 
>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org
>>> <ma...@apache.org>> wrote:
>>> 
>>>    Are they using their own Travis CI pool, or did they switch to an
>>>    entirely different CI service?
>>> 
>>>    If we can just switch to our own Travis pool, just for our
>>>    project, then
>>>    this might be something we can do fairly quickly?
>>> 
>>>    On 03/07/2019 05:55, Bowen Li wrote:
>>>> I responded in the INFRA ticket [1] that I believe they are using the
>>>> wrong metric against Flink, and that total build time is a completely
>>>> different thing from guaranteed build capacity.
>>>> 
>>>> My response:
>>>> 
>>>> "As mentioned above, I'm in Seattle, and since I started paying attention
>>>> to Flink's build queue a few tens of days ago, I have seen no build kick
>>>> off during PST daytime on weekdays for Flink. Our teammates in China and
>>>> Europe have also reported similar observations. So we need to evaluate
>>>> where the large total build time came from. If 1) your number and 2) our
>>>> observations from three locations that cover pretty much a full day are
>>>> all true, I **guess** one reason can be that the extra build time most
>>>> likely came from weekends, when other Apache projects may be idle and
>>>> Flink just drains its congested queue hard.
>>>> 
>>>> Please be aware that we're not complaining about the lack of resources
>>>> in general; I'm complaining about the lack of **stable, dedicated**
>>>> resources. An example of the latter: currently, even if no build is in
>>>> Flink's queue and I submit a request that becomes the queue head in the
>>>> PST morning, my build won't even start for 6-8+ hours. That is an absurd
>>>> amount of waiting time.
>>>> 
>>>> That is to say, if ASF INFRA decides to adopt a quota system and grants
>>>> Flink five DEDICATED servers that run all the time only for Flink, that'll
>>>> be PERFECT and can totally solve our problem now.
>>>> 
>>>> I feel what's missing in the ASF INFRA's Travis resource pool is
>>>    some level
>>>> of build capacity SLAs and certainty"
>>>> 
>>>> 
>>>> Again, I believe these two problems differ in nature: long build time
>>>> vs. lack of dedicated build resources. That is to say, shortening the
>>>> build time may or may not relieve the situation. I'm slightly negative
>>>> on disabling IT cases for PRs; the downside is that we risk bugs in PRs
>>>> that UTs don't catch, which may cost a lot more to fix if they slow down
>>>> or even block others. I am open to others' opinions on it, though.
>>>> 
>>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible
>>>> way to solve our problem, since INFRA's pool is fully shared and they have
>>>> no control over, or finer insight into, resource allocation to a specific
>>>> Apache project. As mentioned in [1], Apache Arrow is moving away from the
>>>> ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to
>>>> do so). I know that Spark is on its own build infra. If we all agree on
>>>> funding our own build infra, I'd be glad to help investigate potential
>>>> options after releasing 1.9, since I'm super busy with 1.9 now.
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>>>    <chesnay@apache.org <ma...@apache.org>> wrote:
>>>> 
>>>>> As a short-term stopgap, since we can assume this issue to
>>>    become much
>>>>> worse in the following days/weeks, we could disable IT cases in
>>>    PRs and
>>>>> only run them on master.
>>>>> 
>>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>>> People really have to stop thinking that just because
>>>    something works
>>>>>> for us it is also a good solution.
>>>>>> Also, please remember that our builds run for 2h from start to
>>>    finish,
>>>>>> and not the 14 _minutes_ it takes for zeppelin.
>>>>>> We are dealing with an entirely different scale here, both in
>>>    terms of
>>>>>> build times and number of builds.
>>>>>> 
>>>>>> In this very thread people have been complaining about long queue
>>>>>> times for their builds. Surprise, other Apache projects have been
>>>>>> suffering the very same thing due to us not controlling our build
>>>>>> times. While switching services (be it Jenkins, CircleCI or
>>>    whatever)
>>>>>> will possibly work for us (and these options are actually
>>>    attractive,
>>>>>> like CircleCI's proper support for build artifacts), it will also
>>>>>> result in us likely negatively affecting other projects in
>>>    significant
>>>>>> ways.
>>>>>> 
>>>>>> Sure, the Jenkins setup has a good user experience for us, at
>>>    the cost
>>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we
>>>    have 25
>>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>>>>> resources, and the European contributors haven't even really
>>>    started yet.
>>>>>> 
>>>>>> FYI, the latest INFRA response from INFRA-18533:
>>>>>> 
>>>>>> "Our rough metrics shows that Flink used over 5800 hours of
>>>    build time
>>>>>> last month. That is equal to EIGHT servers running 24/7 for
>>>    the ENTIRE
>>>>>> MONTH. EIGHT. nonstop.
>>>>>> When we discovered this last night, we discussed it some and
>>>    are going
>>>>>> to tune down Flink to allow only five executors maximum. We
>> cannot
>>>>>> allow Flink to consume so much of a Foundation shared resource."
>>>>>> 
>>>>>> So yes, we either
>>>>>> a) have to heavily reduce our CI usage or
>>>>>> b) fund our own, either maintaining it ourselves or donating
>>>    to Apache.
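
The 5800-hour figure from INFRA can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes a 30-day month and treats one "server" as a single executor busy around the clock, and the function names are made up for this example. It also reproduces the 25 PRs x 2h = 50h backlog estimate mentioned earlier in the same message.

```python
HOURS_PER_DAY = 24


def equivalent_servers(build_hours, days_in_month=30):
    """Executors that would have to run 24/7 to consume build_hours in one month."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def backlog_hours(queued_builds, hours_per_build):
    """Total executor time needed to drain a queue of equally long builds."""
    return queued_builds * hours_per_build


if __name__ == "__main__":
    # 5800 build hours in a month (assumed 30 days) is roughly eight busy executors.
    print(f"{equivalent_servers(5800):.2f} executors running non-stop")
    # 25 queued PRs at about 2 hours each.
    print(f"{backlog_hours(25, 2)} hours of queued work")
```

With five executors, as INFRA proposed, the same monthly load would not fit, which is consistent with the two options listed above.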
>>>>>> 
>>>>>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>>>> Looking at the git history of the Jenkins script, its core part
>>>>>>> was finished in March 2017 (with only two minor updates in 2017/2018),
>>>>>>> so it's been running for over two years now, and it feels like the
>>>>>>> Zeppelin community has been quite happy with it. @Jeff Zhang
>>>>>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you share your
>>>>>>> insights and user experience with the Jenkins+Travis approach?
>>>>>>> 
>>>>>>> Things like:
>>>>>>> 
>>>>>>> - has the approach completely solved the resource capacity problem
>>>>>>> for the Zeppelin community? Is the Zeppelin community happy with the
>>>>>>> result?
>>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>>>>>> - how often do you need to maintain the Jenkins infra? how many
>>>>>>> people are usually involved in maintenance and bug-fixes?
>>>>>>> 
>>>>>>> The downside of this approach seems mostly to be on the
>>>    maintenance
>>>>>>> to me - maintain the script and Jenkins infra.
>>>>>>> 
>>>>>>> ** Having Our Own Travis-CI.com Account **
>>>>>>> 
>>>>>>> Another alternative I've been thinking of is to have our own
>>>>>>> travis-ci.com account with paid dedicated resources. Note that
>>>>>>> travis-ci.org is the free version and travis-ci.com is the commercial
>>>>>>> version. We currently use a shared resource pool managed by the ASF INFRA
>>>>>>> team on travis-ci.org, but we have no control over it - we can't see
>>>>>>> how it's configured, how many resources are available, how resources
>>>>>>> are allocated among Apache projects, etc. The nice things about having
>>>>>>> an account on travis-ci.com are:
>>>>>>> 
>>>>>>> - relatively low cost with a much better resource guarantee than what
>>>>>>> we currently have [1]: $249/month for 5 dedicated concurrent jobs,
>>>>>>> $489/month for 10 concurrent jobs
>>>>>>> - low maintenance work compared to using Jenkins
>>>>>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>>>> (pending verification)
>>>>>>> - full control over the build capacity/configuration compared to
>>>>>>> using ASF INFRA's pool
>>>>>>> 
>>>>>>> I'd be surprised if we, as such a vibrant community, cannot find and
>>>>>>> fund $249*12=$2988 a year in exchange for a much better developer
>>>>>>> experience and much higher productivity.
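
As a quick check of the yearly figure, here is a small sketch that assumes the two monthly prices quoted in this message stay flat for twelve months. The plan labels and function names are just names chosen for this example, not Travis terminology.

```python
# Monthly prices in USD as quoted in the message; the keys are illustrative labels.
MONTHLY_PRICE_USD = {5: 249, 10: 489}  # concurrent jobs -> price per month


def yearly_cost(monthly_usd, months=12):
    """Cost of a plan over a number of months at a flat monthly price."""
    return monthly_usd * months


def yearly_cost_per_job(concurrent_jobs):
    """Yearly cost divided by the number of concurrent jobs the plan provides."""
    return yearly_cost(MONTHLY_PRICE_USD[concurrent_jobs]) / concurrent_jobs


if __name__ == "__main__":
    for jobs, price in MONTHLY_PRICE_USD.items():
        print(f"{jobs} concurrent jobs: ${yearly_cost(price)} per year")
```

Under these assumptions the larger plan costs slightly less per concurrent job over a year.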
>>>>>>> 
>>>>>>> [1] https://travis-ci.com/plans
>>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>>
>>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>>>    <chesnay@apache.org <ma...@apache.org>
>>>>>>> <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>>>>>>> 
>>>>>>>     So yes, the Jenkins job keeps polling the state from Travis
>>>>>>>     until it finishes.
>>>>>>> 
>>>>>>>     Not sure I'm comfortable with the idea of using Jenkins workers
>>>>>>>     just to idle for several hours.
>>>>>>> 
>>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>>>>> Here's what the Zeppelin community did: we wrote a Python script to
>>>>>>>> check the build status of a pull request.
>>>>>>>> Here's the script:
>>>>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>>>>> 
>>>>>>>> And this is the script we use in the Jenkins build job.
>>>>>>>> 
>>>>>>>> if [ -f "travis_check.py" ]; then
>>>>>>>>   git log -n 1
>>>>>>>>   STATUS=$(curl -s $BUILD_URL |
>>>>>>>>     grep -e "GitHub pull request.*from.*" |
>>>>>>>>     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>>>>>   #if [ -z $COMMIT ]; then
>>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>   #fi
>>>>>>>> 
>>>>>>>>   # get commit hash from PR
>>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>   sleep 30 # sleep few moment to wait travis starts the build
>>>>>>>>   RET_CODE=0
>>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>   # try with repository name when travis-ci is not available in the account
>>>>>>>>   if [ $RET_CODE -eq 2 ]; then
>>>>>>>>     RET_CODE=0
>>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>       grep '"full_name":' | grep -v "apache/zeppelin" |
>>>>>>>>       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>   fi
>>>>>>>> 
>>>>>>>>   # fail with can't find build information in the travis
>>>>>>>>   if [ $RET_CODE -eq 2 ]; then
>>>>>>>>     set +x
>>>>>>>>     echo "-----------------------------------------------------"
>>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
>>>>>>>>     echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>>>>>     echo ""
>>>>>>>>     echo "To trigger CI after setup, you will need ammend your last commit with"
>>>>>>>>     echo "git commit --amend"
>>>>>>>>     echo "git push your-remote HEAD --force"
>>>>>>>>     echo ""
>>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>>>>>>>   fi
>>>>>>>> 
>>>>>>>>   exit $RET_CODE
>>>>>>>> else
>>>>>>>>   set +x
>>>>>>>>   echo "travis_check.py does not exists"
>>>>>>>>   exit 1
>>>>>>>> fi
>>>>>>>> 
>>>>>>>> Chesnay Schepler <chesnay@apache.org> wrote on Sat, Jun 29, 2019 at
>>>>>>>> 3:17 PM:
>>>>>>>> 
>>>>>>>>> Does this imply that a Jenkins job is active as long as the Travis
>>>>>>>>> build runs?
>>>>>>>>> 
>>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> @Dawid, I think the "long test running" as I mentioned in the first
>>>>>>>>>> email, also as you guys said, belongs to "a big effort which is much
>>>>>>>>>> harder to accomplish in a short period of time and may deserve its
>>>>>>>>>> own separate discussion". Thus I didn't include it in what we can do
>>>>>>>>>> in the foreseeable short term.
>>>>>>>>>> 
>>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack of
>>>>>>>>>> build resources. Even if the build is shortened to something like
>>>>>>>>>> 2h, the problem I described, where no build machine works for 6 or
>>>>>>>>>> more hours in PST daytime, will still happen, because no machine
>>>>>>>>>> from ASF INFRA's pool is allocated to Flink. As I have paid close
>>>>>>>>>> attention to the build queue in the past few weekdays, it's a
>>>>>>>>>> pretty clear pattern now.
>>>>>>>>>> 
>>>>>>>>>> **The ultimate root cause** for that is - we don't have any
>>>>>>>>>> **dedicated** build resources that we can stably rely on. I'm
>>>>>>>>>> actually ok to wait for a long time if there are build requests
>>>>>>>>>> running, it means at least we are making progress. But I'm not ok
>>>>>>>>>> with no build resource. A better place I think we should aim at in
>>>>>>>>>> the short term is to always have at least a central pool (can be 3
>>>>>>>>>> or 5) of machines dedicated to building Flink at any time, or maybe
>>>>>>>>>> use users' resources.
>>>>>>>>>> 
>>>>>>>>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community
>>>>>>>>>> is using a Jenkins job to automatically build on users' Travis
>>>>>>>>>> accounts and link the result back to the GitHub PR. I guess the
>>>>>>>>>> Jenkins job would fetch the latest upstream master and build the PR
>>>>>>>>>> against it. Jeff has filed tickets to learn about and get access to
>>>>>>>>>> the Jenkins infra. It'd be better to fully understand it first
>>>>>>>>>> before judging this approach.
>>>>>>>>>> 
>>>>>>>>>> I also heard good things about CircleCI, and ASF INFRA seems to
>>>>>>>>>> have a pool of build capacity there too. It can be an alternative
>>>>>>>>>> to consider.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
>>>>>>>>>> <dwysakowicz@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the most important
>>>>>>>>>>> point from Chesnay's previous message in the summary. The ultimate
>>>>>>>>>>> reason for all the problems is that the tests take close to 2 hours
>>>>>>>>>>> to run already. I fully support this claim: "Unless people start
>>>>>>>>>>> caring about test times before adding them, this issue cannot be
>>>>>>>>>>> solved"
>>>>>>>>>>> 
>>>>>>>>>>> This is also another reason why using users' Travis accounts won't
>>>>>>>>>>> help. Every few weeks we reach the user's time limit for a single
>>>>>>>>>>> profile. This makes the user's builds simply fail, until we either
>>>>>>>>>>> properly decrease the time the tests take (which I am not sure we
>>>>>>>>>>> ever did) or postpone the problem by splitting into more profiles.
>>>>>>>>>>> (Note that the ASF Travis account has higher time limits)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Dawid
>>>>>>>>>>> 
>>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>>>>>>>>> Do we know if using "the best" available hardware would improve the
>>>>>>>>>>>> build times?
>>>>>>>>>>>> Imagine we would run the build on machines with plenty of main
>>>>>>>>>>>> memory to mount everything to ramdisk + the latest CPU architecture?
>>>>>>>>>>>> 
>>>>>>>>>>>> Throwing hardware at the problem could help reduce the time of an
>>>>>>>>>>>> individual build, and using our own infrastructure would remove our
>>>>>>>>>>>> dependency on Apache's Travis account (with the obvious downside of
>>>>>>>>>>>> having to maintain the infrastructure).
>>>>>>>>>>>> We could use an open source Travis alternative, to have a similar
>>>>>>>>>>>> experience and make the migration easy.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> From what I gathered, there's no special sauce that the Zeppelin
>>>>>>>>>>>>> project uses which actually integrates a user's Travis account
>>>>>>>>>>>>> into the PR. They just disabled Travis for PRs. And that's kind of
>>>>>>>>>>>>> it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>>>>>>>>>>>>> resources, but there are downsides:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The discoverability of the Travis check takes a nose-dive. Either
>>>>>>>>>>>>> we require every contributor to always, on every commit, also post
>>>>>>>>>>>>> a Travis build, or we have the reviewer sift through the
>>>>>>>>>>>>> contributor's account to find it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This is rather cumbersome. Additionally, it's also not equivalent
>>>>>>>>>>>>> to having a PR build.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> A normal branch build takes a branch as is and tests it. A PR
>>>>>>>>>>>>> build merges the branch into master, and then runs it. (Fun fact:
>>>>>>>>>>>>> This is why a PR with merge conflicts is not being run on Travis.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And ultimately, everyone can already make use of this approach
>>>>>>>>>>>>> anyway.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea
>>>>>>>>>>>>>> to leverage users' Travis accounts.
>>>>>>>>>>>>>> In this way, we can have almost unlimited concurrent build jobs,
>>>>>>>>>>>>>> and developers can restart builds by themselves (currently only
>>>>>>>>>>>>>> committers can restart a PR's build).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> But I'm still not very clear on how to integrate a user's Travis
>>>>>>>>>>>>>> build into the Flink pull request's build automatically. Can you
>>>>>>>>>>>>>> explain in more detail?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another question: does Travis only build branches for user
>>>>>>>>>>>>>> accounts?
>>>>>>>>>>>>>> My concern is that builds for PRs rebase the user's commits
>>>>>>>>>>>>>> against the current master branch, which helps us find problems
>>>>>>>>>>>>>> before merging. Builds for branches will lose the impact of new
>>>>>>>>>>>>>> commits in master.
>>>>>>>>>>>>>> How does Zeppelin solve this problem?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks again for sharing the idea.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Hi Folks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Zeppelin met this kind of issue before; we solved it by
>>>>>>>>>>>>>>      delegating each contributor's PR build to their own Travis
>>>>>>>>>>>>>>      account (everyone can have 5 free slots for Travis builds).
>>>>>>>>>>>>>>      The Apache account's Travis build is only triggered when a
>>>>>>>>>>>>>>      PR is merged.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Kurt Young <ykt836@gmail.com> wrote on Tue, Jun 25, 2019 at
>>>>>>>>>>>>>>      10:16 AM:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> (Forgot to cc George)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed this, and
>>>>>>>>>>>>>>>> I think Till and George have already spent some time
>>>>>>>>>>>>>>>> investigating it. I have cced both of them, and maybe they can
>>>>>>>>>>>>>>>> share their findings.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for bringing this up. We also suffered from the long build
>>>>>>>>>>>>>>>>> time. I agree that we should focus on solving the build capacity
>>>>>>>>>>>>>>>>> problem in this thread.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> My observation is that only one build is running, and all the
>>>>>>>>>>>>>>>>> others (other PRs, master) are pending.
>>>>>>>>>>>>>>>>> The pricing plan [1] of Travis shows it can support concurrent
>>>>>>>>>>>>>>>>> build jobs. But I don't know which plan we are using; it might be
>>>>>>>>>>>>>>>>> the free plan for open source.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
>>>    experience on
>>>>>>>     Travis.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Steven,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion is
>>>>>>>>>>>>>>>>>> about "unstable build **capacity**", in other words "unstable /
>>>>>>>>>>>>>>>>>> lack of build resources", not "unstable build".
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Long and sometimes unstable builds are definitely a pain point.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka is not
>>>>>>>>>>>>>>>>>>> related to my change, but there is no easy way to re-run the
>>>>>>>>>>>>>>>>>>> build in the Travis UI. A Google search showed that closing and
>>>>>>>>>>>>>>>>>>> reopening the PR will trigger a rebuild, but that could add noise
>>>>>>>>>>>>>>>>>>> to the PR activity.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed with exceeding the
>>>>>>>>>>>>>>>>>>> time limit after 4+ hours:
>>>>>>>>>>>>>>>>>>> The job exceeded the maximum time limit for jobs, and has been
>>>>>>>>>>>>>>>>>>> terminated.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530 This build
>>>>>>>>>>>>>>>>>>>> request has been sitting at **HEAD of the queue** since I first
>>>>>>>>>>>>>>>>>>>> saw it at PST 10:30am (not sure how long it's been there before
>>>>>>>>>>>>>>>>>>>> 10:30am). It's PST 4:12pm now and it hasn't started yet.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain resulting from lack of stable build
>>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink PRs [1]. Specifically, I noticed often that no
>>>>>>>>>>>>>>>>>>>>> build in the queue is making any progress for hours, and suddenly 5 or 6
>>>>>>>>>>>>>>>>>>>>> builds kick off all together after the long pause. I'm at PST (UTC-08) time
>>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can be as long as 6 hours from PST 9am to 3pm
>>>>>>>>>>>>>>>>>>>>> (let alone the time needed to drain the queue afterwards).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've experienced that
>>>>>>>>>>>>>>>>>>>>> PRs submitted in the early morning of PST time zone won't finish their
>>>>>>>>>>>>>>>>>>>>> build until late night of the same day.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or have similar observation
>>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it has things to do with time zone)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently using? Is it the free
>>>>>>>>>>>>>>>>>>>>> plan for open source projects? What are the guaranteed build capacity of
>>>>>>>>>>>>>>>>>>>>> the current plan?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide stable
>>>>>>>>>>>>>>>>>>>>> build capacity, can we upgrade to a higher priced plan with larger and more
>>>>>>>>>>>>>>>>>>>>> stable build capacity?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> BTW, another factor that contribute to the productivity problem is that our
>>>>>>>>>>>>>>>>>>>>> build is slow - we run full build for every PR and a successful full build
>>>>>>>>>>>>>>>>>>>>> takes ~5h. We definitely have more options to solve it, for instance,
>>>>>>>>>>>>>>>>>>>>> modularize the build graphs and reuse artifacts from the previous build.
>>>>>>>>>>>>>>>>>>>>> But I think that can be a big effort which is much harder to accomplish in
>>>>>>>>>>>>>>>>>>>>> a short period of time and may deserve its own separate discussion.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      --
>>>>>>>>>>>>>>      Best Regards
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Jeff Zhang
>>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 
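
Editorial aside on the quoted thread above: the Jenkins-side check that Jeff describes, and that Chesnay summarizes as the job querying Travis until the build finishes, amounts to a polling loop with a timeout. The sketch below is only an illustration of that idea, written in Python because travis_check.py is a Python script; it is not the Zeppelin script. The names `wait_for_build` and `fetch_state` and the state strings are hypothetical placeholders, not Travis API names.

```python
import time
from typing import Callable

# Hypothetical terminal state names; a real CI service defines its own.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], str],
                   poll_interval_s: float = 60.0,
                   timeout_s: float = 3 * 60 * 60,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_state() until it reports a terminal state or the timeout expires.

    fetch_state is a caller-supplied function (an assumption of this sketch)
    that returns the current build state as a string.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        # The calling worker does nothing useful while it sleeps here.
        sleep(poll_interval_s)


# Demo with canned states instead of a real CI service.
_fake_states = iter(["created", "started", "started", "passed"])
print(wait_for_build(lambda: next(_fake_states),
                     poll_interval_s=0, sleep=lambda s: None))
```

The `sleep` call is where the worker sits idle, which is the cost Chesnay raises: with builds of roughly 2h, each waiting job would hold a Jenkins worker for about that long.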


Re: [VOTE] Migrate to sponsored Travis account

Posted by Stephan Ewen <se...@apache.org>.
+1 to move to a private Travis account.

I can confirm that Ververica will sponsor a Travis CI plan that is
equivalent to or a bit higher than the previous ASF quota (10 concurrent
build queues).

Best,
Stephan
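
Editorial aside: the capacity and cost figures mentioned in this thread can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming a 30-day month and that one executor (or one unit of concurrency) can run builds 24/7; the function names are made up for illustration. The inputs are the numbers quoted in the thread: 5800 build hours in a month (INFRA), a five-executor cap (INFRA), 10 concurrent builds (this message), and the $249/month and $489/month plan prices (Bowen).

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def servers_needed(build_hours_per_month: float) -> float:
    """Number of machines running 24/7 needed to supply the given build hours."""
    return build_hours_per_month / (HOURS_PER_DAY * DAYS_PER_MONTH)


def monthly_capacity_hours(concurrent_jobs: int) -> int:
    """Upper bound on build hours per month for a given concurrency."""
    return concurrent_jobs * HOURS_PER_DAY * DAYS_PER_MONTH


def annual_cost(monthly_price_usd: int) -> int:
    """Yearly cost of a plan billed monthly."""
    return monthly_price_usd * 12


print(round(servers_needed(5800), 2))  # 8.06, matching INFRA's "EIGHT servers"
print(monthly_capacity_hours(5))       # 3600 hours with five executors
print(monthly_capacity_hours(10))      # 7200 hours with ten concurrent builds
print(annual_cost(249))                # 2988, matching $249*12=$2988
print(annual_cost(489))                # 5868
```

Under these assumptions, a five-executor cap cannot absorb a month like the 5800-hour one INFRA measured, while ten concurrent builds can, with some headroom.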

On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ch...@apache.org> wrote:

> I've raised a JIRA <https://issues.apache.org/jira/browse/INFRA-18703>
> with INFRA to inquire whether it would be possible to switch to a
> different Travis account, and if so what steps would need to be taken.
> We need a proper confirmation from INFRA since we are not in full
> control of the flink repository (for example, we cannot access the
> settings page).
>
> If this is indeed possible, Ververica is willing to sponsor a Travis
> account for the Flink project.
> This would provide us with more than enough resources.
>
> Since this makes the project more reliant on resources provided by
> external companies I would like to vote on this.
>
> Please vote on this proposal, as follows:
> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> provided that INFRA approves
> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> account
>
> The vote will be open for at least 24h, and until we have confirmation
> from INFRA. The voting period may be shorter than the usual 3 days since
> our current setup is effectively not working.
>
> On 04/07/2019 06:51, Bowen Li wrote:
> > Re: > Are they using their own Travis CI pool, or did they switch to an
> > entirely different CI service?
> >
> > I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
> > currently moving away from ASF's Travis to their own in-house metal
> > machines at [1] with a custom CI application at [2]. They've seen
> > significant improvement w.r.t. both much higher performance and
> > basically no resource waiting time, a "night-and-day" difference, quoting
> > Wes.
> >
> > Re: > If we can just switch to our own Travis pool, just for our
> > project, then this might be something we can do fairly quickly?
> >
> > I believe so, according to [3] and [4]
> >
> >
> > [1] https://ci.ursalabs.org/
> > [2] https://github.com/ursa-labs/ursabot
> > [3]
> > https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >
> >
> >
> > On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org>
> > wrote:
> >
> >     Are they using their own Travis CI pool, or did they switch to an
> >     entirely different CI service?
> >
> >     If we can just switch to our own Travis pool, just for our project,
> >     then this might be something we can do fairly quickly?
> >
> >     On 03/07/2019 05:55, Bowen Li wrote:
> >     > I responded in the INFRA ticket [1] that I believe they are using the
> >     > wrong metric against Flink, and that total build time is a completely
> >     > different thing from guaranteed build capacity.
> >     >
> >     > My response:
> >     >
> >     > "As mentioned above, since I started to pay attention to Flink's build
> >     > queue a few tens of days ago, I'm in Seattle and I saw no build was
> >     > kicking off in PST daytime on weekdays for Flink. Our teammates in
> >     > China and Europe have also reported similar observations. So we need
> >     > to evaluate where the large total build time came from - if 1) your
> >     > number and 2) our observations from three locations that cover pretty
> >     > much a full day, are all true, I **guess** one reason can be that -
> >     > highly likely the extra build time came from weekends when other
> >     > Apache projects may be idle and Flink just drains hard its congested
> >     > queue.
> >     >
> >     > Please be aware that we're not complaining about the lack of resources
> >     > in general, I'm complaining about the lack of **stable, dedicated**
> >     > resources. An example for the latter one is, currently even if no
> >     > build is in Flink's queue and I submit a request to be the queue head
> >     > in PST morning, my build won't even start in 6-8+h. That is an absurd
> >     > amount of waiting time.
> >     >
> >     > That is to say, if ASF INFRA decides to adopt a quota system and
> >     > grants Flink five DEDICATED servers that run all the time only for
> >     > Flink, that'll be PERFECT and can totally solve our problem now.
> >     >
> >     > I feel what's missing in ASF INFRA's Travis resource pool is some
> >     > level of build capacity SLAs and certainty"
> >     >
> >     >
> >     > Again, I believe these two problems differ in nature: long build time
> >     > vs. lack of dedicated build resources. That is to say, shortening the
> >     > build time may or may not relieve the situation. I'm slightly negative
> >     > on disabling IT cases for PRs, because the downside is that we are at
> >     > risk of potential bugs in PRs that UTs don't catch, which may cost a
> >     > lot more to fix if they slow down or even block others, but I am open
> >     > to others' opinions on it.
> >     >
> >     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a
> >     > feasible way to solve our problem, since INFRA's pool is fully shared
> >     > and they have no control over or finer insight into resource
> >     > allocation to a specific Apache project. As mentioned in [1], Apache
> >     > Arrow is moving away from the ASF INFRA Travis pool (they are actually
> >     > surprised Flink hasn't planned to do so). I know that Spark is on its
> >     > own build infra. If we all agree on funding our own build infra, I'd
> >     > be glad to help investigate potential options after releasing 1.9,
> >     > since I'm super busy with 1.9 now.
> >     >
> >     > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >     >
> >     >
> >     >
> >     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <chesnay@apache.org>
> >     > wrote:
> >     >
> >     >> As a short-term stopgap, since we can assume this issue will become
> >     >> much worse in the following days/weeks, we could disable IT cases in
> >     >> PRs and only run them on master.
> >     >>
> >     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >     >>> People really have to stop thinking that just because something
> >     >>> works for us it is also a good solution.
> >     >>> Also, please remember that our builds run for 2h from start to
> >     >>> finish, and not the 14 _minutes_ it takes for zeppelin.
> >     >>> We are dealing with an entirely different scale here, both in terms
> >     >>> of build times and number of builds.
> >     >>>
> >     >>> In this very thread people have been complaining about long queue
> >     >>> times for their builds. Surprise, other Apache projects have been
> >     >>> suffering the very same thing due to us not controlling our build
> >     >>> times. While switching services (be it Jenkins, CircleCI or whatever)
> >     >>> will possibly work for us (and these options are actually attractive,
> >     >>> like CircleCI's proper support for build artifacts), it will also
> >     >>> result in us likely negatively affecting other projects in
> >     >>> significant ways.
> >     >>>
> >     >>> Sure, the Jenkins setup has a good user experience for us, at the
> >     >>> cost of blocking Jenkins workers for a _lot_ of time. Right now we
> >     >>> have 25 PRs in our queue; that's possibly 50h we'd consume of Jenkins
> >     >>> resources, and the European contributors haven't even really started
> >     >>> yet.
> >     >>>
> >     >>> FYI, the latest INFRA response from INFRA-18533:
> >     >>>
> >     >>> "Our rough metrics show that Flink used over 5800 hours of build
> >     >>> time last month. That is equal to EIGHT servers running 24/7 for the
> >     >>> ENTIRE MONTH. EIGHT. nonstop.
> >     >>> When we discovered this last night, we discussed it some and are
> >     >>> going to tune down Flink to allow only five executors maximum. We
> >     >>> cannot allow Flink to consume so much of a Foundation shared
> >     >>> resource."
> >     >>>
> >     >>> So yes, we either
> >     >>> a) have to heavily reduce our CI usage or
> >     >>> b) fund our own, either maintaining it ourselves or donating to
> >     >>> Apache.
> >     >>>
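The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 30-day month and, for the drain-time estimate, that each build occupies exactly one executor for its full duration (a real Flink build may be split across several parallel jobs). The function names are made up for this example.

```python
import math

HOURS_PER_DAY = 24


def servers_equivalent(build_hours, days_in_month=30):
    """Machines that would need to run 24/7 to deliver build_hours in a month."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def backlog_hours(pending_builds, hours_per_build):
    """Total executor time needed to work off the pending builds."""
    return pending_builds * hours_per_build


def drain_hours(pending_builds, hours_per_build, executors):
    """Wall-clock time to drain the backlog, one build per executor at a time."""
    rounds = math.ceil(pending_builds / executors)
    return rounds * hours_per_build


if __name__ == "__main__":
    print(round(servers_equivalent(5800), 2))  # 8.06, i.e. about eight servers
    print(backlog_hours(25, 2))                # 50 hours for 25 PRs at 2h each
    print(drain_hours(25, 2, executors=5))     # 10 hours with five executors
```

Under these assumptions, a cap of five executors would need roughly ten hours of wall-clock time to clear a 25-PR backlog, before any new PRs arrive.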
> >     >>> On 02/07/2019 05:11, Bowen Li wrote:
> >     >>>> Judging by the git history of the Jenkins script, its core part
> >     >>>> was finished in March 2017 (with only two minor updates in
> >     >>>> 2017/2018), so it's been running for over two years now, and it
> >     >>>> seems the Zeppelin community has been quite happy with it.
> >     >>>> @Jeff Zhang <zjffdu@gmail.com>, can you share your insights and
> >     >>>> user experience with the Jenkins+Travis approach?
> >     >>>>
> >     >>>> Things like:
> >     >>>>
> >     >>>> - has the approach completely solved the resource capacity
> >     problem
> >     >>>> for the Zeppelin community? is the Zeppelin community happy with the
> >     result?
> >     >>>> - is the whole configuration chain stable (e.g. uptime) enough?
> >     >>>> - how often do you need to maintain the Jenkins infra? how many
> >     >>>> people are usually involved in maintenance and bug-fixes?
> >     >>>>
> >     >>>> To me, the downside of this approach is mostly maintenance:
> >     >>>> maintaining the script and the Jenkins infra.
> >     >>>>
> >     >>>> ** Having Our Own Travis-CI.com Account **
> >     >>>>
> >     >>>> Another alternative I've been thinking of is to have our own
> >     >>>> travis-ci.com account with paid dedicated resources. Note that
> >     >>>> travis-ci.org is the free version and travis-ci.com is the
> >     >>>> commercial version. We currently use a shared resource pool
> >     >>>> managed by the ASF INFRA team on travis-ci.org, but we have no
> >     >>>> control over it - we can't see how it's configured, how many
> >     >>>> resources are available, how resources are allocated among
> >     >>>> Apache projects, etc. The nice things about having an account
> >     >>>> on travis-ci.com are:
> >     >>>>
> >     >>>> - relatively low cost with much better resource guarantee
> >     than what
> >     >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> >     >>>> $489/month with 10 concurrency
> >     >>>> - low maintenance work compared to using Jenkins
> >     >>>> - (potentially) no migration cost according to Travis's doc [2]
> >     >>>> (pending verification)
> >     >>>> - full control over the build capacity/configuration compared to
> >     >>>> using ASF INFRA's pool
> >     >>>>
> >     >>>> I'd be surprised if we as such a vibrant community cannot
> >     find and
> >     >>>> fund $249*12=$2988 a year in exchange for a much better
> developer
> >     >>>> experience and much higher productivity.
> >     >>>>
> >     >>>> [1] https://travis-ci.com/plans
> >     >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >     >>>>
> >     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
> >     >>>> <chesnay@apache.org> wrote:
> >     >>>>
> >     >>>>      So yes, the Jenkins job keeps pulling the state from
> >     Travis until it
> >     >>>>      finishes.
> >     >>>>
> >     >>>>      Not sure I'm comfortable with the idea of using Jenkins
> >     >>>>      workers just to idle for several hours.
> >     >>>>
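The polling pattern described above, where a job keeps checking the Travis build state until it finishes, can be sketched roughly as follows. This is not the logic of travis_check.py; `wait_for_build`, `fetch_state`, and the set of terminal state names are assumptions made for illustration, and the status source is injected so the example runs without network access.

```python
import time

# Assumed names for terminal build states; check them against the CI
# service actually in use before relying on this set.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state, poll_seconds=30.0, timeout_seconds=6 * 3600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it reports a terminal state, then return it.

    Raises TimeoutError if no terminal state is seen within timeout_seconds.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError("build still in state %r at timeout" % state)
        sleep(poll_seconds)  # the calling worker stays occupied while waiting


if __name__ == "__main__":
    # Fake status source standing in for a real CI status lookup.
    fake_states = iter(["created", "started", "started", "passed"])
    result = wait_for_build(lambda: next(fake_states), sleep=lambda seconds: None)
    print(result)
```

The `sleep` call is where the concern raised in this thread shows up: whichever worker runs the loop is blocked for the whole duration of the remote build.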
> >     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >     >>>>      > Here's what the Zeppelin community did: we wrote a Python
> >     >>>>      > script to check the build status of a pull request.
> >     >>>>      > Here's the script:
> >     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >     >>>>      >
> >     >>>>      > And this is the script we use in the Jenkins build job:
> >     >>>>      >
> >     >>>>      > if [ -f "travis_check.py" ]; then
> >     >>>>      >    git log -n 1
> >     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
> >     >>>>      >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >     >>>>      >    #if [ -z $COMMIT ]; then
> >     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >     >>>>      >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >     >>>>      >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >     >>>>      >    #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >     >>>>      >    #fi
> >     >>>>      >
> >     >>>>      >    # get commit hash from PR
> >     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >     >>>>      >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >     >>>>      >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >     >>>>      >      grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >     >>>>      >    sleep 30 # wait a moment for Travis to start the build
> >     >>>>      >    RET_CODE=0
> >     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >     >>>>      >    # retry with the repository name when travis-ci is not available in the account
> >     >>>>      >    if [ $RET_CODE -eq 2 ]; then
> >     >>>>      >      RET_CODE=0
> >     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >     >>>>      >        grep '"full_name":' | grep -v "apache/zeppelin" |
> >     >>>>      >        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >     >>>>      >    fi
> >     >>>>      >
> >     >>>>      >    # fail when the build information can't be found in Travis
> >     >>>>      >    if [ $RET_CODE -eq 2 ]; then
> >     >>>>      >      set +x
> >     >>>>      >      echo "-----------------------------------------------------"
> >     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >     >>>>      >      echo "Please setup by switching on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >     >>>>      >      echo ""
> >     >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
> >     >>>>      >      echo "git commit --amend"
> >     >>>>      >      echo "git push your-remote HEAD --force"
> >     >>>>      >      echo ""
> >     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
> >     >>>>      >    fi
> >     >>>>      >
> >     >>>>      >    exit $RET_CODE
> >     >>>>      > else
> >     >>>>      >    set +x
> >     >>>>      >    echo "travis_check.py does not exist"
> >     >>>>      >    exit 1
> >     >>>>      > fi
> >     >>>>      >
> >     >>>>      > On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler
> >     >>>>      > <chesnay@apache.org> wrote:
> >     >>>>      >
> >     >>>>      >> Does this imply that a Jenkins job is active as long
> >     as the
> >     >>>>      Travis build
> >     >>>>      >> runs?
> >     >>>>      >>
> >     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >     >>>>      >>> Hi,
> >     >>>>      >>>
> >     >>>>      >>> @Dawid, I think the "long test running" as I
> >     mentioned in the
> >     >>>>      first
> >     >>>>      >> email,
> >     >>>>      >>> also as you guys said, belongs to "a big effort
> >     which is much
> >     >>>>      harder to
> >     >>>>      >>> accomplish in a short period of time and may deserve
> >     its own
> >     >>>>      separate
> >     >>>>      >>> discussion". Thus I didn't include it in what we can
> >     do in a
> >     >>>>      foreseeable
> >     >>>>      >>> short term.
> >     >>>>      >>>
> >     >>>>      >>> Besides, I don't think that's the ultimate reason for the
> >     >>>>      >>> lack of build resources. Even if the build is shortened to
> >     >>>>      >>> something like 2h, the problem I described, where no build
> >     >>>>      >>> machine does any work for 6 or more hours during PST
> >     >>>>      >>> daytime, will still happen, because no machine from ASF
> >     >>>>      >>> INFRA's pool is allocated to Flink. I have paid close
> >     >>>>      >>> attention to the build queue over the past few weekdays,
> >     >>>>      >>> and it's a pretty clear pattern now.
> >     >>>>      >>>
> >     >>>>      >>> **The ultimate root cause** is that we don't have any
> >     >>>>      >>> **dedicated** build resources that we can reliably count
> >     >>>>      >>> on. I'm actually OK with waiting a long time if build
> >     >>>>      >>> requests are running, since it means we are at least
> >     >>>>      >>> making progress. But I'm not OK with having no build
> >     >>>>      >>> resources at all. In the short term, I think we should aim
> >     >>>>      >>> to always have at least a central pool (say 3 or 5) of
> >     >>>>      >>> machines dedicated to building Flink at any time, or
> >     >>>>      >>> maybe use users' resources.
> >     >>>>      >>>
> >     >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
> >     >>>>      >>> community is using a Jenkins job to automatically build on
> >     >>>>      >>> users' Travis accounts and link the result back to the
> >     >>>>      >>> GitHub PR. I guess the Jenkins job would fetch the latest
> >     >>>>      >>> upstream master and build the PR against it. Jeff has
> >     >>>>      >>> filed tickets to learn about and get access to the Jenkins
> >     >>>>      >>> infra. It'd be better to fully understand it first before
> >     >>>>      >>> judging this approach.
> >     >>>>      >>>
> >     >>>>      >>> I also heard good things about CircleCI, and ASF
> >     INFRA seems
> >     >>>>      to have a
> >     >>>>      >> pool
> >     >>>>      >>> of build capacity there too. Can be an alternative
> >     to consider.
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>>
> >     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
> >     >>>>      >>> <dwysakowicz@apache.org> wrote:
> >     >>>>      >>>
> >     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
> >     most
> >     >>>>      important point
> >     >>>>      >>>> from Chesnay's previous message in the summary. The
> >     ultimate
> >     >>>>      reason for
> >     >>>>      >>>> all the problems is that the tests take close to 2
> >     hours to
> >     >>>>      run already.
> >     >>>>      >>>> I fully support this claim: "Unless people start
> >     caring about
> >     >>>>      test times
> >     >>>>      >>>> before adding them, this issue cannot be solved"
> >     >>>>      >>>>
> >     >>>>      >>>> This is also another reason why using user's Travis
> >     account
> >     >>>>      won't help.
> >     >>>>      >>>> Every few weeks we reach the user's time limit for
> >     a single
> >     >>>>      profile.
> >     >>>>      >>>> This makes the user's builds simply fail, until we
> >     either
> >     >>>>      properly
> >     >>>>      >>>> decrease the time the tests take (which I am not
> >     sure we ever
> >     >>>>      did) or
> >     >>>>      >>>> postpone the problem by splitting into more
> >     profiles. (Note
> >     >>>>      that the ASF
> >     >>>>      >>>> Travis account has higher time limits)
> >     >>>>      >>>>
> >     >>>>      >>>> Best,
> >     >>>>      >>>>
> >     >>>>      >>>> Dawid
> >     >>>>      >>>>
> >     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >     >>>>      >>>>> Do we know if using "the best" available hardware
> >     would
> >     >>>>      improve the
> >     >>>>      >> build
> >     >>>>      >>>>> times?
> >     >>>>      >>>>> Imagine we would run the build on machines with
> >     plenty of
> >     >>>>      main memory
> >     >>>>      >> to
> >     >>>>      >>>>> mount everything to ramdisk + the latest CPU
> >     architecture?
> >     >>>>      >>>>>
> >     >>>>      >>>>> Throwing hardware at the problem could help reduce
> >     the time
> >     >>>>      of an
> >     >>>>      >>>>> individual build, and using our own infrastructure
> >     would
> >     >>>>      remove our
> >     >>>>      >>>>> dependency on Apache's Travis account (with the
> >     obvious
> >     >>>>      downside of
> >     >>>>      >>>> having
> >     >>>>      >>>>> to maintain the infrastructure)
> >     >>>>      >>>>> We could use an open source travis alternative, to
> >     have a
> >     >>>>      similar
> >     >>>>      >>>>> experience and make the migration easy.
> >     >>>>      >>>>>
> >     >>>>      >>>>>
> >     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >     >>>>      >>>>> <chesnay@apache.org> wrote:
> >     >>>>      >>>>>> From what I gathered, there's no special sauce that the
> >     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
> >     >>>>      >>>>>> Travis account into the PR. They just disabled Travis
> >     >>>>      >>>>>> for PRs. And that's kind of it.
> >     >>>>      >>>>>>
> >     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
> >     >>>>      >>>>>> amount of resources, but there are downsides:
> >     >>>>      >>>>>>
> >     >>>>      >>>>>> The discoverability of the Travis check takes a
> >     >>>>      >>>>>> nose-dive. Either we require every contributor to
> >     >>>>      >>>>>> always, on every commit, also post a Travis build, or we
> >     >>>>      >>>>>> have the reviewer sift through the contributor's account
> >     >>>>      >>>>>> to find it.
> >     >>>>      >>>>>>
> >     >>>>      >>>>>> This is rather cumbersome. Additionally, it's also not
> >     >>>>      >>>>>> equivalent to having a PR build.
> >     >>>>      >>>>>>
> >     >>>>      >>>>>> A normal branch build takes a branch as is and tests it.
> >     >>>>      >>>>>> A PR build merges the branch into master, and then runs
> >     >>>>      >>>>>> it. (Fun fact: This is why a PR with merge conflicts is
> >     >>>>      >>>>>> not being run on Travis.)
> >     >>>>      >>>>>>
> >     >>>>      >>>>>> And ultimately, everyone can already make use of
> this
> >     >>>>      approach anyway.
> >     >>>>      >>>>>>
> >     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >     >>>>      >>>>>>> Hi Jeff,
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
> >     think it's a
> >     >>>>      good idea to
> >     >>>>      >>>>>>> leverage user's travis account.
> >     >>>>      >>>>>>> In this way, we can have almost unlimited
> >     concurrent build
> >     >>>>      jobs and
> >     >>>>      >>>>>>> developers can restart build by themselves
> >     (currently only
> >     >>>>      committers
> >     >>>>      >>>>>>> can restart PR's build).
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>> But I'm still not very clear how to integrate
> user's
> >     >>>>      travis build
> >     >>>>      >> into
> >     >>>>      >>>>>>> the Flink pull request's build automatically.
> >     Can you
> >     >>>>      explain more in
> >     >>>>      >>>>>>> detail?
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>> Another question: does travis only build
> >     branches for user
> >     >>>>      account?
> >     >>>>      >>>>>>> My concern is that builds for PRs will rebase
> user's
> >     >>>>      commits against
> >     >>>>      >>>>>>> current master branch.
> >     >>>>      >>>>>>> This will help us to find problems before
> >     merge.  Builds
> >     >>>>      for branches
> >     >>>>      >>>>>>> will lose the impact of new commits in master.
> >     >>>>      >>>>>>> How does Zeppelin solve this problem?
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>> Thanks again for sharing the idea.
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>> Regards,
> >     >>>>      >>>>>>> Jark
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
> >     >>>>      >>>>>>> <zjffdu@gmail.com> wrote:
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>       Hi Folks,
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>       Zeppelin ran into this kind of issue before. We
> >     >>>>      >>>>>>>       solved it by delegating each contributor's PR build
> >     >>>>      >>>>>>>       to their own Travis account (everyone gets 5 free
> >     >>>>      >>>>>>>       slots for Travis builds). The Apache account's
> >     >>>>      >>>>>>>       Travis build is only triggered when a PR is merged.
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>       On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >     >>>>      >>>>>>>       <ykt836@gmail.com> wrote:
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>       > (Forgot to cc George)
> >     >>>>      >>>>>>>       >
> >     >>>>      >>>>>>>       > Best,
> >     >>>>      >>>>>>>       > Kurt
> >     >>>>      >>>>>>>       >
> >     >>>>      >>>>>>>       >
> >     >>>>      >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >     >>>>      >>>>>>>       > <ykt836@gmail.com> wrote:
> >     >>>>      >>>>>>>       >
> >     >>>>      >>>>>>>       > > Hi Bowen,
> >     >>>>      >>>>>>>       > >
> >     >>>>      >>>>>>>       > > Thanks for bringing this up. We have actually
> >     >>>>      >>>>>>>       > > discussed this, and I think Till and George
> >     >>>>      >>>>>>>       > > have already spent some time investigating it.
> >     >>>>      >>>>>>>       > > I have cc'ed both of them, and maybe they can
> >     >>>>      >>>>>>>       > > share their findings.
> >     >>>>      >>>>>>>       > >
> >     >>>>      >>>>>>>       > > Best,
> >     >>>>      >>>>>>>       > > Kurt
> >     >>>>      >>>>>>>       > >
> >     >>>>      >>>>>>>       > >
> >     >>>>      >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >     >>>>      >>>>>>>       > > <imjark@gmail.com> wrote:
> >     >>>>      >>>>>>>       > >
> >     >>>>      >>>>>>>       > >> Hi Bowen,
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> Thanks for bringing this up. We have also
> >     >>>>      >>>>>>>       > >> suffered from the long build time. I agree
> >     >>>>      >>>>>>>       > >> that we should focus on solving the build
> >     >>>>      >>>>>>>       > >> capacity problem in this thread.
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> My observation is that only one build is
> >     >>>>      >>>>>>>       > >> running, while all the others (other PRs,
> >     >>>>      >>>>>>>       > >> master) are pending.
> >     >>>>      >>>>>>>       > >> The pricing plans [1] of Travis show that it
> >     >>>>      >>>>>>>       > >> can support concurrent build jobs.
> >     >>>>      >>>>>>>       > >> But I don't know which plan we are using; it
> >     >>>>      >>>>>>>       > >> might be the free plan for open source.
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> I cc-ed Chesnay who may have some
> >     experience on
> >     >>>>      Travis.
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> Regards,
> >     >>>>      >>>>>>>       > >> Jark
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> [1]: https://travis-ci.com/plans
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li
> >     >>>>      >>>>>>>       > >> <bowenli86@gmail.com> wrote:
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >> > Hi Steven,
> >     >>>>      >>>>>>>       > >> >
> >     >>>>      >>>>>>>       > >> > I think you may not have read what I wrote.
> >     >>>>      >>>>>>>       > >> > The discussion is about "unstable build
> >     >>>>      >>>>>>>       > >> > **capacity**", in other words "unstable /
> >     >>>>      >>>>>>>       > >> > lack of build resources", not "unstable
> >     >>>>      >>>>>>>       > >> > build".
> >     >>>>      >>>>>>>       > >> >
> >     >>>>      >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >     >>>>      >>>>>>>       > >> > <stevenz3wu@gmail.com> wrote:
> >     >>>>      >>>>>>>       > >> >
> >     >>>>      >>>>>>>       > >> > > A long and sometimes unstable build is
> >     >>>>      >>>>>>>       > >> > > definitely a pain point.
> >     >>>>      >>>>>>>       > >> > >
> >     >>>>      >>>>>>>       > >> > > I suspect the build failure here in
> >     >>>>      >>>>>>>       > >> > > flink-connector-kafka is not related to my
> >     >>>>      >>>>>>>       > >> > > change, but there is no easy way to re-run
> >     >>>>      >>>>>>>       > >> > > the build in the Travis UI. A Google search
> >     >>>>      >>>>>>>       > >> > > turned up a trick: closing and reopening
> >     >>>>      >>>>>>>       > >> > > the PR triggers a rebuild. But that could
> >     >>>>      >>>>>>>       > >> > > add noise to the PR activity.
> >     >>>>      >>>>>>>       > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >     >>>>      >>>>>>>       > >> > >
> >     >>>>      >>>>>>>       > >> > > travis-ci for my personal repo often failed
> >     >>>>      >>>>>>>       > >> > > by exceeding the time limit after 4+ hours:
> >     >>>>      >>>>>>>       > >> > > "The job exceeded the maximum time limit
> >     >>>>      >>>>>>>       > >> > > for jobs, and has been terminated."
> >     >>>>      >>>>>>>       > >> > >
> >     >>>>      >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >     >>>>      >>>>>>>       > >> > > <bowenli86@gmail.com> wrote:
> >     >>>>      >>>>>>>       > >> > >
> >     >>>>      >>>>>>>       > >> > > >
> >     >>>>      >>>>>>>       > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >     >>>>      >>>>>>>       > >> > > > This build request has been sitting at
> >     >>>>      >>>>>>>       > >> > > > **HEAD of the queue** since I first saw it
> >     >>>>      >>>>>>>       > >> > > > at PST 10:30am (not sure how long it's been
> >     >>>>      >>>>>>>       > >> > > > there before 10:30am). It's PST 4:12pm now
> >     >>>>      >>>>>>>       > >> > > > and it hasn't started yet.
> >     >>>>      >>>>>>>       > >> > > >
> >     >>>>      >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >     >>>>      >>>>>>>       > >> > > > <bowenli86@gmail.com> wrote:
> >     >>>>      >>>>>>>       > >> > > >
> >     >>>>      >>>>>>>       > >> > > > > Hi devs,
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > I've been experiencing the pain resulting
> >     >>>>      >>>>>>>       > >> > > > > from lack of stable build capacity on
> >     >>>>      >>>>>>>       > >> > > > > Travis for Flink PRs [1]. Specifically, I
> >     >>>>      >>>>>>>       > >> > > > > noticed often that no build in the queue
> >     >>>>      >>>>>>>       > >> > > > > is making any progress for hours, and
> >     >>>>      >>>>>>>       > >> > > > > suddenly 5 or 6 builds kick off all
> >     >>>>      >>>>>>>       > >> > > > > together after the long pause. I'm at PST
> >     >>>>      >>>>>>>       > >> > > > > (UTC-08) time zone, and I've seen pause
> >     >>>>      >>>>>>>       > >> > > > > can be as long as 6 hours from PST 9am to
> >     >>>>      >>>>>>>       > >> > > > > 3pm (let alone the time needed to drain
> >     >>>>      >>>>>>>       > >> > > > > the queue afterwards).
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > I think this has greatly impacted our
> >     >>>>      >>>>>>>       > >> > > > > productivity. I've experienced that PRs
> >     >>>>      >>>>>>>       > >> > > > > submitted in the early morning of PST
> >     >>>>      >>>>>>>       > >> > > > > time zone won't finish their build until
> >     >>>>      >>>>>>>       > >> > > > > late night of the same day.
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > So my questions are:
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > - Has anyone else experienced the same
> >     >>>>      >>>>>>>       > >> > > > > problem or have similar observation on
> >     >>>>      >>>>>>>       > >> > > > > TravisCI? (I suspect it has things to do
> >     >>>>      >>>>>>>       > >> > > > > with time zone)
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > - What pricing plan of TravisCI is Flink
> >     >>>>      >>>>>>>       > >> > > > > currently using? Is it the free plan for
> >     >>>>      >>>>>>>       > >> > > > > open source projects? What are the
> >     >>>>      >>>>>>>       > >> > > > > guaranteed build capacity of the current
> >     >>>>      >>>>>>>       > >> > > > > plan?
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > - If the current pricing plan (either
> >     >>>>      >>>>>>>       > >> > > > > free or paid) can't provide stable build
> >     >>>>      >>>>>>>       > >> > > > > capacity, can we upgrade to a higher
> >     >>>>      >>>>>>>       > >> > > > > priced plan with larger and more stable
> >     >>>>      >>>>>>>       > >> > > > > build capacity?
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > BTW, another factor that contribute to
> >     >>>>      >>>>>>>       > >> > > > > the productivity problem is that our
> >     >>>>      >>>>>>>       > >> > > > > build is slow - we run full build for
> >     >>>>      >>>>>>>       > >> > > > > every PR and a successful full build
> >     >>>>      >>>>>>>       > >> > > > > takes ~5h. We definitely have more
> >     >>>>      >>>>>>>       > >> > > > > options to solve it, for instance,
> >     >>>>      >>>>>>>       > >> > > > > modularize the build graphs and reuse
> >     >>>>      >>>>>>>       > >> > > > > artifacts from the previous build. But I
> >     >>>>      >>>>>>>       > >> > > > > think that can be a big effort which is
> >     >>>>      >>>>>>>       > >> > > > > much harder to accomplish in a short
> >     >>>>      >>>>>>>       > >> > > > > period of time and may deserve its own
> >     >>>>      >>>>>>>       > >> > > > > separate discussion.
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > > [1]
> >     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > > >
> >     >>>>      >>>>>>>       > >> > > >
> >     >>>>      >>>>>>>       > >> > >
> >     >>>>      >>>>>>>       > >> >
> >     >>>>      >>>>>>>       > >>
> >     >>>>      >>>>>>>       > >
> >     >>>>      >>>>>>>       >
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>       --
> >     >>>>      >>>>>>>       Best Regards
> >     >>>>      >>>>>>>
> >     >>>>      >>>>>>>       Jeff Zhang
> >     >>>>      >>>>>>>
> >     >>>>      >>
> >     >>>>
> >     >>>
> >     >>
> >
>
>

[VOTE] Migrate to sponsored Travis account

Posted by Chesnay Schepler <ch...@apache.org>.
I've raised a JIRA 
<https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire 
whether it would be possible to switch to a different Travis account, 
and if so what steps would need to be taken.
We need a proper confirmation from INFRA since we are not in full 
control of the flink repository (for example, we cannot access the 
settings page).

If this is indeed possible, Ververica is willing to sponsor a Travis 
account for the Flink project.
This would provide us with more than enough resources.

Since this makes the project more reliant on resources provided by 
external companies I would like to vote on this.

Please vote on this proposal, as follows:
[ ] +1, Approve the migration to a Ververica-sponsored Travis account, 
provided that INFRA approves
[ ] -1, Do not approve the migration to a Ververica-sponsored Travis 
account

The vote will be open for at least 24h, and until we have confirmation 
from INFRA. The voting period may be shorter than the usual 3 days since 
our current setup is effectively not working.
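
For a sense of scale, the capacity figures quoted from INFRA-18533 further
down in this thread (over 5800 build hours in one month, and a planned cap
of five executors) can be sanity-checked with a few lines of Python. This is
only a rough sketch: the 30-day month and the helper names are assumptions
made for illustration, not anything INFRA or Travis provides.

```python
HOURS_PER_DAY = 24
DAYS_IN_MONTH = 30  # assumption: a 30-day month


def servers_equivalent(build_hours, days=DAYS_IN_MONTH):
    """Machines that would have to run 24/7 to supply build_hours in one month."""
    return build_hours / (HOURS_PER_DAY * days)


def monthly_capacity_hours(executors, days=DAYS_IN_MONTH):
    """Upper bound on build hours per month for a fixed number of executors."""
    return executors * HOURS_PER_DAY * days


reported_hours = 5800  # figure quoted from INFRA-18533 in this thread
print(round(servers_equivalent(reported_hours), 2))  # 8.06
print(monthly_capacity_hours(5))  # 3600
```

Under these assumptions, 5800 hours is roughly eight machines running
nonstop, while a cap of five executors allows at most 3600 build hours per
month.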

On 04/07/2019 06:51, Bowen Li wrote:
> Re: > Are they using their own Travis CI pool, or did they switch to an 
> entirely different CI service?
>
> I reached out to Wes and Krisztián from Apache Arrow PMC. They are 
> currently moving away from ASF's Travis to their own in-house metal 
> machines at [1] with custom CI application at [2]. They've seen 
> significant improvement w.r.t both much higher performance and 
> basically no resource waiting time, "night-and-day" difference quoting 
> Wes.
>
> Re: > If we can just switch to our own Travis pool, just for our 
> project, then this might be something we can do fairly quickly?
>
> I believe so, according to [3] and [4]
>
>
> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
> [2] https://github.com/ursa-labs/ursabot
> [3] 
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>
>
>
> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <chesnay@apache.org 
> <ma...@apache.org>> wrote:
>
>     Are they using their own Travis CI pool, or did they switch to an
>     entirely different CI service?
>
>     If we can just switch to our own Travis pool, just for our
>     project, then
>     this might be something we can do fairly quickly?
>
>     On 03/07/2019 05:55, Bowen Li wrote:
>     > I responded in the INFRA ticket [1] that I believe they are
>     using a wrong
>     > metric against Flink and the total build time is a completely
>     different
>     > thing than guaranteed build capacity.
>     >
>     > My response:
>     >
>     > "As mentioned above, I'm in Seattle, and since I started paying
>     > attention to Flink's build queue a few tens of days ago, I have seen
>     > no build kicking off for Flink during PST daytime on weekdays. Our
>     > teammates in China and Europe have reported similar observations. So
>     > we need to evaluate where the large total build time came from. If
>     > both 1) your number and 2) our observations from three locations,
>     > which cover pretty much a full day, are true, I **guess** one reason
>     > may be that the extra build time most likely came from weekends, when
>     > other Apache projects may be idle and Flink drains its congested
>     > queue hard.
>     >
>     > Please be aware of that we're not complaining about the lack of
>     resources
>     > in general, I'm complaining about the lack of **stable, dedicated**
>     > resources. An example for the latter one is, currently even if
>     no build is
>     > in Flink's queue and I submit a request to be the queue head in PST
>     > morning, my build won't even start in 6-8+h. That is an absurd
>     amount of
>     > waiting time.
>     >
>     > That's saying, if ASF INFRA decides to adopt a quota system and
>     grants
>     > Flink five DEDICATED servers that runs all the time only for
>     Flink, that'll
>     > be PERFECT and can totally solve our problem now.
>     >
>     > I feel what's missing in the ASF INFRA's Travis resource pool is
>     some level
>     > of build capacity SLAs and certainty"
>     >
>     >
>     > Again, I believe these two problems differ in nature: long build
>     > time vs. lack of dedicated build resources. That is to say,
>     > shortening the build time may or may not relieve the situation. I'm
>     > slightly negative on disabling IT cases for PRs: the downside is that
>     > we risk merging bugs that UTs don't catch, which may cost a lot more
>     > to fix later, especially if they slow down or even block others. But
>     > I'm open to others' opinions on it.
>     >
>     > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't solve
>     > our problem, since INFRA's pool is fully shared and they have neither
>     > control over nor fine-grained insight into resource allocation to a
>     > specific Apache project. As mentioned in [1], Apache Arrow is moving
>     > away from the ASF INFRA Travis pool (they are actually surprised
>     > Flink hasn't planned to do so). I know that Spark is on its own build
>     > infra. If we all agree on funding our own build infra, I'd be glad to
>     > help investigate potential options after releasing 1.9, since I'm
>     > super busy with 1.9 now.
>     >
>     > [1] https://issues.apache.org/jira/browse/INFRA-18533
>     >
>     >
>     >
>     > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler
>     <chesnay@apache.org <ma...@apache.org>> wrote:
>     >
>     >> As a short-term stopgap, since we can assume this issue to
>     become much
>     >> worse in the following days/weeks, we could disable IT cases in
>     PRs and
>     >> only run them on master.
>     >>
>     >> On 02/07/2019 12:03, Chesnay Schepler wrote:
>     >>> People really have to stop thinking that just because
>     something works
>     >>> for us it is also a good solution.
>     >>> Also, please remember that our builds run for 2h from start to
>     finish,
>     >>> and not the 14 _minutes_ it takes for zeppelin.
>     >>> We are dealing with an entirely different scale here, both in
>     terms of
>     >>> build times and number of builds.
>     >>>
>     >>> In this very thread people have been complaining about long queue
>     >>> times for their builds. Surprise, other Apache projects have been
>     >>> suffering the very same thing due to us not controlling our build
>     >>> times. While switching services (be it Jenkins, CircleCI or
>     whatever)
>     >>> will possibly work for us (and these options are actually
>     attractive,
>     >>> like CircleCI's proper support for build artifacts), it will also
>     >>> result in us likely negatively affecting other projects in
>     significant
>     >>> ways.
>     >>>
>     >>> Sure, the Jenkins setup has a good user experience for us, at
>     the cost
>     >>> of blocking Jenkins workers for a _lot_ of time. Right now we
>     have 25
>     >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>     >>> resources, and the European contributors haven't even really
>     started yet.
>     >>>
>     >>> FYI, the latest INFRA response from INFRA-18533:
>     >>>
>     >>> "Our rough metrics shows that Flink used over 5800 hours of
>     build time
>     >>> last month. That is equal to EIGHT servers running 24/7 for
>     the ENTIRE
>     >>> MONTH. EIGHT. nonstop.
>     >>> When we discovered this last night, we discussed it some and
>     are going
>     >>> to tune down Flink to allow only five executors maximum. We cannot
>     >>> allow Flink to consume so much of a Foundation shared resource."
>     >>>
>     >>> So yes, we either
>     >>> a) have to heavily reduce our CI usage or
>     >>> b) fund our own, either maintaining it ourselves or donating
>     to Apache.
>     >>>
>     >>> On 02/07/2019 05:11, Bowen Li wrote:
>     >>>> Looking at the git history of the Jenkins script, its core part
>     >>>> was finished in March 2017 (with only two minor updates in
>     >>>> 2017/2018), so it's been running for over two years now, and it
>     >>>> feels like the Zeppelin community has been quite happy with it.
>     >>>> @Jeff Zhang <mailto:zjffdu@gmail.com <ma...@gmail.com>> can you
>     >>>> share your insights and user experience with the Jenkins+Travis
>     >>>> approach?
>     >>>>
>     >>>> Things like:
>     >>>>
>     >>>> - has the approach completely solved the resource capacity problem
>     >>>> for the Zeppelin community? is the Zeppelin community happy with
>     >>>> the result?
>     >>>> - is the whole configuration chain stable (e.g. uptime) enough?
>     >>>> - how often do you need to maintain the Jenkins infra? how many
>     >>>> people are usually involved in maintenance and bug-fixes?
>     >>>>
>     >>>> The downside of this approach seems mostly to be on the
>     maintenance
>     >>>> to me - maintain the script and Jenkins infra.
>     >>>>
>     >>>> ** Having Our Own Travis-CI.com Account **
>     >>>>
>     >>>> Another alternative I've been thinking of is to have our own
>     >>>> travis-ci.com <http://travis-ci.com> account with paid dedicated
>     >>>> resources. Note travis-ci.org <http://travis-ci.org> is the free
>     >>>> version and travis-ci.com is the commercial version. We currently
>     >>>> use a shared resource pool managed by the ASF INFRA team on
>     >>>> travis-ci.org, but we have no control over it - we can't see how
>     >>>> it's configured, how many resources are available, how resources
>     >>>> are allocated among Apache projects, etc. The nice things about
>     >>>> having an account on travis-ci.com are:
>     >>>>
>     >>>> - relatively low cost with much better resource guarantee
>     than what
>     >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>     >>>> $489/month with 10 concurrency
>     >>>> - low maintenance work compared to using Jenkins
>     >>>> - (potentially) no migration cost according to Travis's doc [2]
>     >>>> (pending verification)
>     >>>> - full control over the build capacity/configuration compared to
>     >>>> using ASF INFRA's pool
>     >>>>
>     >>>> I'd be surprised if we as such a vibrant community cannot
>     find and
>     >>>> fund $249*12=$2988 a year in exchange for a much better developer
>     >>>> experience and much higher productivity.
>     >>>>
>     >>>> [1] https://travis-ci.com/plans
>     >>>> [2]
>     >>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>     >>>>
>     >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler
>     <chesnay@apache.org <ma...@apache.org>
>     >>>> <mailto:chesnay@apache.org <ma...@apache.org>>> wrote:
>     >>>>
>     >>>>      So yes, the Jenkins job keeps pulling the state from Travis
>     >>>>      until it finishes.
>     >>>>
>     >>>>      Not sure I'm comfortable with the idea of using Jenkins
>     >>>>      workers just to idle for several hours.
>     >>>>
>     >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>     >>>>      > Here's what the Zeppelin community did: we made a Python
>     >>>>      > script to check the build status of a pull request.
>     >>>>      > Here's the script:
>     >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>     >>>>      >
>     >>>>      > And this is the script we use in the Jenkins build job.
>     >>>>      >
>     >>>>      > if [ -f "travis_check.py" ]; then
>     >>>>      >    git log -n 1
>     >>>>      >    # pull the two links (the PR link, then the link after "from") out of the Jenkins build page
>     >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>     >>>>      >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>     >>>>      >    # last path segment of the second link
>     >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>     >>>>      >    # last path segment of the first link (the PR number)
>     >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>     >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>     >>>>      >    #if [ -z $COMMIT ]; then
>     >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     >>>>      >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>     >>>>      >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
>     >>>>      >    #    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>     >>>>      >    #fi
>     >>>>      >
>     >>>>      >    # get commit hash from PR
>     >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     >>>>      >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>     >>>>      >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
>     >>>>      >      sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>     >>>>      >    sleep 30 # wait a few moments for travis to start the build
>     >>>>      >    RET_CODE=0
>     >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>     >>>>      >      RET_CODE=0
>     >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     >>>>      >        grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>     >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>     >>>>      >    fi
>     >>>>      >
>     >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
>     >>>>      >      set +x
>     >>>>      >      echo "-----------------------------------------------------"
>     >>>>      >      echo "Looks like travis-ci is not configured for your fork."
>     >>>>      >      echo "Please setup by switching on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>     >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>     >>>>      >      echo ""
>     >>>>      >      echo "To trigger CI after setup, you will need to amend your last commit with"
>     >>>>      >      echo "git commit --amend"
>     >>>>      >      echo "git push your-remote HEAD --force"
>     >>>>      >      echo ""
>     >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
>     >>>>      >    fi
>     >>>>      >
>     >>>>      >    exit $RET_CODE
>     >>>>      > else
>     >>>>      >    set +x
>     >>>>      >    echo "travis_check.py does not exist"
>     >>>>      >    exit 1
>     >>>>      > fi
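
To make the concern about idle Jenkins workers concrete: a job like the one
above essentially blocks while it waits for the Travis build to reach a
final state. Below is a minimal sketch of such a wait loop. It is not the
actual travis_check.py; fetch_state is a hypothetical callable that returns
the current build state as a string, and the state names and timings are
assumptions for illustration.

```python
import time

# Assumed names for final build states; not taken from travis_check.py.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state, poll_seconds=30, timeout_seconds=3 * 60 * 60,
                   sleep=time.sleep):
    """Call fetch_state() repeatedly until it reports a final state.

    Returns the final state, or "timeout" if none is seen in time.
    The caller (e.g. a CI worker) stays occupied for the whole wait.
    """
    waited = 0
    while waited <= timeout_seconds:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
        waited += poll_seconds
    return "timeout"
```

With a build that takes about two hours, a worker running this loop is held
for those two hours even though it does no real work, which is the cost
raised in the reply above.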
>     >>>>      >
>     >>>>      > Chesnay Schepler <chesnay@apache.org
>     <ma...@apache.org>
>     >>>>      <mailto:chesnay@apache.org <ma...@apache.org>>>
>     wrote on Sat, Jun 29, 2019 at 3:17 PM:
>     >>>>      >
>     >>>>      >> Does this imply that a Jenkins job is active as long
>     as the
>     >>>>      Travis build
>     >>>>      >> runs?
>     >>>>      >>
>     >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>     >>>>      >>> Hi,
>     >>>>      >>>
>     >>>>      >>> @Dawid, I think the "long test running" as I
>     mentioned in the
>     >>>>      first
>     >>>>      >> email,
>     >>>>      >>> also as you guys said, belongs to "a big effort
>     which is much
>     >>>>      harder to
>     >>>>      >>> accomplish in a short period of time and may deserve
>     its own
>     >>>>      separate
>     >>>>      >>> discussion". Thus I didn't include it in what we can
>     do in a
>     >>>>      foreseeable
>     >>>>      >>> short term.
>     >>>>      >>>
>     >>>>      >>> Besides, I don't think that's the ultimate reason for the
>     >>>>      >>> lack of build resources. Even if the build is shortened to
>     >>>>      >>> something like 2h, the problem I described, with no build
>     >>>>      >>> machine working for 6 or more hours during PST daytime, will
>     >>>>      >>> still happen, because no machine from ASF INFRA's pool is
>     >>>>      >>> allocated to Flink. I have paid close attention to the build
>     >>>>      >>> queue over the past few weekdays, and it's a pretty clear
>     >>>>      >>> pattern now.
>     >>>>      >>>
>     >>>>      >>> **The ultimate root cause** for that is - we don't
>     have any
>     >>>>      **dedicated**
>     >>>>      >>> build resources that we can stably rely on. I'm
>     actually ok to
>     >>>>      wait for a
>     >>>>      >>> long time if there are build requests running, it
>     means at
>     >>>>      least we are
>     >>>>      >>> making progress. But I'm not ok with no build
>     resource. A
>     >>>>      better place I
>     >>>>      >>> think we should aim at in short term is to always
>     have at
>     >>>>      least a central
>     >>>>      >>> pool (can be 3 or 5) of machines dedicated to build
>     Flink at
>     >>>>      any time, or
>     >>>>      >>> maybe use users resources.
>     >>>>      >>>
>     >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
>     >>>>      >>> community is using a Jenkins job to automatically build on
>     >>>>      >>> users' Travis accounts and link the result back to the
>     >>>>      >>> GitHub PR. I guess the Jenkins job would fetch the latest
>     >>>>      >>> upstream master and build the PR against it. Jeff has filed
>     >>>>      >>> tickets to learn about and get access to the Jenkins infra.
>     >>>>      >>> It'd be better to fully understand it first before judging
>     >>>>      >>> this approach.
>     >>>>      >>>
>     >>>>      >>> I also heard good things about CircleCI, and ASF
>     INFRA seems
>     >>>>      to have a
>     >>>>      >> pool
>     >>>>      >>> of build capacity there too. Can be an alternative
>     to consider.
>     >>>>      >>>
>     >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>     >>>>      >> dwysakowicz@apache.org
>     <ma...@apache.org> <mailto:dwysakowicz@apache.org
>     <ma...@apache.org>>>
>     >>>>      >>> wrote:
>     >>>>      >>>
>     >>>>      >>>> Sorry to jump in late, but I think Bowen missed the
>     most
>     >>>>      important point
>     >>>>      >>>> from Chesnay's previous message in the summary. The
>     ultimate
>     >>>>      reason for
>     >>>>      >>>> all the problems is that the tests take close to 2
>     hours to
>     >>>>      run already.
>     >>>>      >>>> I fully support this claim: "Unless people start
>     caring about
>     >>>>      test times
>     >>>>      >>>> before adding them, this issue cannot be solved"
>     >>>>      >>>>
>     >>>>      >>>> This is also another reason why using user's Travis
>     account
>     >>>>      won't help.
>     >>>>      >>>> Every few weeks we reach the user's time limit for
>     a single
>     >>>>      profile.
>     >>>>      >>>> This makes the user's builds simply fail, until we
>     either
>     >>>>      properly
>     >>>>      >>>> decrease the time the tests take (which I am not
>     sure we ever
>     >>>>      did) or
>     >>>>      >>>> postpone the problem by splitting into more
>     profiles. (Note
>     >>>>      that the ASF
>     >>>>      >>>> Travis account has higher time limits)
>     >>>>      >>>>
>     >>>>      >>>> Best,
>     >>>>      >>>>
>     >>>>      >>>> Dawid
>     >>>>      >>>>
>     >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>     >>>>      >>>>> Do we know if using "the best" available hardware
>     would
>     >>>>      improve the
>     >>>>      >> build
>     >>>>      >>>>> times?
>     >>>>      >>>>> Imagine we would run the build on machines with
>     plenty of
>     >>>>      main memory
>     >>>>      >> to
>     >>>>      >>>>> mount everything to ramdisk + the latest CPU
>     architecture?
>     >>>>      >>>>>
>     >>>>      >>>>> Throwing hardware at the problem could help reduce
>     the time
>     >>>>      of an
>     >>>>      >>>>> individual build, and using our own infrastructure
>     would
>     >>>>      remove our
>     >>>>      >>>>> dependency on Apache's Travis account (with the
>     obvious
>     >>>>      downside of
>     >>>>      >>>> having
>     >>>>      >>>>> to maintain the infrastructure)
>     >>>>      >>>>> We could use an open source travis alternative, to
>     have a
>     >>>>      similar
>     >>>>      >>>>> experience and make the migration easy.
>     >>>>      >>>>>
>     >>>>      >>>>>
>     >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>     >>>>      <chesnay@apache.org <ma...@apache.org>
>     <mailto:chesnay@apache.org <ma...@apache.org>>>
>     >>>>      >>>> wrote:
>     >>>>      >>>>>> From what I gathered, there's no special sauce that the
>     >>>>      >>>>>> Zeppelin project uses which actually integrates a user's
>     >>>>      >>>>>> Travis account into the PR.
>     >>>>      >>>>>> They just disabled Travis for PRs. And that's kind of it.
>     >>>>      >>>>>>
>     >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
>     >>>>      >>>>>> amount of resources, but there are downsides:
>     >>>>      >>>>>>
>     >>>>      >>>>>> The discoverability of the Travis check takes a nose-dive.
>     >>>>      >>>>>> Either we require every contributor to always, on every
>     >>>>      >>>>>> commit, also post a Travis build, or we have the reviewer
>     >>>>      >>>>>> sift through the contributor's account to find it.
>     >>>>      >>>>>>
>     >>>>      >>>>>> This is rather cumbersome. Additionally, it's
>     also not
>     >>>>      equivalent to
>     >>>>      >>>>>> having a PR build.
>     >>>>      >>>>>>
>     >>>>      >>>>>> A normal branch build takes a branch as is and tests it. A
>     >>>>      >>>>>> PR build merges the branch into master, and then runs it.
>     >>>>      >>>>>> (Fun fact: This is why a PR with merge conflicts is not
>     >>>>      >>>>>> being run on Travis.)
>     >>>>      >>>>>>
>     >>>>      >>>>>> And ultimately, everyone can already make use of this
>     >>>>      approach anyway.
>     >>>>      >>>>>>
>     >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>     >>>>      >>>>>>> Hi Jeff,
>     >>>>      >>>>>>>
>     >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I
>     think it's a
>     >>>>      good idea to
>     >>>>      >>>>>>> leverage user's travis account.
>     >>>>      >>>>>>> In this way, we can have almost unlimited
>     concurrent build
>     >>>>      jobs and
>     >>>>      >>>>>>> developers can restart build by themselves
>     (currently only
>     >>>>      committers
>     >>>>      >>>>>>> can restart PR's build).
>     >>>>      >>>>>>>
>     >>>>      >>>>>>> But I'm still not very clear how to integrate user's
>     >>>>      travis build
>     >>>>      >> into
>     >>>>      >>>>>>> the Flink pull request's build automatically.
>     Can you
>     >>>>      explain more in
>     >>>>      >>>>>>> detail?
>     >>>>      >>>>>>>
>     >>>>      >>>>>>> Another question: does travis only build
>     branches for user
>     >>>>      account?
>     >>>>      >>>>>>> My concern is that builds for PRs will rebase user's
>     >>>>      commits against
>     >>>>      >>>>>>> current master branch.
>     >>>>      >>>>>>> This will help us to find problems before
>     merge.  Builds
>     >>>>      for branches
>     >>>>      >>>>>>> will lose the impact of new commits in master.
>     >>>>      >>>>>>> How does Zeppelin solve this problem?
>     >>>>      >>>>>>>
>     >>>>      >>>>>>> Thanks again for sharing the idea.
>     >>>>      >>>>>>>
>     >>>>      >>>>>>> Regards,
>     >>>>      >>>>>>> Jark
>     >>>>      >>>>>>>
>     >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang
>     <zjffdu@gmail.com <ma...@gmail.com>
>     >>>>      <mailto:zjffdu@gmail.com <ma...@gmail.com>>
>     >>>>      >>>>>>> <mailto:zjffdu@gmail.com
>     <ma...@gmail.com> <mailto:zjffdu@gmail.com
>     <ma...@gmail.com>>>> wrote:
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>       Hi Folks,
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>       Zeppelin met this kind of issue before; we solved it by
>     >>>>      >>>>>>>       delegating each contributor's PR build to their own Travis
>     >>>>      >>>>>>>       account (everyone can have 5 free slots for Travis builds).
>     >>>>      >>>>>>>       The Apache account's Travis build is only triggered when a
>     >>>>      >>>>>>>       PR is merged.
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>       Kurt Young <ykt836@gmail.com
>     <ma...@gmail.com>
>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>
>     <mailto:ykt836@gmail.com <ma...@gmail.com>
>     >>>>      <mailto:ykt836@gmail.com <ma...@gmail.com>>>>
>     >>>>      >>>>>>>  wrote on Tue, Jun 25, 2019 at 10:16 AM:
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>       > (Forgot to cc George)
>     >>>>      >>>>>>>       >
>     >>>>      >>>>>>>       > Best,
>     >>>>      >>>>>>>       > Kurt
>     >>>>      >>>>>>>       >
>     >>>>      >>>>>>>       >
>     >>>>      >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>     >>>>      <ykt836@gmail.com <ma...@gmail.com>
>     <mailto:ykt836@gmail.com <ma...@gmail.com>>
>     >>>>      >>>>>>> <mailto:ykt836@gmail.com
>     <ma...@gmail.com> <mailto:ykt836@gmail.com
>     <ma...@gmail.com>>>>
>     >>>>      wrote:
>     >>>>      >>>>>>>       >
>     >>>>      >>>>>>>       > > Hi Bowen,
>     >>>>      >>>>>>>       > >
>     >>>>      >>>>>>>       > > Thanks for bringing this up. We have actually
>     >>>>      >>>>>>>       > > discussed this, and I think Till and George have
>     >>>>      >>>>>>>       > > already spent some time investigating it. I have
>     >>>>      >>>>>>>       > > cc'ed both of them, and maybe they can share
>     >>>>      >>>>>>>       > > their findings.
>     >>>>      >>>>>>>       > >
>     >>>>      >>>>>>>       > > Best,
>     >>>>      >>>>>>>       > > Kurt
>     >>>>      >>>>>>>       > >
>     >>>>      >>>>>>>       > >
>     >>>>      >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>     >>>>      <imjark@gmail.com <ma...@gmail.com>
>     <mailto:imjark@gmail.com <ma...@gmail.com>>
>     >>>>      >>>>>>> <mailto:imjark@gmail.com
>     <ma...@gmail.com> <mailto:imjark@gmail.com
>     <ma...@gmail.com>>>>
>     >>>>      wrote:
>     >>>>      >>>>>>>       > >
>     >>>>      >>>>>>>       > >> Hi Bowen,
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> Thanks for bringing this up. We have also
>     >>>>      >>>>>>>       > >> suffered from the long build time.
>     >>>>      >>>>>>>       > >> I agree that we should focus on solving the
>     >>>>      >>>>>>>       > >> build capacity problem in this thread.
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> My observation is that only one build is
>     >>>>      >>>>>>>       > >> running while all the others (other PRs,
>     >>>>      >>>>>>>       > >> master) are pending.
>     >>>>      >>>>>>>       > >> The pricing plan [1] of Travis shows it can
>     >>>>      >>>>>>>       > >> support concurrent build jobs.
>     >>>>      >>>>>>>       > >> But I don't know which plan we are using; it
>     >>>>      >>>>>>>       > >> might be the free plan for open source.
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> I cc-ed Chesnay who may have some
>     experience on
>     >>>>      Travis.
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> Regards,
>     >>>>      >>>>>>>       > >> Jark
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> [1]: https://travis-ci.com/plans
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>     >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>
>     >>>>      >>>>>>> <mailto:bowenli86@gmail.com
>     <ma...@gmail.com>
>     >>>>      <mailto:bowenli86@gmail.com
>     <ma...@gmail.com>>>> wrote:
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >> > Hi Steven,
>     >>>>      >>>>>>>       > >> >
>     >>>>      >>>>>>>       > >> > I think you may not read what I
>     wrote. The
>     >>>>      discussion is
>     >>>>      >>>> about
>     >>>>      >>>>>>>       > "unstable
>     >>>>      >>>>>>>       > >> > build **capacity**", in another word
>     >>>>      "unstable / lack of
>     >>>>      >>>> build
>     >>>>      >>>>>>>       > >> resources",
>     >>>>      >>>>>>>       > >> > not "unstable build".
>     >>>>      >>>>>>>       > >> >
>     >>>>      >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM
>     Steven Wu
>     >>>>      >>>>>>>       <stevenz3wu@gmail.com
>     <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>     <ma...@gmail.com>>
>     >>>>      <mailto:stevenz3wu@gmail.com
>     <ma...@gmail.com> <mailto:stevenz3wu@gmail.com
>     <ma...@gmail.com>>>>
>     >>>>      >>>>>>>       > wrote:
>     >>>>      >>>>>>>       > >> >
>     >>>>      >>>>>>>       > >> > > long and sometimes unstable build is
>     >>>>      definitely a pain
>     >>>>      >>>>>> point.
>     >>>>      >>>>>>>       > >> > >
>     >>>>      >>>>>>>       > >> > > I suspect the build failure here in
>     >>>>      >> flink-connector-kafka
>     >>>>      >>>>>>>       is not
>     >>>>      >>>>>>>       > >> related
>     >>>>      >>>>>>>       > >> > to
>     >>>>      >>>>>>>       > >> > > my change. but there is no easy
>     re-run the
>     >>>>      build on
>     >>>>      >>>>>>>  travis UI.
>     >>>>      >>>>>>>       > Google
>     >>>>      >>>>>>>       > >> > > search showed a trick of
>     close-and-open the
>     >>>>      PR will
>     >>>>      >>>>>>>  trigger rebuild.
>     >>>>      >>>>>>>       > >> but
>     >>>>      >>>>>>>       > >> > > that could add noises to the PR
>     activities.
>     >>>>      >>>>>>>       > >> > >
>     >>>> https://travis-ci.org/apache/flink/jobs/545555519
>     >>>>      >>>>>>>       > >> > >
>     >>>>      >>>>>>>       > >> > > travis-ci for my personal repo
>     often failed
>     >>>>      with
>     >>>>      >>>>>>>  exceeding time
>     >>>>      >>>>>>>       > limit
>     >>>>      >>>>>>>       > >> > after
>     >>>>      >>>>>>>       > >> > > 4+ hours.
>     >>>>      >>>>>>>       > >> > > The job exceeded the maximum time
>     limit for
>     >>>>      jobs, and
>     >>>>      >> has
>     >>>>      >>>>>>>       been
>     >>>>      >>>>>>>       > >> > terminated.
>     >>>>      >>>>>>>       > >> > >
>     >>>>      >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM
>     Bowen Li
>     >>>>      >>>>>>>       <bowenli86@gmail.com
>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>     <ma...@gmail.com>>
>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>     >>>>      >>>>>>>       > wrote:
>     >>>>      >>>>>>>       > >> > >
>     >>>>      >>>>>>>       > >> > > >
>     >>>> https://travis-ci.org/apache/flink/builds/549681530
>     >>>>      >>>>>>>       This build
>     >>>>      >>>>>>>       > >> > request
>     >>>>      >>>>>>>       > >> > > > has
>     >>>>      >>>>>>>       > >> > > > been sitting at **HEAD of the
>     queue**
>     >>>>      since I first
>     >>>>      >> saw
>     >>>>      >>>>>>>       it at PST
>     >>>>      >>>>>>>       > >> > 10:30am
>     >>>>      >>>>>>>       > >> > > > (not sure how long it's been
>     there before
>     >>>>      10:30am).
>     >>>>      >>>>>>>       It's PST
>     >>>>      >>>>>>>       > 4:12pm
>     >>>>      >>>>>>>       > >> now
>     >>>>      >>>>>>>       > >> > > and
>     >>>>      >>>>>>>       > >> > > > it hasn't started yet.
>     >>>>      >>>>>>>       > >> > > >
>     >>>>      >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM
>     Bowen Li
>     >>>>      >>>>>>>       <bowenli86@gmail.com
>     <ma...@gmail.com> <mailto:bowenli86@gmail.com
>     <ma...@gmail.com>>
>     >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>
>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>>
>     >>>>      >>>>>>>       > >> wrote:
>     >>>>      >>>>>>>       > >> > > >
>     >>>>      >>>>>>>       > >> > > > > Hi devs,
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > I've been experiencing the pain
>     >>>>      resulting from lack
>     >>>>      >>>>>>>       of stable
>     >>>>      >>>>>>>       > >> build
>     >>>>      >>>>>>>       > >> > > > > capacity on Travis for Flink
>     PRs [1].
>     >>>>      >> Specifically, I
>     >>>>      >>>>>>>  noticed
>     >>>>      >>>>>>>       > >> often
>     >>>>      >>>>>>>       > >> > > that
>     >>>>      >>>>>>>       > >> > > > no
>     >>>>      >>>>>>>       > >> > > > > build in the queue is making any
>     >>>>      progress for
>     >>>>      >> hours,
>     >>>>      >>>> and
>     >>>>      >>>>>>>       > suddenly
>     >>>>      >>>>>>>       > >> 5
>     >>>>      >>>>>>>       > >> > or
>     >>>>      >>>>>>>       > >> > > 6
>     >>>>      >>>>>>>       > >> > > > > builds kick off all together
>     after the
>     >>>>      long pause.
>     >>>>      >>>>>>>       I'm at PST
>     >>>>      >>>>>>>       > >> > (UTC-08)
>     >>>>      >>>>>>>       > >> > > > time
>     >>>>      >>>>>>>       > >> > > > > zone, and I've seen pause can
>     be as
>     >>>>      long as 6 hours
>     >>>>      >>>>>>>       from PST 9am
>     >>>>      >>>>>>>       > >> to
>     >>>>      >>>>>>>       > >> > 3pm
>     >>>>      >>>>>>>       > >> > > > > (let alone the time needed to
>     drain the
>     >>>>      queue
>     >>>>      >>>>>>>  afterwards).
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > I think this has greatly
>     impacted our
>     >>>>      productivity.
>     >>>>      >>>> I've
>     >>>>      >>>>>>>       > >> experienced
>     >>>>      >>>>>>>       > >> > > that
>     >>>>      >>>>>>>       > >> > > > > PRs submitted in the early
>     morning of
>     >>>>      PST time zone
>     >>>>      >>>>>>>       won't finish
>     >>>>      >>>>>>>       > >> > their
>     >>>>      >>>>>>>       > >> > > > > build until late night of the
>     same day.
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > So my questions are:
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > - Has anyone else experienced
>     the same
>     >>>>      problem or
>     >>>>      >>>>>>>       have similar
>     >>>>      >>>>>>>       > >> > > > observation
>     >>>>      >>>>>>>       > >> > > > > on TravisCI? (I suspect it
>     has things
>     >>>>      to do with
>     >>>>      >> time
>     >>>>      >>>>>>>       zone)
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > - What pricing plan of
>     TravisCI is
>     >>>>      Flink currently
>     >>>>      >>>>>>>  using? Is it
>     >>>>      >>>>>>>       > >> the
>     >>>>      >>>>>>>       > >> > > free
>     >>>>      >>>>>>>       > >> > > > > plan for open source
>     projects? What
>     >>>> are the
>     >>>>      >>>>>>>  guaranteed build
>     >>>>      >>>>>>>       > >> capacity
>     >>>>      >>>>>>>       > >> > > of
>     >>>>      >>>>>>>       > >> > > > > the current plan?
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > - If the current pricing plan
>     (either
>     >>>>      free or paid)
>     >>>>      >>>>>> can't
>     >>>>      >>>>>>>       > provide
>     >>>>      >>>>>>>       > >> > > stable
>     >>>>      >>>>>>>       > >> > > > > build capacity, can we
>     upgrade to a
>     >>>>      higher priced
>     >>>>      >>>>>>>       plan with
>     >>>>      >>>>>>>       > larger
>     >>>>      >>>>>>>       > >> > and
>     >>>>      >>>>>>>       > >> > > > more
>     >>>>      >>>>>>>       > >> > > > > stable build capacity?
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > BTW, another factor that
>     contribute to
>     >>>> the
>     >>>>      >>>>>>>  productivity problem
>     >>>>      >>>>>>>       > is
>     >>>>      >>>>>>>       > >> > that
>     >>>>      >>>>>>>       > >> > > > > our build is slow - we run
>     full build
>     >>>>      for every PR
>     >>>>      >>>> and a
>     >>>>      >>>>>>>       > >> successful
>     >>>>      >>>>>>>       > >> > > full
>     >>>>      >>>>>>>       > >> > > > > build takes ~5h. We
>     definitely have
>     >>>>      more options to
>     >>>>      >>>>>>>       solve it,
>     >>>>      >>>>>>>       > for
>     >>>>      >>>>>>>       > >> > > > instance,
>     >>>>      >>>>>>>       > >> > > > > modularize the build graphs
>     and reuse
>     >>>>      artifacts
>     >>>>      >> from
>     >>>>      >>>> the
>     >>>>      >>>>>>>       > previous
>     >>>>      >>>>>>>       > >> > > build.
>     >>>>      >>>>>>>       > >> > > > > But I think that can be a big
>     effort
>     >>>>      which is much
>     >>>>      >>>>>>>  harder to
>     >>>>      >>>>>>>       > >> > accomplish
>     >>>>      >>>>>>>       > >> > > > in
>     >>>>      >>>>>>>       > >> > > > > a short period of time and
>     may deserve
>     >>>>      its own
>     >>>>      >>>> separate
>     >>>>      >>>>>>>       > >> discussion.
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > > [1]
>     >>>>      >> https://travis-ci.org/apache/flink/pull_requests
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > > >
>     >>>>      >>>>>>>       > >> > > >
>     >>>>      >>>>>>>       > >> > >
>     >>>>      >>>>>>>       > >> >
>     >>>>      >>>>>>>       > >>
>     >>>>      >>>>>>>       > >
>     >>>>      >>>>>>>       >
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>       --
>     >>>>      >>>>>>>       Best Regards
>     >>>>      >>>>>>>
>     >>>>      >>>>>>>       Jeff Zhang
>     >>>>      >>>>>>>
>     >>>>      >>
>     >>>>
>     >>>
>     >>
>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
Re: > Are they using their own Travis CI pool, or did they switch to an
entirely different CI service?

I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
currently moving away from ASF's Travis to their own in-house bare-metal
machines [1] with a custom CI application [2]. They've seen significant
improvement in both respects: much higher performance and basically no
waiting for resources, a "night-and-day" difference, to quote Wes.

Re: > If we can just switch to our own Travis pool, just for our project,
then this might be something we can do fairly quickly?

I believe so, according to [3] and [4]


[1] https://ci.ursalabs.org/
[2] https://github.com/ursa-labs/ursabot
[3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
[4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com



On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ch...@apache.org> wrote:

> Are they using their own Travis CI pool, or did they switch to an
> entirely different CI service?
>
> If we can just switch to our own Travis pool, just for our project, then
> this might be something we can do fairly quickly?
>
> On 03/07/2019 05:55, Bowen Li wrote:
> > I responded in the INFRA ticket [1] that I believe they are using a wrong
> > metric against Flink and the total build time is a completely different
> > thing than guaranteed build capacity.
> >
> > My response:
> >
> > "As mentioned above, I'm in Seattle, and since I started paying attention
> > to Flink's build queue a few tens of days ago, I have seen no build kick
> > off for Flink during PST daytime on weekdays. Our teammates in China and
> > Europe have reported similar observations. So we need to evaluate where
> > the large total build time came from. If both 1) your number and 2) our
> > observations from three locations, which cover pretty much a full day,
> > are true, I **guess** one reason is that the extra build time most
> > likely came from weekends, when other Apache projects may be idle and
> > Flink drains its congested queue hard.
> >
> > Please be aware that we're not complaining about a lack of resources
> > in general; I'm complaining about the lack of **stable, dedicated**
> > resources. An example of the latter: currently, even if no build is in
> > Flink's queue and I submit a request that becomes the queue head in the
> > PST morning, my build won't start for 6-8+ hours. That is an absurd
> > amount of waiting time.
> >
> > That is to say, if ASF INFRA decides to adopt a quota system and grants
> > Flink five DEDICATED servers that run all the time only for Flink, that
> > would be PERFECT and would totally solve our problem now.
> >
> > I feel what's missing in ASF INFRA's Travis resource pool is some level
> > of build capacity SLA and certainty."
> >
> >
> > Again, I believe these two problems, long build time vs. lack of
> > dedicated build resources, are different in nature. That is to say,
> > shortening the build time may or may not relieve the situation. I'm
> > slightly negative on disabling IT cases for PRs: the downside is that we
> > risk bugs in PRs that UTs don't catch, which may cost a lot more to fix
> > later and may slow down or even block others. But I'm open to others'
> > opinions on it.
> >
> > AFAICT from the INFRA ticket [1], donating to ASF INFRA isn't a feasible
> > way to solve our problem, since INFRA's pool is fully shared and they
> > have neither control over nor fine-grained insight into resource
> > allocation to a specific Apache project. As mentioned in [1], Apache
> > Arrow is moving away from the ASF INFRA Travis pool (they are actually
> > surprised Flink hasn't planned to do so). I know that Spark is on its
> > own build infra. If we all agree on funding our own build infra, I'd be
> > glad to help investigate potential options after releasing 1.9, since
> > I'm super busy with 1.9 now.
> >
> > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >
> >
> >
> > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ch...@apache.org>
> wrote:
> >
> >> As a short-term stopgap, since we can assume this issue to become much
> >> worse in the following days/weeks, we could disable IT cases in PRs and
> >> only run them on master.
> >>
> >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>> People really have to stop thinking that just because something works
> >>> for us it is also a good solution.
> >>> Also, please remember that our builds run for 2h from start to finish,
> >>> and not the 14 _minutes_ it takes for zeppelin.
> >>> We are dealing with an entirely different scale here, both in terms of
> >>> build times and number of builds.
> >>>
> >>> In this very thread people have been complaining about long queue
> >>> times for their builds. Surprise, other Apache projects have been
> >>> suffering the very same thing due to us not controlling our build
> >>> times. While switching services (be it Jenkins, CircleCI or whatever)
> >>> will possibly work for us (and these options are actually attractive,
> >>> like CircleCI's proper support for build artifacts), it will also
> >>> result in us likely negatively affecting other projects in significant
> >>> ways.
> >>>
> >>> Sure, the Jenkins setup has a good user experience for us, at the cost
> >>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
> >>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> >>> resources, and the European contributors haven't even really started
> yet.
> >>>
> >>> FYI, the latest INFRA response from INFRA-18533:
> >>>
> >>> "Our rough metrics shows that Flink used over 5800 hours of build time
> >>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
> >>> MONTH. EIGHT. nonstop.
> >>> When we discovered this last night, we discussed it some and are going
> >>> to tune down Flink to allow only five executors maximum. We cannot
> >>> allow Flink to consume so much of a Foundation shared resource."
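As a rough check of the figure quoted above, the conversion from monthly build hours to always-on servers can be sketched as follows. The 5800 hours and the five-executor cap come from the INFRA response; the 30-day month and the function names are assumptions made only for this illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def equivalent_servers(build_hours_per_month: float) -> float:
    """Machines that would have to run 24/7 to supply the given build hours."""
    return build_hours_per_month / (HOURS_PER_DAY * DAYS_PER_MONTH)


def monthly_capacity(executors: int) -> int:
    """Upper bound on build hours a fixed number of executors can deliver per month."""
    return executors * HOURS_PER_DAY * DAYS_PER_MONTH


if __name__ == "__main__":
    used = 5800  # build hours reported by INFRA for the previous month
    print(f"{used} h is about {equivalent_servers(used):.2f} servers running nonstop")
    print(f"5 executors can supply at most {monthly_capacity(5)} h per month")
```

Under these assumptions, five executors top out well below the reported usage, which is consistent with the expectation in this thread that queue times will get worse once the cap is in place.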
> >>>
> >>> So yes, we either
> >>> a) have to heavily reduce our CI usage or
> >>> b) fund our own, either maintaining it ourselves or donating to Apache.
> >>>
> >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>> Looking at the git history of the Jenkins script, its core part was
> >>>> finished in March 2017 (with only two minor updates in 2017/2018), so
> >>>> it's been running for over two years now, and it feels like the
> >>>> Zeppelin community has been quite happy with it. @Jeff Zhang
> >>>> <ma...@gmail.com> can you share your insights and user
> >>>> experience with the Jenkins+Travis approach?
> >>>>
> >>>> Things like:
> >>>>
> >>>> - has the approach completely solved the resource capacity problem
> >>>> for the Zeppelin community? is the Zeppelin community happy with the
> >>>> result?
> >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> >>>> - how often do you need to maintain the Jenkins infra? how many
> >>>> people are usually involved in maintenance and bug-fixes?
> >>>>
> >>>> To me, the downside of this approach seems to be mostly maintenance:
> >>>> maintaining the script and the Jenkins infra.
> >>>>
> >>>> ** Having Our Own Travis-CI.com Account **
> >>>>
> >>>> Another alternative I've been thinking of is to have our own
> >>>> travis-ci.com account with paid dedicated resources. Note that
> >>>> travis-ci.org is the free version and travis-ci.com is the commercial
> >>>> version. We currently use a shared resource pool managed by the ASF
> >>>> INFRA team on travis-ci.org, but we have no control over it: we can't
> >>>> see how it's configured, how many resources are available, how
> >>>> resources are allocated among Apache projects, etc. The nice things
> >>>> about having an account on travis-ci.com are:
> >>>>
> >>>> - relatively low cost with much better resource guarantee than what
> >>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> >>>> $489/month with 10 concurrency
> >>>> - low maintenance work compared to using Jenkins
> >>>> - (potentially) no migration cost according to Travis's doc [2]
> >>>> (pending verification)
> >>>> - full control over the build capacity/configuration compared to
> >>>> using ASF INFRA's pool
> >>>>
> >>>> I'd be surprised if we as such a vibrant community cannot find and
> >>>> fund $249*12=$2988 a year in exchange for a much better developer
> >>>> experience and much higher productivity.
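The yearly figure above, and the same calculation for the larger plan, can be reproduced with a few lines. The prices are the ones quoted in this thread from the Travis plans page [1] and may have changed since; `PLANS` and the helper names are made up for this sketch.

```python
# Prices as quoted in this thread (USD per month); treat them as a snapshot.
PLANS = {5: 249, 10: 489}  # concurrent jobs -> monthly price


def yearly_cost(concurrency: int) -> int:
    """Price of a plan over twelve months."""
    return PLANS[concurrency] * 12


def monthly_cost_per_slot(concurrency: int) -> float:
    """Monthly price divided by the number of concurrent jobs in the plan."""
    return PLANS[concurrency] / concurrency


if __name__ == "__main__":
    for jobs in sorted(PLANS):
        print(f"{jobs} concurrent jobs: ${yearly_cost(jobs)}/year, "
              f"${monthly_cost_per_slot(jobs):.2f} per job slot per month")
```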
> >>>>
> >>>> [1] https://travis-ci.com/plans
> >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>
> >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org
> >>>> <ma...@apache.org>> wrote:
> >>>>
> >>>>      So yes, the Jenkins job keeps pulling the state from Travis until it
> >>>>      finishes.
> >>>>
> >>>>      Not sure I'm comfortable with the idea of using Jenkins workers
> >>>>      just to idle for several hours.
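To make the concern concrete, the polling behaviour described here could look roughly like the sketch below. It is not the Zeppelin `travis_check.py` script: `fetch_state`, the state names and the timeout are hypothetical placeholders, and no real Travis API call is made.

```python
import time
from typing import Callable

# Assumed names for terminal build states; a real CI service may use others.
FINISHED_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(fetch_state: Callable[[], str],
                   poll_seconds: float = 60.0,
                   timeout_seconds: float = 4 * 3600,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Block until fetch_state() reports a terminal state, then return it.

    The caller (here: a Jenkins worker) stays occupied for the whole wait.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in FINISHED_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"build still '{state}' after {timeout_seconds}s")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated sequence of states instead of a real remote build.
    simulated = iter(["created", "started", "started", "passed"])
    print(wait_for_build(lambda: next(simulated), poll_seconds=0.1))
```

With a 2h build, the loop would hold its worker for roughly that long while doing no work itself.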
> >>>>
> >>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>      > Here's what the Zeppelin community did: we wrote a Python script to
> >>>>      > check the build status of a pull request.
> >>>>      > Here's the script:
> >>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>      >
> >>>>      > And this is the script we use in the Jenkins build job.
> >>>>      >
> >>>>      > if [ -f "travis_check.py" ]; then
> >>>>      >    git log -n 1
> >>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>>>      >    #if [ -z $COMMIT ]; then
> >>>>      >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>      >    #fi
> >>>>      >
> >>>>      >    # get commit hash from PR
> >>>>      >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>      >    sleep 30 # sleep few moment to wait travis starts the build
> >>>>      >    RET_CODE=0
> >>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>>>      >      RET_CODE=0
> >>>>      >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>      >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>      >    fi
> >>>>      >
> >>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
> >>>>      >      set +x
> >>>>      >      echo "-----------------------------------------------------"
> >>>>      >      echo "Looks like travis-ci is not configured for your fork."
> >>>>      >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>>>      >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>>>      >      echo ""
> >>>>      >      echo "To trigger CI after setup, you will need ammend your last commit with"
> >>>>      >      echo "git commit --amend"
> >>>>      >      echo "git push your-remote HEAD --force"
> >>>>      >      echo ""
> >>>>      >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>>>      >    fi
> >>>>      >
> >>>>      >    exit $RET_CODE
> >>>>      > else
> >>>>      >    set +x
> >>>>      >    echo "travis_check.py does not exists"
> >>>>      >    exit 1
> >>>>      > fi
> >>>>      >
> >>>>      > Chesnay Schepler <chesnay@apache.org
> >>>>      <ma...@apache.org>> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>>>      >
> >>>>      >> Does this imply that a Jenkins job is active as long as the
> >>>>      Travis build
> >>>>      >> runs?
> >>>>      >>
> >>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>      >>> Hi,
> >>>>      >>>
> >>>>      >>> @Dawid, I think the "long test running" as I mentioned in the
> >>>>      first
> >>>>      >> email,
> >>>>      >>> also as you guys said, belongs to "a big effort which is much
> >>>>      harder to
> >>>>      >>> accomplish in a short period of time and may deserve its own
> >>>>      separate
> >>>>      >>> discussion". Thus I didn't include it in what we can do in a
> >>>>      foreseeable
> >>>>      >>> short term.
> >>>>      >>>
> >>>>      >>> Besides, I don't think that's the ultimate reason for the lack of
> >>>>      >>> build resources. Even if the build is shortened to something like 2h,
> >>>>      >>> the problem I described, no build machine working for 6 or more hours
> >>>>      >>> during PST daytime, will still happen, because no machine from ASF
> >>>>      >>> INFRA's pool is allocated to Flink. I have paid close attention to
> >>>>      >>> the build queue over the past few weekdays, and it's a pretty clear
> >>>>      >>> pattern now.
> >>>>      >>>
> >>>>      >>> **The ultimate root cause** for that is that we don't have any
> >>>>      >>> **dedicated** build resources that we can stably rely on. I'm
> >>>>      >>> actually OK with waiting a long time if build requests are running;
> >>>>      >>> it means we are at least making progress. But I'm not OK with having
> >>>>      >>> no build resources. What I think we should aim for in the short term
> >>>>      >>> is to always have at least a central pool (3 or 5) of machines
> >>>>      >>> dedicated to building Flink at any time, or maybe to use users'
> >>>>      >>> resources.
> >>>>      >>>
> >>>>      >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community
> >>>>      >>> is using a Jenkins job to automatically build on users' Travis
> >>>>      >>> accounts and link the result back to the GitHub PR. I guess the
> >>>>      >>> Jenkins job would fetch the latest upstream master and build the PR
> >>>>      >>> against it. Jeff has filed tickets to learn about and get access to
> >>>>      >>> the Jenkins infra. It'd be better to fully understand it first
> >>>>      >>> before judging this approach.
> >>>>      >>>
> >>>>      >>> I also heard good things about CircleCI, and ASF INFRA seems
> >>>>      to have a
> >>>>      >> pool
> >>>>      >>> of build capacity there too. Can be an alternative to
> consider.
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>>
> >>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> >>>>      >> dwysakowicz@apache.org <ma...@apache.org>>
> >>>>      >>> wrote:
> >>>>      >>>
> >>>>      >>>> Sorry to jump in late, but I think Bowen missed the most
> >>>>      important point
> >>>>      >>>> from Chesnay's previous message in the summary. The ultimate
> >>>>      reason for
> >>>>      >>>> all the problems is that the tests take close to 2 hours to
> >>>>      run already.
> >>>>      >>>> I fully support this claim: "Unless people start caring
> about
> >>>>      test times
> >>>>      >>>> before adding them, this issue cannot be solved"
> >>>>      >>>>
> >>>>      >>>> This is also another reason why using user's Travis account
> >>>>      won't help.
> >>>>      >>>> Every few weeks we reach the user's time limit for a single
> >>>>      profile.
> >>>>      >>>> This makes the user's builds simply fail, until we either
> >>>>      properly
> >>>>      >>>> decrease the time the tests take (which I am not sure we
> ever
> >>>>      did) or
> >>>>      >>>> postpone the problem by splitting into more profiles. (Note
> >>>>      that the ASF
> >>>>      >>>> Travis account has higher time limits)
> >>>>      >>>>
> >>>>      >>>> Best,
> >>>>      >>>>
> >>>>      >>>> Dawid
> >>>>      >>>>
> >>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>      >>>>> Do we know if using "the best" available hardware would
> >>>>      improve the
> >>>>      >> build
> >>>>      >>>>> times?
> >>>>      >>>>> Imagine we would run the build on machines with plenty of
> >>>>      main memory
> >>>>      >> to
> >>>>      >>>>> mount everything to ramdisk + the latest CPU architecture?
> >>>>      >>>>>
> >>>>      >>>>> Throwing hardware at the problem could help reduce the time
> >>>>      of an
> >>>>      >>>>> individual build, and using our own infrastructure would
> >>>>      remove our
> >>>>      >>>>> dependency on Apache's Travis account (with the obvious
> >>>>      downside of
> >>>>      >>>> having
> >>>>      >>>>> to maintain the infrastructure)
> >>>>      >>>>> We could use an open source travis alternative, to have a
> >>>>      similar
> >>>>      >>>>> experience and make the migration easy.
> >>>>      >>>>>
> >>>>      >>>>>
> >>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>>>      <chesnay@apache.org <ma...@apache.org>>
> >>>>      >>>> wrote:
> >>>>      >>>>>> From what I gathered, there's no special sauce that the Zeppelin
> >>>>      >>>>>> project uses which actually integrates a user's Travis account
> >>>>      >>>>>> into the PR. They just disabled Travis for PRs. And that's kind
> >>>>      >>>>>> of it.
> >>>>      >>>>>>
> >>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
> >>>>      >>>>>> resources, but there are downsides:
> >>>>      >>>>>>
> >>>>      >>>>>> The discoverability of the Travis check takes a nose-dive. Either
> >>>>      >>>>>> we require every contributor to always, on every commit, also
> >>>>      >>>>>> post a Travis build, or we have the reviewer sift through the
> >>>>      >>>>>> contributor's account to find it.
> >>>>      >>>>>>
> >>>>      >>>>>> This is rather cumbersome. Additionally, it's also not
> >>>>      equivalent to
> >>>>      >>>>>> having a PR build.
> >>>>      >>>>>>
> >>>>      >>>>>> A normal branch build takes a branch as is and tests it. A
> >>>>      PR build
> >>>>      >>>>>> merges the branch into master, and then runs it. (Fun fact:
> >>>>      >>>>>> this is why a PR with merge conflicts is not being run on
> >>>>      >>>>>> Travis.)
> >>>>      >>>>>>
> >>>>      >>>>>> And ultimately, everyone can already make use of this
> >>>>      approach anyway.
> >>>>      >>>>>>
> >>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>      >>>>>>> Hi Jeff,
> >>>>      >>>>>>>
> >>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a
> >>>>      good idea to
> >>>>      >>>>>>> leverage user's travis account.
> >>>>      >>>>>>> In this way, we can have almost unlimited concurrent
> build
> >>>>      jobs and
> >>>>      >>>>>>> developers can restart build by themselves (currently
> only
> >>>>      committers
> >>>>      >>>>>>> can restart PR's build).
> >>>>      >>>>>>>
> >>>>      >>>>>>> But I'm still not very clear how to integrate user's
> >>>>      travis build
> >>>>      >> into
> >>>>      >>>>>>> the Flink pull request's build automatically. Can you
> >>>>      explain more in
> >>>>      >>>>>>> detail?
> >>>>      >>>>>>>
> >>>>      >>>>>>> Another question: does travis only build branches for
> user
> >>>>      account?
> >>>>      >>>>>>> My concern is that builds for PRs will rebase user's
> >>>>      commits against
> >>>>      >>>>>>> current master branch.
> >>>>      >>>>>>> This will help us to find problems before merge.  Builds
> >>>>      for branches
> >>>>      >>>>>>> will lose the impact of new commits in master.
> >>>>      >>>>>>> How does Zeppelin solve this problem?
> >>>>      >>>>>>>
> >>>>      >>>>>>> Thanks again for sharing the idea.
> >>>>      >>>>>>>
> >>>>      >>>>>>> Regards,
> >>>>      >>>>>>> Jark
> >>>>      >>>>>>>
> >>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <
> zjffdu@gmail.com
> >>>>      <ma...@gmail.com>
> >>>>      >>>>>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>>>
> wrote:
> >>>>      >>>>>>>
> >>>>      >>>>>>>       Hi Folks,
> >>>>      >>>>>>>
> >>>>      >>>>>>>       Zeppelin ran into this kind of issue before. We solved
> >>>>      >>>>>>>       it by delegating each contributor's PR build to their
> >>>>      >>>>>>>       own Travis account (everyone can have 5 free slots for
> >>>>      >>>>>>>       Travis builds).
> >>>>      >>>>>>>       The Apache account's Travis build is only triggered
> >>>>      >>>>>>>       when a PR is merged.
> >>>>      >>>>>>>
> >>>>      >>>>>>>
> >>>>      >>>>>>>
> >>>>      >>>>>>>       Kurt Young <ykt836@gmail.com
> >>>>      <ma...@gmail.com> <mailto:ykt836@gmail.com
> >>>>      <ma...@gmail.com>>>
> >>>>      >>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>      >>>>>>>
> >>>>      >>>>>>>       > (Forgot to cc George)
> >>>>      >>>>>>>       >
> >>>>      >>>>>>>       > Best,
> >>>>      >>>>>>>       > Kurt
> >>>>      >>>>>>>       >
> >>>>      >>>>>>>       >
> >>>>      >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>>>      <ykt836@gmail.com <ma...@gmail.com>
> >>>>      >>>>>>> <mailto:ykt836@gmail.com <ma...@gmail.com>>>
> >>>>      wrote:
> >>>>      >>>>>>>       >
> >>>>      >>>>>>>       > > Hi Bowen,
> >>>>      >>>>>>>       > >
> >>>>      >>>>>>>       > > Thanks for bringing this up. We have actually
> >>>>      >>>>>>>       > > discussed this, and I think Till and George have
> >>>>      >>>>>>>       > > already spent some time investigating it. I have
> >>>>      >>>>>>>       > > cc'ed both of them, and maybe they can share
> >>>>      >>>>>>>       > > their findings.
> >>>>      >>>>>>>       > >
> >>>>      >>>>>>>       > > Best,
> >>>>      >>>>>>>       > > Kurt
> >>>>      >>>>>>>       > >
> >>>>      >>>>>>>       > >
> >>>>      >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>>>      <imjark@gmail.com <ma...@gmail.com>
> >>>>      >>>>>>> <mailto:imjark@gmail.com <ma...@gmail.com>>>
> >>>>      wrote:
> >>>>      >>>>>>>       > >
> >>>>      >>>>>>>       > >> Hi Bowen,
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> Thanks for bringing this. We also suffered
> from
> >>>>      the long
> >>>>      >>>>>>>       build time.
> >>>>      >>>>>>>       > >> I agree that we should focus on solving build
> >>>>      capacity
> >>>>      >>>>>>>       problem in the
> >>>>      >>>>>>>       > >> thread.
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> My observation is that only one build is running,
> >>>>      >>>>>>>       > >> and all the others (other PRs, master) are pending.
> >>>>      >>>>>>>       > >> The pricing plan [1] of Travis shows it can support
> >>>>      >>>>>>>       > >> concurrent build jobs.
> >>>>      >>>>>>>       > >> But I don't know which plan we are using; it might
> >>>>      >>>>>>>       > >> be the free plan for open source.
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> I cc-ed Chesnay who may have some experience
> on
> >>>>      Travis.
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> Regards,
> >>>>      >>>>>>>       > >> Jark
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> [1]: https://travis-ci.com/plans
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> >>>>      >> bowenli86@gmail.com <ma...@gmail.com>
> >>>>      >>>>>>> <mailto:bowenli86@gmail.com
> >>>>      <ma...@gmail.com>>> wrote:
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >> > Hi Steven,
> >>>>      >>>>>>>       > >> >
> >>>>      >>>>>>>       > >> > I think you may not have read what I wrote. The
> >>>>      >>>>>>>       > >> > discussion is about "unstable build **capacity**",
> >>>>      >>>>>>>       > >> > in other words "unstable / lack of build
> >>>>      >>>>>>>       > >> > resources", not "unstable build".
> >>>>      >>>>>>>       > >> >
> >>>>      >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>>>      >>>>>>>       <stevenz3wu@gmail.com <mailto:stevenz3wu@gmail.com
> >
> >>>>      <mailto:stevenz3wu@gmail.com <ma...@gmail.com>>>
> >>>>      >>>>>>>       > wrote:
> >>>>      >>>>>>>       > >> >
> >>>>      >>>>>>>       > >> > > long and sometimes unstable build is
> >>>>      definitely a pain
> >>>>      >>>>>> point.
> >>>>      >>>>>>>       > >> > >
> >>>>      >>>>>>>       > >> > > I suspect the build failure here in
> >>>>      >>>>>>>       > >> > > flink-connector-kafka is not related to my
> >>>>      >>>>>>>       > >> > > change, but there is no easy way to re-run the
> >>>>      >>>>>>>       > >> > > build on the Travis UI. A Google search showed a
> >>>>      >>>>>>>       > >> > > trick: closing and reopening the PR will trigger
> >>>>      >>>>>>>       > >> > > a rebuild, but that could add noise to the PR
> >>>>      >>>>>>>       > >> > > activity.
> >>>>      >>>>>>>       > >> > >
> >>>>      https://travis-ci.org/apache/flink/jobs/545555519
> >>>>      >>>>>>>       > >> > >
> >>>>      >>>>>>>       > >> > > travis-ci for my personal repo often
> failed
> >>>>      with
> >>>>      >>>>>>>       exceeding time
> >>>>      >>>>>>>       > limit
> >>>>      >>>>>>>       > >> > after
> >>>>      >>>>>>>       > >> > > 4+ hours.
> >>>>      >>>>>>>       > >> > > The job exceeded the maximum time limit
> for
> >>>>      jobs, and
> >>>>      >> has
> >>>>      >>>>>>>       been
> >>>>      >>>>>>>       > >> > terminated.
> >>>>      >>>>>>>       > >> > >
> >>>>      >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>>>      >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
> >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
> >>>>      >>>>>>>       > wrote:
> >>>>      >>>>>>>       > >> > >
> >>>>      >>>>>>>       > >> > > >
> >>>>      >>>>>>>       > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >>>>      >>>>>>>       > >> > > > This build request has been sitting at **HEAD of
> >>>>      >>>>>>>       > >> > > > the queue** since I first saw it at PST 10:30am
> >>>>      >>>>>>>       > >> > > > (not sure how long it's been there before
> >>>>      >>>>>>>       > >> > > > 10:30am). It's PST 4:12pm now and it hasn't
> >>>>      >>>>>>>       > >> > > > started yet.
> >>>>      >>>>>>>       > >> > > >
> >>>>      >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>>>      >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
> >>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
> >>>>      >>>>>>>       > >> wrote:
> >>>>      >>>>>>>       > >> > > >
> >>>>      >>>>>>>       > >> > > > > Hi devs,
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > I've been experiencing the pain resulting from
> >>>>      >>>>>>>       > >> > > > > lack of stable build capacity on Travis for
> >>>>      >>>>>>>       > >> > > > > Flink PRs [1]. Specifically, I noticed often
> >>>>      >>>>>>>       > >> > > > > that no build in the queue is making any
> >>>>      >>>>>>>       > >> > > > > progress for hours, and suddenly 5 or 6 builds
> >>>>      >>>>>>>       > >> > > > > kick off all together after the long pause. I'm
> >>>>      >>>>>>>       > >> > > > > at PST (UTC-08) time zone, and I've seen pause
> >>>>      >>>>>>>       > >> > > > > can be as long as 6 hours from PST 9am to 3pm
> >>>>      >>>>>>>       > >> > > > > (let alone the time needed to drain the queue
> >>>>      >>>>>>>       > >> > > > > afterwards).
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > I think this has greatly impacted our
> >>>>      >>>>>>>       > >> > > > > productivity. I've experienced that PRs
> >>>>      >>>>>>>       > >> > > > > submitted in the early morning of PST time zone
> >>>>      >>>>>>>       > >> > > > > won't finish their build until late night of
> >>>>      >>>>>>>       > >> > > > > the same day.
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > So my questions are:
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > - Has anyone else experienced the same problem
> >>>>      >>>>>>>       > >> > > > > or have similar observation on TravisCI? (I
> >>>>      >>>>>>>       > >> > > > > suspect it has things to do with time zone)
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > - What pricing plan of TravisCI is Flink
> >>>>      >>>>>>>       > >> > > > > currently using? Is it the free plan for open
> >>>>      >>>>>>>       > >> > > > > source projects? What are the guaranteed build
> >>>>      >>>>>>>       > >> > > > > capacity of the current plan?
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > - If the current pricing plan (either free or
> >>>>      >>>>>>>       > >> > > > > paid) can't provide stable build capacity, can
> >>>>      >>>>>>>       > >> > > > > we upgrade to a higher priced plan with larger
> >>>>      >>>>>>>       > >> > > > > and more stable build capacity?
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > BTW, another factor that contribute to the
> >>>>      >>>>>>>       > >> > > > > productivity problem is that our build is slow
> >>>>      >>>>>>>       > >> > > > > - we run full build for every PR and a
> >>>>      >>>>>>>       > >> > > > > successful full build takes ~5h. We definitely
> >>>>      >>>>>>>       > >> > > > > have more options to solve it, for instance,
> >>>>      >>>>>>>       > >> > > > > modularize the build graphs and reuse artifacts
> >>>>      >>>>>>>       > >> > > > > from the previous build. But I think that can
> >>>>      >>>>>>>       > >> > > > > be a big effort which is much harder to
> >>>>      >>>>>>>       > >> > > > > accomplish in a short period of time and may
> >>>>      >>>>>>>       > >> > > > > deserve its own separate discussion.
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > > >
> >>>>      >>>>>>>       > >> > > >
> >>>>      >>>>>>>       > >> > >
> >>>>      >>>>>>>       > >> >
> >>>>      >>>>>>>       > >>
> >>>>      >>>>>>>       > >
> >>>>      >>>>>>>       >
> >>>>      >>>>>>>
> >>>>      >>>>>>>
> >>>>      >>>>>>>       --
> >>>>      >>>>>>>       Best Regards
> >>>>      >>>>>>>
> >>>>      >>>>>>>       Jeff Zhang
> >>>>      >>>>>>>
> >>>>      >>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
Are they using their own Travis CI pool, or did they switch to an 
entirely different CI service?

If we can just switch to our own Travis pool, just for our project, then 
this might be something we can do fairly quickly?
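
For reference, the capacity figures quoted further down in this thread
(roughly 5800 build hours in a month per INFRA-18533, 25 queued PRs at about
2h each, a cap of five executors, and $249/month for a paid plan) can be
sanity-checked with a short sketch. The function names, the 30-day month and
the assumption that executors are kept fully busy are illustrative
assumptions, not something stated in the thread.

```python
# Back-of-the-envelope check of the numbers quoted in this thread.
# Assumption (not from the thread): a month has 30 days and every
# executor is busy the whole time.

HOURS_PER_MONTH = 24 * 30


def servers_equivalent(build_hours, hours_per_month=HOURS_PER_MONTH):
    """Number of machines running 24/7 needed to supply build_hours."""
    return build_hours / hours_per_month


def drain_hours(queued_builds, hours_per_build, executors):
    """Wall-clock hours to drain a queue if all executors stay busy."""
    return queued_builds * hours_per_build / executors


def yearly_cost(monthly_usd):
    """Yearly cost in USD of a plan billed monthly."""
    return monthly_usd * 12


if __name__ == "__main__":
    print(round(servers_equivalent(5800), 2))  # about eight servers
    print(drain_hours(25, 2, 5))               # 50 build hours on 5 executors
    print(yearly_cost(249))                    # the $249*12 figure
```

Under these assumptions, 5800 hours is indeed about eight machines running
nonstop, and a 25-PR queue would need roughly ten wall-clock hours on five
executors even when none of them is idle.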

On 03/07/2019 05:55, Bowen Li wrote:
> I responded in the INFRA ticket [1] that I believe they are using a wrong
> metric against Flink and the total build time is a completely different
> thing than guaranteed build capacity.
>
> My response:
>
> "As mentioned above, since I started paying attention to Flink's build
> queue a few weeks ago, I have seen (from Seattle) no builds kicking off
> for Flink during PST daytime on weekdays. Our teammates in China and Europe
> have also reported similar observations. So we need to evaluate where the
> large total build time came from. If 1) your number and 2) our
> observations from three locations that cover pretty much a full day are
> all true, I **guess** one reason can be that the extra build time most
> likely came from weekends, when other Apache projects may be idle and
> Flink drains its congested queue hard.
>
> Please be aware that we're not complaining about the lack of resources
> in general; I'm complaining about the lack of **stable, dedicated**
> resources. An example of the latter: currently, even if no build is
> in Flink's queue and I submit a request that becomes the queue head in the
> PST morning, my build won't even start for 6-8+h. That is an absurd amount
> of waiting time.
>
> That is to say, if ASF INFRA decides to adopt a quota system and grants
> Flink five DEDICATED servers that run all the time only for Flink, that'll
> be PERFECT and can totally solve our problem now.
>
> I feel what's missing in the ASF INFRA's Travis resource pool is some level
> of build capacity SLAs and certainty"
>
>
> Again, I believe these two problems differ in nature: long build time
> vs. lack of dedicated build resources. That is to say, shortening build
> time may or may not relieve the situation. I'm slightly negative on
> disabling IT cases for PRs, because the downside is that we risk missing
> bugs in PRs that UTs don't catch, which may cost a lot more to fix later
> and may slow down or even block others, but I am open to others' opinions
> on it.
>
> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible
> way to solve our problem, since INFRA's pool is fully shared and they have
> no control over, or finer insight into, resource allocation to a specific
> Apache project. As mentioned in [1], Apache Arrow is moving away from the
> ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to
> do so). I know that Spark is on its own build infra. If we all agree on
> funding our own build infra, I'd be glad to help investigate potential
> options after releasing 1.9, since I'm super busy with 1.9 now.
>
> [1] https://issues.apache.org/jira/browse/INFRA-18533
>
>
>
> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ch...@apache.org> wrote:
>
>> As a short-term stopgap, since we can assume this issue to become much
>> worse in the following days/weeks, we could disable IT cases in PRs and
>> only run them on master.
>>
>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>> People really have to stop thinking that just because something works
>>> for us it is also a good solution.
>>> Also, please remember that our builds run for 2h from start to finish,
>>> and not the 14 _minutes_ it takes for zeppelin.
>>> We are dealing with an entirely different scale here, both in terms of
>>> build times and number of builds.
>>>
>>> In this very thread people have been complaining about long queue
>>> times for their builds. Surprise, other Apache projects have been
>>> suffering the very same thing due to us not controlling our build
>>> times. While switching services (be it Jenkins, CircleCI or whatever)
>>> will possibly work for us (and these options are actually attractive,
>>> like CircleCI's proper support for build artifacts), it will also
>>> result in us likely negatively affecting other projects in significant
>>> ways.
>>>
>>> Sure, the Jenkins setup has a good user experience for us, at the cost
>>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
>>> resources, and the European contributors haven't even really started yet.
>>>
>>> FYI, the latest INFRA response from INFRA-18533:
>>>
>>> "Our rough metrics shows that Flink used over 5800 hours of build time
>>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
>>> MONTH. EIGHT. nonstop.
>>> When we discovered this last night, we discussed it some and are going
>>> to tune down Flink to allow only five executors maximum. We cannot
>>> allow Flink to consume so much of a Foundation shared resource."
>>>
>>> So yes, we either
>>> a) have to heavily reduce our CI usage or
>>> b) fund our own, either maintaining it ourselves or donating to Apache.
>>>
>>> On 02/07/2019 05:11, Bowen Li wrote:
>>>> By looking at the git history of the Jenkins script, its core part
>>>> was finished in March 2017 (with only two minor updates in 2017/2018),
>>>> so it's been running for over two years now, and it feels like the
>>>> Zeppelin community has been quite happy with it. @Jeff Zhang, can you
>>>> share your insights and user experience with the Jenkins+Travis approach?
>>>>
>>>> Things like:
>>>>
>>>> - has the approach completely solved the resource capacity problem
>>>> for the Zeppelin community? Is the Zeppelin community happy with the result?
>>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>>> - how often do you need to maintain the Jenkins infra? how many
>>>> people are usually involved in maintenance and bug-fixes?
>>>>
>>>> The downside of this approach seems mostly to be on the maintenance
>>>> to me - maintain the script and Jenkins infra.
>>>>
>>>> ** Having Our Own Travis-CI.com Account **
>>>>
>>>> Another alternative I've been thinking of is to have our own
>>>> travis-ci.com account with paid dedicated resources. Note that
>>>> travis-ci.org is the free version and travis-ci.com is the commercial
>>>> version. We currently use a shared resource pool managed by the ASF INFRA
>>>> team on travis-ci.org, but we have no control over it - we can't see how
>>>> it's configured, how many resources are available, how resources are
>>>> allocated among Apache projects, etc.
>>>> The nice things about having an account on travis-ci.com are:
>>>>
>>>> - relatively low cost with much better resource guarantee than what
>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>>>> $489/month with 10 concurrency
>>>> - low maintenance work compared to using Jenkins
>>>> - (potentially) no migration cost according to Travis's doc [2]
>>>> (pending verification)
>>>> - full control over the build capacity/configuration compared to
>>>> using ASF INFRA's pool
>>>>
>>>> I'd be surprised if we as such a vibrant community cannot find and
>>>> fund $249*12=$2988 a year in exchange for a much better developer
>>>> experience and much higher productivity.
>>>>
>>>> [1] https://travis-ci.com/plans
>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>
>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org
>>>> <ma...@apache.org>> wrote:
>>>>
>>>>      So yes, the Jenkins job keeps pulling the state from Travis until it
>>>>      finishes.
>>>>
>>>>      Not sure I'm comfortable with the idea of using Jenkins workers
>>>>      just to idle for several hours.
>>>>
>>>>      On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>      > Here's what the Zeppelin community did: we wrote a Python script to
>>>>      > check the build status of a pull request.
>>>>      > Here's the script:
>>>>      > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>      >
>>>>      > And this is the script we used in the Jenkins build job.
>>>>      >
>>>>      > if [ -f "travis_check.py" ]; then
>>>>      >    git log -n 1
>>>>      >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>>>>      request.*from.*" | sed
>>>>      > 's/.*GitHub pull request <a
>>>>      > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>      >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>      >    PR=$(echo $STATUS | awk '{print $1}' | sed
>>>> 's/.*[/]\(.*\)$/\1/g')
>>>>      >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>      >    #if [ -z $COMMIT ]; then
>>>>      >    #  COMMIT=$(curl -s
>>>>      https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>>      > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' '
>>>>      | sed
>>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v
>>>>      "apache:" |
>>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>      >    #fi
>>>>      >
>>>>      >    # get commit hash from PR
>>>>      >    COMMIT=$(curl -s
>>>>      https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>      > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' '
>>>> | sed
>>>>      > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v
>>>>      "apache:" |
>>>>      > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>      >    sleep 30 # sleep few moment to wait travis starts the build
>>>>      >    RET_CODE=0
>>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>      >    if [ $RET_CODE -eq 2 ]; then # try with repository name when
>>>>      travis-ci is
>>>>      > not available in the account
>>>>      >      RET_CODE=0
>>>>      >      AUTHOR=$(curl -s
>>>>      https://api.github.com/repos/apache/zeppelin/pulls/$PR
>>>>      > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>>>>      > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>      >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>      >    fi
>>>>      >
>>>>      >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build
>>>>      information in
>>>>      > the travis
>>>>      >      set +x
>>>>      >      echo "-----------------------------------------------------"
>>>>      >      echo "Looks like travis-ci is not configured for your fork."
>>>>      >      echo "Please setup by swich on 'zeppelin' repository at
>>>>      > https://travis-ci.org/profile and travis-ci."
>>>>      >      echo "And then make sure 'Build branch updates' option is
>>>>      enabled in
>>>>      > the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>      >      echo ""
>>>>      >      echo "To trigger CI after setup, you will need ammend your
>>>>      last commit
>>>>      > with"
>>>>      >      echo "git commit --amend"
>>>>      >      echo "git push your-remote HEAD --force"
>>>>      >      echo ""
>>>>      >      echo "See
>>>>      > http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
>>>>      > ."
>>>>      >    fi
>>>>      >
>>>>      >    exit $RET_CODE
>>>>      > else
>>>>      >    set +x
>>>>      >    echo "travis_check.py does not exists"
>>>>      >    exit 1
>>>>      > fi
>>>>      >
>>>>      > Chesnay Schepler <chesnay@apache.org
>>>>      <ma...@apache.org>> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>>      >
>>>>      >> Does this imply that a Jenkins job is active as long as the
>>>>      Travis build
>>>>      >> runs?
>>>>      >>
>>>>      >> On 26/06/2019 21:28, Bowen Li wrote:
>>>>      >>> Hi,
>>>>      >>>
>>>>      >>> @Dawid, I think the "long test running" as I mentioned in the
>>>>      first
>>>>      >> email,
>>>>      >>> also as you guys said, belongs to "a big effort which is much
>>>>      harder to
>>>>      >>> accomplish in a short period of time and may deserve its own
>>>>      separate
>>>>      >>> discussion". Thus I didn't include it in what we can do in a
>>>>      foreseeable
>>>>      >>> short term.
>>>>      >>>
>>>>      >>> Besides, I don't think that's the ultimate reason for the lack of
>>>>      >>> build resources. Even if the build is shortened to something like
>>>>      >>> 2h, the problem I described, with no build machine working for 6 or
>>>>      >>> more hours during PST daytime, will still happen, because no machine
>>>>      >>> from ASF INFRA's pool is allocated to Flink. As I have paid close
>>>>      >>> attention to the build queue over the past few weekdays, it's a
>>>>      >>> pretty clear pattern now.
>>>>      >>>
>>>>      >>> **The ultimate root cause** for that is - we don't have any
>>>>      **dedicated**
>>>>      >>> build resources that we can stably rely on. I'm actually ok to
>>>>      wait for a
>>>>      >>> long time if there are build requests running, it means at
>>>>      least we are
>>>>      >>> making progress. But I'm not ok with no build resource. A
>>>>      better place I
>>>>      >>> think we should aim at in short term is to always have at
>>>>      least a central
>>>>      >>> pool (can be 3 or 5) of machines dedicated to build Flink at
>>>>      any time, or
>>>>      >>> maybe use users resources.
>>>>      >>>
>>>>      >>> @Chesnay @Robert I synced with Jeff offline that Zeppelin
>>>>      community is
>>>>      >>> using a Jenkins job to automatically build on users' travis
>>>>      account and
>>>>      >>> link the result back to github PR. I guess the Jenkins job
>>>>      would fetch
>>>>      >>> latest upstream master and build the PR against it. Jeff has
>>>> filed
>>>>      >> tickets
>>>>      >>> to learn and get access to the Jenkins infra. It'll be better to
>>>>      >>> fully understand it first before judging this approach.
>>>>      >>>
>>>>      >>> I also heard good things about CircleCI, and ASF INFRA seems
>>>>      to have a
>>>>      >> pool
>>>>      >>> of build capacity there too. Can be an alternative to consider.
>>>>      >>>
>>>>      >>>
>>>>      >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>>>      >> dwysakowicz@apache.org <ma...@apache.org>>
>>>>      >>> wrote:
>>>>      >>>
>>>>      >>>> Sorry to jump in late, but I think Bowen missed the most
>>>>      important point
>>>>      >>>> from Chesnay's previous message in the summary. The ultimate
>>>>      reason for
>>>>      >>>> all the problems is that the tests take close to 2 hours to
>>>>      run already.
>>>>      >>>> I fully support this claim: "Unless people start caring about
>>>>      test times
>>>>      >>>> before adding them, this issue cannot be solved"
>>>>      >>>>
>>>>      >>>> This is also another reason why using user's Travis account
>>>>      won't help.
>>>>      >>>> Every few weeks we reach the user's time limit for a single
>>>>      profile.
>>>>      >>>> This makes the user's builds simply fail, until we either
>>>>      properly
>>>>      >>>> decrease the time the tests take (which I am not sure we ever
>>>>      did) or
>>>>      >>>> postpone the problem by splitting into more profiles. (Note
>>>>      that the ASF
>>>>      >>>> Travis account has higher time limits)
>>>>      >>>>
>>>>      >>>> Best,
>>>>      >>>>
>>>>      >>>> Dawid
>>>>      >>>>
>>>>      >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>      >>>>> Do we know if using "the best" available hardware would
>>>>      improve the
>>>>      >> build
>>>>      >>>>> times?
>>>>      >>>>> Imagine we would run the build on machines with plenty of
>>>>      main memory
>>>>      >> to
>>>>      >>>>> mount everything to ramdisk + the latest CPU architecture?
>>>>      >>>>>
>>>>      >>>>> Throwing hardware at the problem could help reduce the time
>>>>      of an
>>>>      >>>>> individual build, and using our own infrastructure would
>>>>      remove our
>>>>      >>>>> dependency on Apache's Travis account (with the obvious
>>>>      downside of
>>>>      >>>> having
>>>>      >>>>> to maintain the infrastructure)
>>>>      >>>>> We could use an open source travis alternative, to have a
>>>>      similar
>>>>      >>>>> experience and make the migration easy.
>>>>      >>>>>
>>>>      >>>>>
>>>>      >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>>      <chesnay@apache.org <ma...@apache.org>>
>>>>      >>>> wrote:
>>>>      >>>>>>    From what I gathered, there's no special sauce that the
>>>>      Zeppelin
>>>>      >>>>>> project uses which actually integrates a user's Travis
>>>>      account into the
>>>>      >>>> PR.
>>>>      >>>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>      >>>>>>
>>>>      >>>>>> Naturally we can do this (duh) and save the ASF a fair
>>>>      amount of
>>>>      >>>>>> resources, but there are downsides:
>>>>      >>>>>>
>>>>      >>>>>> The discoverability of the Travis check takes a nose-dive.
>>>>      Either we
>>>>      >>>>>> require every contributor to always, on every commit, also
>>>>      post a
>>>>      >> Travis
>>>>      >>>>>> build, or we have the reviewer sift through the
>>>>      contributor's account
>>>>      >> to
>>>>      >>>>>> find it.
>>>>      >>>>>>
>>>>      >>>>>> This is rather cumbersome. Additionally, it's also not
>>>>      equivalent to
>>>>      >>>>>> having a PR build.
>>>>      >>>>>>
>>>>      >>>>>> A normal branch build takes a branch as is and tests it. A
>>>>      PR build
>>>>      >>>>>> merges the branch into master, and then runs it. (Fun fact:
>>>>      This is
>>>>      >> why
>>>>      >>>>>> a PR with merge conflicts is not being run on Travis.)
>>>>      >>>>>>
>>>>      >>>>>> And ultimately, everyone can already make use of this
>>>>      approach anyway.
>>>>      >>>>>>
>>>>      >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>      >>>>>>> Hi Jeff,
>>>>      >>>>>>>
>>>>      >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a
>>>>      good idea to
>>>>      >>>>>>> leverage user's travis account.
>>>>      >>>>>>> In this way, we can have almost unlimited concurrent build
>>>>      jobs and
>>>>      >>>>>>> developers can restart build by themselves (currently only
>>>>      committers
>>>>      >>>>>>> can restart PR's build).
>>>>      >>>>>>>
>>>>      >>>>>>> But I'm still not very clear how to integrate user's
>>>>      travis build
>>>>      >> into
>>>>      >>>>>>> the Flink pull request's build automatically. Can you
>>>>      explain more in
>>>>      >>>>>>> detail?
>>>>      >>>>>>>
>>>>      >>>>>>> Another question: does travis only build branches for user
>>>>      account?
>>>>      >>>>>>> My concern is that builds for PRs will rebase user's
>>>>      commits against
>>>>      >>>>>>> current master branch.
>>>>      >>>>>>> This will help us to find problems before merge.  Builds
>>>>      for branches
>>>>      >>>>>>> will lose the impact of new commits in master.
>>>>      >>>>>>> How does Zeppelin solve this problem?
>>>>      >>>>>>>
>>>>      >>>>>>> Thanks again for sharing the idea.
>>>>      >>>>>>>
>>>>      >>>>>>> Regards,
>>>>      >>>>>>> Jark
>>>>      >>>>>>>
>>>>      >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>>>>      <ma...@gmail.com>
>>>>      >>>>>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>>> wrote:
>>>>      >>>>>>>
>>>>      >>>>>>>       Hi Folks,
>>>>      >>>>>>>
>>>>      >>>>>>>       Zeppelin met this kind of issue before. We solved it
>>>>      >>>>>>>       by delegating each contributor's PR build to their own
>>>>      >>>>>>>       Travis account (everyone can have 5 free slots for
>>>>      >>>>>>>       Travis builds).
>>>>      >>>>>>>       The Apache account's Travis build is only triggered
>>>>      >>>>>>>       when a PR is merged.
>>>>      >>>>>>>
>>>>      >>>>>>>
>>>>      >>>>>>>
>>>>      >>>>>>>       Kurt Young <ykt836@gmail.com
>>>>      <ma...@gmail.com> <mailto:ykt836@gmail.com
>>>>      <ma...@gmail.com>>>
>>>>      >>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>      >>>>>>>
>>>>      >>>>>>>       > (Forgot to cc George)
>>>>      >>>>>>>       >
>>>>      >>>>>>>       > Best,
>>>>      >>>>>>>       > Kurt
>>>>      >>>>>>>       >
>>>>      >>>>>>>       >
>>>>      >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>>>      <ykt836@gmail.com <ma...@gmail.com>
>>>>      >>>>>>> <mailto:ykt836@gmail.com <ma...@gmail.com>>>
>>>>      wrote:
>>>>      >>>>>>>       >
>>>>      >>>>>>>       > > Hi Bowen,
>>>>      >>>>>>>       > >
>>>>      >>>>>>>       > > Thanks for bringing this up. We actually have
>>>>      discussed
>>>>      >> about
>>>>      >>>>>>>       this, and I
>>>>      >>>>>>>       > > think Till and George have
>>>>      >>>>>>>       > > already spent some time investigating it. I have
>>>>      cced both of
>>>>      >>>>>>>       them, and
>>>>      >>>>>>>       > > maybe they can share
>>>>      >>>>>>>       > > their findings.
>>>>      >>>>>>>       > >
>>>>      >>>>>>>       > > Best,
>>>>      >>>>>>>       > > Kurt
>>>>      >>>>>>>       > >
>>>>      >>>>>>>       > >
>>>>      >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>>>      <imjark@gmail.com <ma...@gmail.com>
>>>>      >>>>>>> <mailto:imjark@gmail.com <ma...@gmail.com>>>
>>>>      wrote:
>>>>      >>>>>>>       > >
>>>>      >>>>>>>       > >> Hi Bowen,
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> Thanks for bringing this. We also suffered from
>>>>      the long
>>>>      >>>>>>>       build time.
>>>>      >>>>>>>       > >> I agree that we should focus on solving build
>>>>      capacity
>>>>      >>>>>>>       problem in the
>>>>      >>>>>>>       > >> thread.
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> My observation is there is only one build is
>>>>      running, all
>>>>      >> the
>>>>      >>>>>>>       others
>>>>      >>>>>>>       > >> (other
>>>>      >>>>>>>       > >> PRs, master) are pending.
>>>>      >>>>>>>       > >> The pricing plan[1] of travis shows it can
>>>> support
>>>>      >> concurrent
>>>>      >>>>>>>       build
>>>>      >>>>>>>       > jobs.
>>>>      >>>>>>>       > >> But I don't know which plan we are using, might
>>>>      be the free
>>>>      >>>>>>>       plan for
>>>>      >>>>>>>       > open
>>>>      >>>>>>>       > >> source.
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> I cc-ed Chesnay who may have some experience on
>>>>      Travis.
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> Regards,
>>>>      >>>>>>>       > >> Jark
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> [1]: https://travis-ci.com/plans
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>>>      >> bowenli86@gmail.com <ma...@gmail.com>
>>>>      >>>>>>> <mailto:bowenli86@gmail.com
>>>>      <ma...@gmail.com>>> wrote:
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >> > Hi Steven,
>>>>      >>>>>>>       > >> >
>>>>      >>>>>>>       > >> > I think you may not read what I wrote. The
>>>>      discussion is
>>>>      >>>> about
>>>>      >>>>>>>       > "unstable
>>>>      >>>>>>>       > >> > build **capacity**", in another word
>>>>      "unstable / lack of
>>>>      >>>> build
>>>>      >>>>>>>       > >> resources",
>>>>      >>>>>>>       > >> > not "unstable build".
>>>>      >>>>>>>       > >> >
>>>>      >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>>>      >>>>>>>       <stevenz3wu@gmail.com <ma...@gmail.com>
>>>>      <mailto:stevenz3wu@gmail.com <ma...@gmail.com>>>
>>>>      >>>>>>>       > wrote:
>>>>      >>>>>>>       > >> >
>>>>      >>>>>>>       > >> > > long and sometimes unstable build is
>>>>      definitely a pain
>>>>      >>>>>> point.
>>>>      >>>>>>>       > >> > >
>>>>      >>>>>>>       > >> > > I suspect the build failure here in
>>>>      >> flink-connector-kafka
>>>>      >>>>>>>       is not
>>>>      >>>>>>>       > >> related
>>>>      >>>>>>>       > >> > to
>>>>      >>>>>>>       > >> > > my change. but there is no easy re-run the
>>>>      build on
>>>>      >>>>>>>       travis UI.
>>>>      >>>>>>>       > Google
>>>>      >>>>>>>       > >> > > search showed a trick of close-and-open the
>>>>      PR will
>>>>      >>>>>>>       trigger rebuild.
>>>>      >>>>>>>       > >> but
>>>>      >>>>>>>       > >> > > that could add noises to the PR activities.
>>>>      >>>>>>>       > >> > >
>>>>      https://travis-ci.org/apache/flink/jobs/545555519
>>>>      >>>>>>>       > >> > >
>>>>      >>>>>>>       > >> > > travis-ci for my personal repo often failed
>>>>      with
>>>>      >>>>>>>       exceeding time
>>>>      >>>>>>>       > limit
>>>>      >>>>>>>       > >> > after
>>>>      >>>>>>>       > >> > > 4+ hours.
>>>>      >>>>>>>       > >> > > The job exceeded the maximum time limit for
>>>>      jobs, and
>>>>      >> has
>>>>      >>>>>>>       been
>>>>      >>>>>>>       > >> > terminated.
>>>>      >>>>>>>       > >> > >
>>>>      >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>>>      >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
>>>>      >>>>>>>       > wrote:
>>>>      >>>>>>>       > >> > >
>>>>      >>>>>>>       > >> > > >
>>>>      https://travis-ci.org/apache/flink/builds/549681530
>>>>      >>>>>>>       This build
>>>>      >>>>>>>       > >> > request
>>>>      >>>>>>>       > >> > > > has
>>>>      >>>>>>>       > >> > > > been sitting at **HEAD of the queue**
>>>>      since I first
>>>>      >> saw
>>>>      >>>>>>>       it at PST
>>>>      >>>>>>>       > >> > 10:30am
>>>>      >>>>>>>       > >> > > > (not sure how long it's been there before
>>>>      10:30am).
>>>>      >>>>>>>       It's PST
>>>>      >>>>>>>       > 4:12pm
>>>>      >>>>>>>       > >> now
>>>>      >>>>>>>       > >> > > and
>>>>      >>>>>>>       > >> > > > it hasn't started yet.
>>>>      >>>>>>>       > >> > > >
>>>>      >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>>>      >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
>>>>      <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
>>>>      >>>>>>>       > >> wrote:
>>>>      >>>>>>>       > >> > > >
>>>>      >>>>>>>       > >> > > > > Hi devs,
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > I've been experiencing the pain
>>>>      resulting from lack
>>>>      >>>>>>>       of stable
>>>>      >>>>>>>       > >> build
>>>>      >>>>>>>       > >> > > > > capacity on Travis for Flink PRs [1].
>>>>      >> Specifically, I
>>>>      >>>>>>>       noticed
>>>>      >>>>>>>       > >> often
>>>>      >>>>>>>       > >> > > that
>>>>      >>>>>>>       > >> > > > no
>>>>      >>>>>>>       > >> > > > > build in the queue is making any
>>>>      progress for
>>>>      >> hours,
>>>>      >>>> and
>>>>      >>>>>>>       > suddenly
>>>>      >>>>>>>       > >> 5
>>>>      >>>>>>>       > >> > or
>>>>      >>>>>>>       > >> > > 6
>>>>      >>>>>>>       > >> > > > > builds kick off all together after the
>>>>      long pause.
>>>>      >>>>>>>       I'm at PST
>>>>      >>>>>>>       > >> > (UTC-08)
>>>>      >>>>>>>       > >> > > > time
>>>>      >>>>>>>       > >> > > > > zone, and I've seen pause can be as
>>>>      long as 6 hours
>>>>      >>>>>>>       from PST 9am
>>>>      >>>>>>>       > >> to
>>>>      >>>>>>>       > >> > 3pm
>>>>      >>>>>>>       > >> > > > > (let alone the time needed to drain the
>>>>      queue
>>>>      >>>>>>>       afterwards).
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > I think this has greatly impacted our
>>>>      productivity.
>>>>      >>>> I've
>>>>      >>>>>>>       > >> experienced
>>>>      >>>>>>>       > >> > > that
>>>>      >>>>>>>       > >> > > > > PRs submitted in the early morning of
>>>>      PST time zone
>>>>      >>>>>>>       won't finish
>>>>      >>>>>>>       > >> > their
>>>>      >>>>>>>       > >> > > > > build until late night of the same day.
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > So my questions are:
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > - Has anyone else experienced the same
>>>>      problem or
>>>>      >>>>>>>       have similar
>>>>      >>>>>>>       > >> > > > observation
>>>>      >>>>>>>       > >> > > > > on TravisCI? (I suspect it has things
>>>>      to do with
>>>>      >> time
>>>>      >>>>>>>       zone)
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > - What pricing plan of TravisCI is
>>>>      Flink currently
>>>>      >>>>>>>       using? Is it
>>>>      >>>>>>>       > >> the
>>>>      >>>>>>>       > >> > > free
>>>>      >>>>>>>       > >> > > > > plan for open source projects? What
>>>> are the
>>>>      >>>>>>>       guaranteed build
>>>>      >>>>>>>       > >> capacity
>>>>      >>>>>>>       > >> > > of
>>>>      >>>>>>>       > >> > > > > the current plan?
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > - If the current pricing plan (either
>>>>      free or paid)
>>>>      >>>>>> can't
>>>>      >>>>>>>       > provide
>>>>      >>>>>>>       > >> > > stable
>>>>      >>>>>>>       > >> > > > > build capacity, can we upgrade to a
>>>>      higher priced
>>>>      >>>>>>>       plan with
>>>>      >>>>>>>       > larger
>>>>      >>>>>>>       > >> > and
>>>>      >>>>>>>       > >> > > > more
>>>>      >>>>>>>       > >> > > > > stable build capacity?
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > BTW, another factor that contribute to
>>>> the
>>>>      >>>>>>>       productivity problem
>>>>      >>>>>>>       > is
>>>>      >>>>>>>       > >> > that
>>>>      >>>>>>>       > >> > > > > our build is slow - we run full build
>>>>      for every PR
>>>>      >>>> and a
>>>>      >>>>>>>       > >> successful
>>>>      >>>>>>>       > >> > > full
>>>>      >>>>>>>       > >> > > > > build takes ~5h. We definitely have
>>>>      more options to
>>>>      >>>>>>>       solve it,
>>>>      >>>>>>>       > for
>>>>      >>>>>>>       > >> > > > instance,
>>>>      >>>>>>>       > >> > > > > modularize the build graphs and reuse
>>>>      artifacts
>>>>      >> from
>>>>      >>>> the
>>>>      >>>>>>>       > previous
>>>>      >>>>>>>       > >> > > build.
>>>>      >>>>>>>       > >> > > > > But I think that can be a big effort
>>>>      which is much
>>>>      >>>>>>>       harder to
>>>>      >>>>>>>       > >> > accomplish
>>>>      >>>>>>>       > >> > > > in
>>>>      >>>>>>>       > >> > > > > a short period of time and may deserve
>>>>      its own
>>>>      >>>> separate
>>>>      >>>>>>>       > >> discussion.
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > > [1]
>>>>      >> https://travis-ci.org/apache/flink/pull_requests
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > > >
>>>>      >>>>>>>       > >> > > >
>>>>      >>>>>>>       > >> > >
>>>>      >>>>>>>       > >> >
>>>>      >>>>>>>       > >>
>>>>      >>>>>>>       > >
>>>>      >>>>>>>       >
>>>>      >>>>>>>
>>>>      >>>>>>>
>>>>      >>>>>>>       --
>>>>      >>>>>>>       Best Regards
>>>>      >>>>>>>
>>>>      >>>>>>>       Jeff Zhang
>>>>      >>>>>>>
>>>>      >>
>>>>
>>>
>>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
I responded in the INFRA ticket [1] that I believe they are using the wrong
metric against Flink, and that total build time is a completely different
thing from guaranteed build capacity.

My response:

"As mentioned above, I started paying attention to Flink's build queue a
few tens of days ago. I'm in Seattle, and I saw that no builds were kicking
off for Flink during PST daytime on weekdays. Our teammates in China and
Europe have also reported similar observations. So we need to evaluate where
the large total build time came from. If 1) your number and 2) our
observations from three locations that cover pretty much a full day are
all true, I **guess** one reason can be that the extra build time most
likely came from weekends, when other Apache projects may be idle and
Flink just drains its congested queue hard.

Please be aware that we're not complaining about the lack of resources
in general; I'm complaining about the lack of **stable, dedicated**
resources. An example of the latter: currently, even if no build is
in Flink's queue and I submit a request that becomes the queue head in the
PST morning, my build won't even start for 6-8+ hours. That is an absurd
amount of waiting time.

That is to say, if ASF INFRA decides to adopt a quota system and grants
Flink five DEDICATED servers that run all the time only for Flink, that'll
be PERFECT and can totally solve our problem now.

I feel what's missing in the ASF INFRA's Travis resource pool is some level
of build capacity SLAs and certainty"


Again, I believe these two problems differ in nature: long build time vs.
lack of dedicated build resources. That is to say, shortening the build time
may or may not relieve the situation. I'm slightly negative on disabling IT
cases for PRs. The downside is that we risk missing bugs in a PR that the UTs
don't catch, and those may cost a lot more to fix later, especially if they
slow down or even block others. But I am open to others' opinions on it.

AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible
way to solve our problem, since INFRA's pool is fully shared and they have
no control over, or finer insight into, resource allocation to a specific
Apache project. As mentioned in [1], Apache Arrow is moving away from the ASF
INFRA Travis pool (they are actually surprised Flink hasn't planned to do
so). I know that Spark is on its own build infra. If we all agree on funding
our own build infra, I'd be glad to help investigate potential options
after releasing 1.9, since I'm super busy with 1.9 now.

[1] https://issues.apache.org/jira/browse/INFRA-18533
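
To make the "total build time" vs. "dedicated capacity" distinction concrete,
here is a rough back-of-the-envelope sketch. It uses INFRA's figure of about
5800 build hours per month and the ~2h build duration Chesnay mentions
(both quoted below). The 30-day month and the helper names are my own
assumptions for illustration, not anything INFRA or Travis provides.

```python
# Rough capacity arithmetic for the numbers discussed in this thread.
# Assumption: a 30-day month; one executor running nonstop gives 720 hours.
HOURS_PER_MONTH = 24 * 30


def executors_needed(build_hours_per_month):
    """Average number of always-busy executors implied by a monthly total."""
    return build_hours_per_month / HOURS_PER_MONTH


def monthly_build_capacity(executors, hours_per_build):
    """Builds per month a fixed pool could finish if it were never idle."""
    return executors * HOURS_PER_MONTH / hours_per_build


if __name__ == "__main__":
    # INFRA's figure: ~5800 build hours in a month.
    print(round(executors_needed(5800), 2))
    # A dedicated pool of five executors with ~2h per build.
    print(monthly_build_capacity(5, 2.0))
```

The first number only says how much was consumed on average; it says nothing
about whether any executor is available at a given hour, which is the
problem I am describing.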



On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ch...@apache.org> wrote:

> As a short-term stopgap, since we can assume this issue to become much
> worse in the following days/weeks, we could disable IT cases in PRs and
> only run them on master.
>
> On 02/07/2019 12:03, Chesnay Schepler wrote:
> > People really have to stop thinking that just because something works
> > for us it is also a good solution.
> > Also, please remember that our builds run for 2h from start to finish,
> > and not the 14 _minutes_ it takes for zeppelin.
> > We are dealing with an entirely different scale here, both in terms of
> > build times and number of builds.
> >
> > In this very thread people have been complaining about long queue
> > times for their builds. Surprise, other Apache projects have been
> > suffering the very same thing due to us not controlling our build
> > times. While switching services (be it Jenkins, CircleCI or whatever)
> > will possibly work for us (and these options are actually attractive,
> > like CircleCI's proper support for build artifacts), it will also
> > result in us likely negatively affecting other projects in significant
> > ways.
> >
> > Sure, the Jenkins setup has a good user experience for us, at the cost
> > of blocking Jenkins workers for a _lot_ of time. Right now we have 25
> > PR's in our queue; that's possibly 50h we'd consume of Jenkins
> > resources, and the European contributors haven't even really started yet.
> >
> > FYI, the latest INFRA response from INFRA-18533:
> >
> > "Our rough metrics shows that Flink used over 5800 hours of build time
> > last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
> > MONTH. EIGHT. nonstop.
> > When we discovered this last night, we discussed it some and are going
> > to tune down Flink to allow only five executors maximum. We cannot
> > allow Flink to consume so much of a Foundation shared resource."
> >
> > So yes, we either
> > a) have to heavily reduce our CI usage or
> > b) fund our own, either maintaining it ourselves or donating to Apache.
> >
> > On 02/07/2019 05:11, Bowen Li wrote:
> >> Looking at the git history of the Jenkins script, its core part
> >> was finished in March 2017 (with only two minor updates in 2017/2018),
> >> so it's been running for over two years now, and it feels like the
> >> Zeppelin community has been quite happy with it. @Jeff Zhang
> >> <ma...@gmail.com> can you share your insights and user
> >> experience with the Jenkins+Travis approach?
> >>
> >> Things like:
> >>
> >> - has the approach completely solved the resource capacity problem
> >> for the Zeppelin community? is the Zeppelin community happy with the result?
> >> - is the whole configuration chain stable (e.g. uptime) enough?
> >> - how often do you need to maintain the Jenkins infra? how many
> >> people are usually involved in maintenance and bug-fixes?
> >>
> >> To me, the downside of this approach seems to be mostly maintenance:
> >> maintaining the script and the Jenkins infra.
> >>
> >> ** Having Our Own Travis-CI.com Account **
> >>
> >> Another alternative I've been thinking of is to have our own
> >> travis-ci.com <http://travis-ci.com> account with paid dedicated
> >> resources. Note travis-ci.org <http://travis-ci.org> is the free
> >> version and travis-ci.com <http://travis-ci.com> is the commercial
> >> version. We currently use a shared resource pool managed by the ASF INFRA
> >> team on travis-ci.org <http://travis-ci.org>, but we have no control
> >> over it - we can't see how it's configured, how many resources are
> >> available, how resources are allocated among Apache projects, etc.
> >> The nice things about having an account on travis-ci.com
> >> <http://travis-ci.com> are:
> >>
> >> - relatively low cost with much better resource guarantee than what
> >> we currently have [1]: $249/month with 5 dedicated concurrency,
> >> $489/month with 10 concurrency
> >> - low maintenance work compared to using Jenkins
> >> - (potentially) no migration cost according to Travis's doc [2]
> >> (pending verification)
> >> - full control over the build capacity/configuration compared to
> >> using ASF INFRA's pool
> >>
> >> I'd be surprised if we as such a vibrant community cannot find and
> >> fund $249*12=$2988 a year in exchange for a much better developer
> >> experience and much higher productivity.
> >>
> >> [1] https://travis-ci.com/plans
> >> [2]
> >>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>
> >> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org
> >> <ma...@apache.org>> wrote:
> >>
> >>     So yes, the Jenkins job keeps pulling the state from Travis until it
> >>     finishes.
> >>
> >>     Not sure I'm comfortable with the idea of using Jenkins workers
> >>     just to idle for several hours.
> >>
> >>     On 29/06/2019 14:56, Jeff Zhang wrote:
> >>     > Here's what zeppelin community did, we make a python script to
> >>     check the
> >>     > build status of pull request.
> >>     > Here's script:
> >>     > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>     >
> >>     > And this is the script we used in Jenkins build job.
> >>     >
> >>     > if [ -f "travis_check.py" ]; then
> >>     >    git log -n 1
> >>     >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
> >>     request.*from.*" | sed
> >>     > 's/.*GitHub pull request <a
> >>     > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>     >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>     >    PR=$(echo $STATUS | awk '{print $1}' | sed
> >> 's/.*[/]\(.*\)$/\1/g')
> >>     >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>     >    #if [ -z $COMMIT ]; then
> >>     >    #  COMMIT=$(curl -s
> >>     https://api.github.com/repos/apache/zeppelin/pulls/$PR
> >>     > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' '
> >>     | sed
> >>     > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v
> >>     "apache:" |
> >>     > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>     >    #fi
> >>     >
> >>     >    # get commit hash from PR
> >>     >    COMMIT=$(curl -s
> >>     https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>     > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' '
> >> | sed
> >>     > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v
> >>     "apache:" |
> >>     > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>     >    sleep 30 # sleep few moment to wait travis starts the build
> >>     >    RET_CODE=0
> >>     >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>     >    if [ $RET_CODE -eq 2 ]; then # try with repository name when
> >>     travis-ci is
> >>     > not available in the account
> >>     >      RET_CODE=0
> >>     >      AUTHOR=$(curl -s
> >>     https://api.github.com/repos/apache/zeppelin/pulls/$PR
> >>     > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
> >>     > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>     >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>     >    fi
> >>     >
> >>     >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build
> >>     information in
> >>     > the travis
> >>     >      set +x
> >>     >      echo "-----------------------------------------------------"
> >>     >      echo "Looks like travis-ci is not configured for your fork."
> >>     >      echo "Please setup by swich on 'zeppelin' repository at
> >>     > https://travis-ci.org/profile and travis-ci."
> >>     >      echo "And then make sure 'Build branch updates' option is
> >>     enabled in
> >>     > the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings
> >> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>."
> >>     >      echo ""
> >>     >      echo "To trigger CI after setup, you will need ammend your
> >>     last commit
> >>     > with"
> >>     >      echo "git commit --amend"
> >>     >      echo "git push your-remote HEAD --force"
> >>     >      echo ""
> >>     >      echo "See
> >>     >
> >>
> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
> >>     > ."
> >>     >    fi
> >>     >
> >>     >    exit $RET_CODE
> >>     > else
> >>     >    set +x
> >>     >    echo "travis_check.py does not exists"
> >>     >    exit 1
> >>     > fi
> >>     >
> >>     > Chesnay Schepler <chesnay@apache.org
> >>     <ma...@apache.org>> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>     >
> >>     >> Does this imply that a Jenkins job is active as long as the
> >>     Travis build
> >>     >> runs?
> >>     >>
> >>     >> On 26/06/2019 21:28, Bowen Li wrote:
> >>     >>> Hi,
> >>     >>>
> >>     >>> @Dawid, I think the "long test running" as I mentioned in the
> >>     first
> >>     >> email,
> >>     >>> also as you guys said, belongs to "a big effort which is much
> >>     harder to
> >>     >>> accomplish in a short period of time and may deserve its own
> >>     separate
> >>     >>> discussion". Thus I didn't include it in what we can do in a
> >>     foreseeable
> >>     >>> short term.
> >>     >>>
> >>     >>> Besides, I don't think that's the ultimate reason for the lack
> >>     >>> of build resources. Even if the build is shortened to something
> >>     >>> like 2h, the problem I described, where no build machine works
> >>     >>> for 6 or more hours during PST daytime, will still happen,
> >>     >>> because no machine from ASF INFRA's pool is allocated to Flink.
> >>     >>> I have paid close attention to the build queue over the past
> >>     >>> few weekdays, and it's a pretty clear pattern now.
> >>     >>>
> >>     >>> **The ultimate root cause** for that is - we don't have any
> >>     **dedicated**
> >>     >>> build resources that we can stably rely on. I'm actually ok to
> >>     wait for a
> >>     >>> long time if there are build requests running, it means at
> >>     least we are
> >>     >>> making progress. But I'm not ok with no build resource. A
> >>     better place I
> >>     >>> think we should aim at in short term is to always have at
> >>     least a central
> >>     >>> pool (can be 3 or 5) of machines dedicated to build Flink at
> >>     any time, or
> >>     >>> maybe use users resources.
> >>     >>>
> >>     >>> @Chesnay @Robert I synced with Jeff offline that Zeppelin
> >>     community is
> >>     >>> using a Jenkins job to automatically build on users' travis
> >>     account and
> >>     >>> link the result back to github PR. I guess the Jenkins job
> >>     would fetch
> >>     >>> latest upstream master and build the PR against it. Jeff has
> >> filed
> >>     >> tickets
> >>     >>> to learn and get access to the Jenkins infra. It would be better
> >>     >>> to fully understand it before judging this approach.
> >>     >>>
> >>     >>> I also heard good things about CircleCI, and ASF INFRA seems
> >>     to have a
> >>     >> pool
> >>     >>> of build capacity there too. Can be an alternative to consider.
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> >>     >> dwysakowicz@apache.org <ma...@apache.org>>
> >>     >>> wrote:
> >>     >>>
> >>     >>>> Sorry to jump in late, but I think Bowen missed the most
> >>     important point
> >>     >>>> from Chesnay's previous message in the summary. The ultimate
> >>     reason for
> >>     >>>> all the problems is that the tests take close to 2 hours to
> >>     run already.
> >>     >>>> I fully support this claim: "Unless people start caring about
> >>     test times
> >>     >>>> before adding them, this issue cannot be solved"
> >>     >>>>
> >>     >>>> This is also another reason why using user's Travis account
> >>     won't help.
> >>     >>>> Every few weeks we reach the user's time limit for a single
> >>     profile.
> >>     >>>> This makes the user's builds simply fail, until we either
> >>     properly
> >>     >>>> decrease the time the tests take (which I am not sure we ever
> >>     did) or
> >>     >>>> postpone the problem by splitting into more profiles. (Note
> >>     that the ASF
> >>     >>>> Travis account has higher time limits)
> >>     >>>>
> >>     >>>> Best,
> >>     >>>>
> >>     >>>> Dawid
> >>     >>>>
> >>     >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>     >>>>> Do we know if using "the best" available hardware would
> >>     improve the
> >>     >> build
> >>     >>>>> times?
> >>     >>>>> Imagine we would run the build on machines with plenty of
> >>     main memory
> >>     >> to
> >>     >>>>> mount everything to ramdisk + the latest CPU architecture?
> >>     >>>>>
> >>     >>>>> Throwing hardware at the problem could help reduce the time
> >>     of an
> >>     >>>>> individual build, and using our own infrastructure would
> >>     remove our
> >>     >>>>> dependency on Apache's Travis account (with the obvious
> >>     downside of
> >>     >>>> having
> >>     >>>>> to maintain the infrastructure)
> >>     >>>>> We could use an open source travis alternative, to have a
> >>     similar
> >>     >>>>> experience and make the migration easy.
> >>     >>>>>
> >>     >>>>>
> >>     >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>     <chesnay@apache.org <ma...@apache.org>>
> >>     >>>> wrote:
> >>     >>>>>>    From what I gathered, there's no special sauce that the
> >>     Zeppelin
> >>     >>>>>> project uses which actually integrates a user's Travis
> >>     account into the
> >>     >>>> PR.
> >>     >>>>>> They just disabled Travis for PRs. And that's kind of it.
> >>     >>>>>>
> >>     >>>>>> Naturally we can do this (duh) and save the ASF a fair
> >>     amount of
> >>     >>>>>> resources, but there are downsides:
> >>     >>>>>>
> >>     >>>>>> The discoverability of the Travis check takes a nose-dive.
> >>     Either we
> >>     >>>>>> require every contributor to always, on every commit, also
> >>     post a
> >>     >> Travis
> >>     >>>>>> build, or we have the reviewer sift through the
> >>     contributor's account
> >>     >> to
> >>     >>>>>> find it.
> >>     >>>>>>
> >>     >>>>>> This is rather cumbersome. Additionally, it's also not
> >>     equivalent to
> >>     >>>>>> having a PR build.
> >>     >>>>>>
> >>     >>>>>> A normal branch build takes a branch as is and tests it. A
> >>     PR build
> >>     >>>>>> merges the branch into master, and then runs it. (Fun fact:
> >>     This is
> >>     >> why
> >>     >>>>>> a PR with merge conflicts is not being run on Travis.)
> >>     >>>>>>
> >>     >>>>>> And ultimately, everyone can already make use of this
> >>     approach anyway.
> >>     >>>>>>
> >>     >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>     >>>>>>> Hi Jeff,
> >>     >>>>>>>
> >>     >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a
> >>     good idea to
> >>     >>>>>>> leverage user's travis account.
> >>     >>>>>>> In this way, we can have almost unlimited concurrent build
> >>     jobs and
> >>     >>>>>>> developers can restart build by themselves (currently only
> >>     committers
> >>     >>>>>>> can restart PR's build).
> >>     >>>>>>>
> >>     >>>>>>> But I'm still not very clear how to integrate user's
> >>     travis build
> >>     >> into
> >>     >>>>>>> the Flink pull request's build automatically. Can you
> >>     explain more in
> >>     >>>>>>> detail?
> >>     >>>>>>>
> >>     >>>>>>> Another question: does travis only build branches for user
> >>     account?
> >>     >>>>>>> My concern is that builds for PRs will rebase user's
> >>     commits against
> >>     >>>>>>> current master branch.
> >>     >>>>>>> This will help us to find problems before merge.  Builds
> >>     for branches
> >>     >>>>>>> will lose the impact of new commits in master.
> >>     >>>>>>> How does Zeppelin solve this problem?
> >>     >>>>>>>
> >>     >>>>>>> Thanks again for sharing the idea.
> >>     >>>>>>>
> >>     >>>>>>> Regards,
> >>     >>>>>>> Jark
> >>     >>>>>>>
> >>     >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
> >>     <ma...@gmail.com>
> >>     >>>>>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>>> wrote:
> >>     >>>>>>>
> >>     >>>>>>>       Hi Folks,
> >>     >>>>>>>
> >>     >>>>>>>       Zeppelin met this kind of issue before; we solved it by
> >>     >>>>>>>       delegating each contributor's PR build to their own
> >>     >>>>>>>       Travis account (everyone can have 5 free slots for
> >>     >>>>>>>       Travis builds).
> >>     >>>>>>>       The Apache account's Travis build is only triggered
> >>     >>>>>>>       when a PR is merged.
> >>     >>>>>>>
> >>     >>>>>>>
> >>     >>>>>>>
> >>     >>>>>>>       Kurt Young <ykt836@gmail.com
> >>     <ma...@gmail.com> <mailto:ykt836@gmail.com
> >>     <ma...@gmail.com>>>
> >>     >>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>     >>>>>>>
> >>     >>>>>>>       > (Forgot to cc George)
> >>     >>>>>>>       >
> >>     >>>>>>>       > Best,
> >>     >>>>>>>       > Kurt
> >>     >>>>>>>       >
> >>     >>>>>>>       >
> >>     >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
> >>     <ykt836@gmail.com <ma...@gmail.com>
> >>     >>>>>>> <mailto:ykt836@gmail.com <ma...@gmail.com>>>
> >>     wrote:
> >>     >>>>>>>       >
> >>     >>>>>>>       > > Hi Bowen,
> >>     >>>>>>>       > >
> >>     >>>>>>>       > > Thanks for bringing this up. We actually have
> >>     discussed
> >>     >> about
> >>     >>>>>>>       this, and I
> >>     >>>>>>>       > > think Till and George have
> >>     >>>>>>>       > > already spent some time investigating it. I have
> >>     cced both of
> >>     >>>>>>>       them, and
> >>     >>>>>>>       > > maybe they can share
> >>     >>>>>>>       > > their findings.
> >>     >>>>>>>       > >
> >>     >>>>>>>       > > Best,
> >>     >>>>>>>       > > Kurt
> >>     >>>>>>>       > >
> >>     >>>>>>>       > >
> >>     >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
> >>     <imjark@gmail.com <ma...@gmail.com>
> >>     >>>>>>> <mailto:imjark@gmail.com <ma...@gmail.com>>>
> >>     wrote:
> >>     >>>>>>>       > >
> >>     >>>>>>>       > >> Hi Bowen,
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> Thanks for bringing this. We also suffered from
> >>     the long
> >>     >>>>>>>       build time.
> >>     >>>>>>>       > >> I agree that we should focus on solving build
> >>     capacity
> >>     >>>>>>>       problem in the
> >>     >>>>>>>       > >> thread.
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> My observation is that only one build is
> >>     running, all
> >>     >> the
> >>     >>>>>>>       others
> >>     >>>>>>>       > >> (other
> >>     >>>>>>>       > >> PRs, master) are pending.
> >>     >>>>>>>       > >> The pricing plan[1] of travis shows it can
> >> support
> >>     >> concurrent
> >>     >>>>>>>       build
> >>     >>>>>>>       > jobs.
> >>     >>>>>>>       > >> But I don't know which plan we are using, might
> >>     be the free
> >>     >>>>>>>       plan for
> >>     >>>>>>>       > open
> >>     >>>>>>>       > >> source.
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> I cc-ed Chesnay who may have some experience on
> >>     Travis.
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> Regards,
> >>     >>>>>>>       > >> Jark
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> [1]: https://travis-ci.com/plans
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> >>     >> bowenli86@gmail.com <ma...@gmail.com>
> >>     >>>>>>> <mailto:bowenli86@gmail.com
> >>     <ma...@gmail.com>>> wrote:
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >> > Hi Steven,
> >>     >>>>>>>       > >> >
> >>     >>>>>>>       > >> > I think you may not have read what I wrote. The
> >>     discussion is
> >>     >>>> about
> >>     >>>>>>>       > "unstable
> >>     >>>>>>>       > >> > build **capacity**", in other words
> >>     "unstable / lack of
> >>     >>>> build
> >>     >>>>>>>       > >> resources",
> >>     >>>>>>>       > >> > not "unstable build".
> >>     >>>>>>>       > >> >
> >>     >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>     >>>>>>>       <stevenz3wu@gmail.com <ma...@gmail.com>
> >>     <mailto:stevenz3wu@gmail.com <ma...@gmail.com>>>
> >>     >>>>>>>       > wrote:
> >>     >>>>>>>       > >> >
> >>     >>>>>>>       > >> > > long and sometimes unstable build is
> >>     definitely a pain
> >>     >>>>>> point.
> >>     >>>>>>>       > >> > >
> >>     >>>>>>>       > >> > > I suspect the build failure here in
> >>     >> flink-connector-kafka
> >>     >>>>>>>       is not
> >>     >>>>>>>       > >> related
> >>     >>>>>>>       > >> > to
> >>     >>>>>>>       > >> > > my change, but there is no easy way to re-run the
> >>     build on
> >>     >>>>>>>       travis UI.
> >>     >>>>>>>       > Google
> >>     >>>>>>>       > >> > > search showed a trick of close-and-open the
> >>     PR will
> >>     >>>>>>>       trigger rebuild.
> >>     >>>>>>>       > >> but
> >>     >>>>>>>       > >> > > that could add noise to the PR activity.
> >>     >>>>>>>       > >> > >
> >>     https://travis-ci.org/apache/flink/jobs/545555519
> >>     >>>>>>>       > >> > >
> >>     >>>>>>>       > >> > > travis-ci for my personal repo often failed
> >>     with
> >>     >>>>>>>       exceeding time
> >>     >>>>>>>       > limit
> >>     >>>>>>>       > >> > after
> >>     >>>>>>>       > >> > > 4+ hours.
> >>     >>>>>>>       > >> > > The job exceeded the maximum time limit for
> >>     jobs, and
> >>     >> has
> >>     >>>>>>>       been
> >>     >>>>>>>       > >> > terminated.
> >>     >>>>>>>       > >> > >
> >>     >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>     >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
> >>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
> >>     >>>>>>>       > wrote:
> >>     >>>>>>>       > >> > >
> >>     >>>>>>>       > >> > > >
> >>     https://travis-ci.org/apache/flink/builds/549681530
> >>     >>>>>>>       This build
> >>     >>>>>>>       > >> > request
> >>     >>>>>>>       > >> > > > has
> >>     >>>>>>>       > >> > > > been sitting at **HEAD of the queue**
> >>     since I first
> >>     >> saw
> >>     >>>>>>>       it at PST
> >>     >>>>>>>       > >> > 10:30am
> >>     >>>>>>>       > >> > > > (not sure how long it's been there before
> >>     10:30am).
> >>     >>>>>>>       It's PST
> >>     >>>>>>>       > 4:12pm
> >>     >>>>>>>       > >> now
> >>     >>>>>>>       > >> > > and
> >>     >>>>>>>       > >> > > > it hasn't started yet.
> >>     >>>>>>>       > >> > > >
> >>     >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>     >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
> >>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
> >>     >>>>>>>       > >> wrote:
> >>     >>>>>>>       > >> > > >
> >>     >>>>>>>       > >> > > > > Hi devs,
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > I've been experiencing the pain
> >>     resulting from lack
> >>     >>>>>>>       of stable
> >>     >>>>>>>       > >> build
> >>     >>>>>>>       > >> > > > > capacity on Travis for Flink PRs [1].
> >>     >> Specifically, I
> >>     >>>>>>>       noticed
> >>     >>>>>>>       > >> often
> >>     >>>>>>>       > >> > > that
> >>     >>>>>>>       > >> > > > no
> >>     >>>>>>>       > >> > > > > build in the queue is making any
> >>     progress for
> >>     >> hours,
> >>     >>>> and
> >>     >>>>>>>       > suddenly
> >>     >>>>>>>       > >> 5
> >>     >>>>>>>       > >> > or
> >>     >>>>>>>       > >> > > 6
> >>     >>>>>>>       > >> > > > > builds kick off all together after the
> >>     long pause.
> >>     >>>>>>>       I'm at PST
> >>     >>>>>>>       > >> > (UTC-08)
> >>     >>>>>>>       > >> > > > time
> >>     >>>>>>>       > >> > > > > zone, and I've seen pause can be as
> >>     long as 6 hours
> >>     >>>>>>>       from PST 9am
> >>     >>>>>>>       > >> to
> >>     >>>>>>>       > >> > 3pm
> >>     >>>>>>>       > >> > > > > (let alone the time needed to drain the
> >>     queue
> >>     >>>>>>>       afterwards).
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > I think this has greatly impacted our
> >>     productivity.
> >>     >>>> I've
> >>     >>>>>>>       > >> experienced
> >>     >>>>>>>       > >> > > that
> >>     >>>>>>>       > >> > > > > PRs submitted in the early morning of
> >>     PST time zone
> >>     >>>>>>>       won't finish
> >>     >>>>>>>       > >> > their
> >>     >>>>>>>       > >> > > > > build until late night of the same day.
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > So my questions are:
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > - Has anyone else experienced the same
> >>     problem or
> >>     >>>>>>>       have similar
> >>     >>>>>>>       > >> > > > observation
> >>     >>>>>>>       > >> > > > > on TravisCI? (I suspect it has things
> >>     to do with
> >>     >> time
> >>     >>>>>>>       zone)
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > - What pricing plan of TravisCI is
> >>     Flink currently
> >>     >>>>>>>       using? Is it
> >>     >>>>>>>       > >> the
> >>     >>>>>>>       > >> > > free
> >>     >>>>>>>       > >> > > > > plan for open source projects? What
> >> is the
> >>     >>>>>>>       guaranteed build
> >>     >>>>>>>       > >> capacity
> >>     >>>>>>>       > >> > > of
> >>     >>>>>>>       > >> > > > > the current plan?
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > - If the current pricing plan (either
> >>     free or paid)
> >>     >>>>>> can't
> >>     >>>>>>>       > provide
> >>     >>>>>>>       > >> > > stable
> >>     >>>>>>>       > >> > > > > build capacity, can we upgrade to a
> >>     higher priced
> >>     >>>>>>>       plan with
> >>     >>>>>>>       > larger
> >>     >>>>>>>       > >> > and
> >>     >>>>>>>       > >> > > > more
> >>     >>>>>>>       > >> > > > > stable build capacity?
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > BTW, another factor that contributes to
> >> the
> >>     >>>>>>>       productivity problem
> >>     >>>>>>>       > is
> >>     >>>>>>>       > >> > that
> >>     >>>>>>>       > >> > > > > our build is slow - we run full build
> >>     for every PR
> >>     >>>> and a
> >>     >>>>>>>       > >> successful
> >>     >>>>>>>       > >> > > full
> >>     >>>>>>>       > >> > > > > build takes ~5h. We definitely have
> >>     more options to
> >>     >>>>>>>       solve it,
> >>     >>>>>>>       > for
> >>     >>>>>>>       > >> > > > instance,
> >>     >>>>>>>       > >> > > > > modularize the build graphs and reuse
> >>     artifacts
> >>     >> from
> >>     >>>> the
> >>     >>>>>>>       > previous
> >>     >>>>>>>       > >> > > build.
> >>     >>>>>>>       > >> > > > > But I think that can be a big effort
> >>     which is much
> >>     >>>>>>>       harder to
> >>     >>>>>>>       > >> > accomplish
> >>     >>>>>>>       > >> > > > in
> >>     >>>>>>>       > >> > > > > a short period of time and may deserve
> >>     its own
> >>     >>>> separate
> >>     >>>>>>>       > >> discussion.
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > > [1]
> >>     >> https://travis-ci.org/apache/flink/pull_requests
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > > >
> >>     >>>>>>>       > >> > > >
> >>     >>>>>>>       > >> > >
> >>     >>>>>>>       > >> >
> >>     >>>>>>>       > >>
> >>     >>>>>>>       > >
> >>     >>>>>>>       >
> >>     >>>>>>>
> >>     >>>>>>>
> >>     >>>>>>>       --
> >>     >>>>>>>       Best Regards
> >>     >>>>>>>
> >>     >>>>>>>       Jeff Zhang
> >>     >>>>>>>
> >>     >>
> >>
> >
> >
>
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
As a short-term stopgap, since we can expect this issue to become much 
worse in the coming days/weeks, we could disable IT cases in PRs and 
only run them on master.
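
A rough sketch of what such a stopgap could look like, assuming the build
script can read Travis's TRAVIS_PULL_REQUEST variable (the PR number for PR
builds, the string "false" otherwise) and that the IT cases can be switched
off through a Maven property. The -DskipITs switch and the helper name
maven_args are assumptions for illustration, not how the Flink build is
wired today:

```python
import os


def maven_args(env):
    """Return the Maven command line for the current CI build.

    Assumption: TRAVIS_PULL_REQUEST holds the PR number for PR builds
    and the string "false" for branch (e.g. master) builds.
    """
    is_pr = env.get("TRAVIS_PULL_REQUEST", "false") != "false"
    args = ["mvn", "verify"]
    if is_pr:
        # Hypothetical switch: skip integration tests on PR builds only.
        args.append("-DskipITs")
    return args


if __name__ == "__main__":
    print(" ".join(maven_args(os.environ)))
```

Master builds would keep running the full suite, so IT failures would only
surface after a merge; that is the trade-off of this stopgap.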

On 02/07/2019 12:03, Chesnay Schepler wrote:
> People really have to stop thinking that just because something works 
> for us it is also a good solution.
> Also, please remember that our builds run for 2h from start to finish, 
> and not the 14 _minutes_ it takes for Zeppelin.
> We are dealing with an entirely different scale here, both in terms of 
> build times and number of builds.
>
> In this very thread people have been complaining about long queue 
> times for their builds. Surprise, other Apache projects have been 
> suffering the very same thing due to us not controlling our build 
> times. While switching services (be it Jenkins, CircleCI or whatever) 
> will possibly work for us (and these options are actually attractive, 
> like CircleCI's proper support for build artifacts), it will also 
> result in us likely negatively affecting other projects in significant 
> ways.
>
> Sure, the Jenkins setup has a good user experience for us, at the cost 
> of blocking Jenkins workers for a _lot_ of time. Right now we have 25 
> PR's in our queue; that's possibly 50h we'd consume of Jenkins 
> resources, and the European contributors haven't even really started yet.
>
> FYI, the latest INFRA response from INFRA-18533:
>
> "Our rough metrics shows that Flink used over 5800 hours of build time 
> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE 
> MONTH. EIGHT. nonstop.
> When we discovered this last night, we discussed it some and are going 
> to tune down Flink to allow only five executors maximum. We cannot 
> allow Flink to consume so much of a Foundation shared resource."
>
> So yes, we either
> a) have to heavily reduce our CI usage or
> b) fund our own, either maintaining it ourselves or donating to Apache.
>
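
For scale, the figures above can be checked with a few lines of arithmetic.
This is only a back-of-the-envelope sketch: the 30-day month and the
assumption that each build occupies exactly one executor are mine, while the
5800 hours, the 25 queued PRs, the 2h per build and the five executors come
from this thread.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def servers_equivalent(build_hours):
    """Machines running 24/7 needed to supply build_hours within one month."""
    return build_hours / HOURS_PER_MONTH


def drain_hours(queued_builds, hours_per_build, executors):
    """Wall-clock hours to drain a queue if every executor stays busy
    and each build occupies exactly one executor (a simplification)."""
    return queued_builds * hours_per_build / executors


print(round(servers_equivalent(5800), 2))  # 8.06
print(drain_hours(25, 2, 5))               # 10.0
```
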
> On 02/07/2019 05:11, Bowen Li wrote:
>> By looking at the git history of the Jenkins script, its core part 
>> was finished in March 2017 (with only two minor updates in 2017/2018), 
>> so it's been running for over two years now and it seems the Zeppelin 
>> community has been quite happy with it. @Jeff Zhang 
>> <ma...@gmail.com> can you share your insights and user 
>> experience with the Jenkins+Travis approach?
>>
>> Things like:
>>
>> - has the approach completely solved the resource capacity problem 
>> for the Zeppelin community? is the Zeppelin community happy with the result?
>> - is the whole configuration chain stable (e.g. uptime) enough?
>> - how often do you need to maintain the Jenkins infra? how many 
>> people are usually involved in maintenance and bug-fixes?
>>
>> To me, the downside of this approach seems to be mostly the 
>> maintenance: maintaining the script and the Jenkins infra.
>>
>> ** Having Our Own Travis-CI.com Account **
>>
>> Another alternative I've been thinking of is to have our own 
>> travis-ci.com account with paid dedicated resources. Note that 
>> travis-ci.org is the free version and travis-ci.com is the commercial 
>> version. We currently use a shared resource pool managed by the ASF INFRA 
>> team on travis-ci.org, but we have no control over it - we can't see 
>> how it's configured, how many resources are available, how resources 
>> are allocated among Apache projects, etc. 
>> The nice things about having an account on travis-ci.com are:
>>
>> - relatively low cost with much better resource guarantee than what 
>> we currently have [1]: $249/month with 5 dedicated concurrency, 
>> $489/month with 10 concurrency
>> - low maintenance work compared to using Jenkins
>> - (potentially) no migration cost according to Travis's doc [2] 
>> (pending verification)
>> - full control over the build capacity/configuration compared to 
>> using ASF INFRA's pool
>>
>> I'd be surprised if we as such a vibrant community cannot find and 
>> fund $249*12=$2988 a year in exchange for a much better developer 
>> experience and much higher productivity.
>>
>> [1] https://travis-ci.com/plans
>> [2] 
>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>
>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org 
>> <ma...@apache.org>> wrote:
>>
>>     So yes, the Jenkins job keeps pulling the state from Travis until it
>>     finishes.
>>
>>     Not sure I'm comfortable with the idea of using Jenkins workers
>>     just to idle for several hours.
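
To make the concern concrete, here is a minimal sketch of such a polling
loop. fetch_state is a hypothetical callable standing in for whatever query
the Jenkins job issues against Travis, and the state names are placeholders
too; the point is only that the worker stays occupied for the whole duration
of the Travis build.

```python
import time


def wait_for_build(fetch_state, poll_seconds=60, sleep=time.sleep):
    """Poll until the build reaches a terminal state.

    fetch_state: hypothetical callable returning the current build state.
    Returns (final_state, seconds_spent_waiting).
    """
    terminal = {"passed", "failed", "errored", "canceled"}  # placeholder names
    waited = 0
    state = fetch_state()
    while state not in terminal:
        sleep(poll_seconds)  # the worker does nothing useful here
        waited += poll_seconds
        state = fetch_state()
    return state, waited
```

With a 2h build and a 60-second poll interval, the loop above would sleep
120 times while holding a Jenkins executor.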
>>
>>     On 29/06/2019 14:56, Jeff Zhang wrote:
>>     > Here's what the Zeppelin community did: we wrote a Python script to
>>     > check the build status of a pull request.
>>     > Here's the script:
>>     > https://github.com/apache/zeppelin/blob/master/travis_check.py
>>     >
>>     > And this is the script we use in the Jenkins build job.
>>     >
>>     > if [ -f "travis_check.py" ]; then
>>     >    git log -n 1
>>     >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>     >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>     >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>     >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>     >    #if [ -z $COMMIT ]; then
>>     >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>     >    #fi
>>     >
>>     >    # get commit hash from PR
>>     >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>     >    sleep 30 # sleep few moment to wait travis starts the build
>>     >    RET_CODE=0
>>     >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>     >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>     >      RET_CODE=0
>>     >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>     >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>     >    fi
>>     >
>>     >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>>     >      set +x
>>     >      echo "-----------------------------------------------------"
>>     >      echo "Looks like travis-ci is not configured for your fork."
>>     >      echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>     >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>     >      echo ""
>>     >      echo "To trigger CI after setup, you will need ammend your last commit with"
>>     >      echo "git commit --amend"
>>     >      echo "git push your-remote HEAD --force"
>>     >      echo ""
>>     >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
>>     >    fi
>>     >
>>     >    exit $RET_CODE
>>     > else
>>     >    set +x
>>     >    echo "travis_check.py does not exists"
>>     >    exit 1
>>     > fi
>>     >
>>     > Chesnay Schepler <chesnay@apache.org
>>     <ma...@apache.org>> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>     >
>>     >> Does this imply that a Jenkins job is active as long as the
>>     Travis build
>>     >> runs?
>>     >>
>>     >> On 26/06/2019 21:28, Bowen Li wrote:
>>     >>> Hi,
>>     >>>
>>     >>> @Dawid, I think the "long test running" as I mentioned in the
>>     first
>>     >> email,
>>     >>> also as you guys said, belongs to "a big effort which is much
>>     harder to
>>     >>> accomplish in a short period of time and may deserve its own
>>     separate
>>     >>> discussion". Thus I didn't include it in what we can do in a
>>     foreseeable
>>     >>> short term.
>>     >>>
>>     >>> Besides, I don't think that's the ultimate reason for the lack of
>>     >>> build resources. Even if the build is shortened to something like
>>     >>> 2h, the problem I described, where no build machine works for 6 or
>>     >>> more hours during PST daytime, will still happen, because no machine
>>     >>> from ASF INFRA's pool is allocated to Flink. As I have paid close
>>     >>> attention to the build queue in the past few weekdays, it's a pretty
>>     >>> clear pattern now.
>>     >>>
>>     >>> **The ultimate root cause** for that is - we don't have any
>>     **dedicated**
>>     >>> build resources that we can stably rely on. I'm actually ok to
>>     wait for a
>>     >>> long time if there are build requests running, it means at
>>     least we are
>>     >>> making progress. But I'm not ok with no build resource. A
>>     better place I
>>     >>> think we should aim at in short term is to always have at
>>     least a central
>>     >>> pool (can be 3 or 5) of machines dedicated to build Flink at
>>     any time, or
>>     >>> maybe use users' resources.
>>     >>>
>>     >>> @Chesnay @Robert I synced with Jeff offline that Zeppelin
>>     community is
>>     >>> using a Jenkins job to automatically build on users' travis
>>     account and
>>     >>> link the result back to github PR. I guess the Jenkins job
>>     would fetch
>>     >>> latest upstream master and build the PR against it. Jeff has filed
>>     >>> tickets to learn about and get access to the Jenkins infra. It'd be
>>     >>> better to fully understand it first before judging this approach.
>>     >>>
>>     >>> I also heard good things about CircleCI, and ASF INFRA seems
>>     to have a
>>     >> pool
>>     >>> of build capacity there too. Can be an alternative to consider.
>>     >>>
>>     >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>>     >> dwysakowicz@apache.org <ma...@apache.org>>
>>     >>> wrote:
>>     >>>
>>     >>>> Sorry to jump in late, but I think Bowen missed the most
>>     important point
>>     >>>> from Chesnay's previous message in the summary. The ultimate
>>     reason for
>>     >>>> all the problems is that the tests take close to 2 hours to
>>     run already.
>>     >>>> I fully support this claim: "Unless people start caring about
>>     test times
>>     >>>> before adding them, this issue cannot be solved"
>>     >>>>
>>     >>>> This is also another reason why using user's Travis account
>>     won't help.
>>     >>>> Every few weeks we reach the user's time limit for a single
>>     profile.
>>     >>>> This makes the user's builds simply fail, until we either
>>     properly
>>     >>>> decrease the time the tests take (which I am not sure we ever
>>     did) or
>>     >>>> postpone the problem by splitting into more profiles. (Note
>>     that the ASF
>>     >>>> Travis account has higher time limits)
>>     >>>>
>>     >>>> Best,
>>     >>>>
>>     >>>> Dawid
>>     >>>>
>>     >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>     >>>>> Do we know if using "the best" available hardware would
>>     improve the
>>     >> build
>>     >>>>> times?
>>     >>>>> Imagine we would run the build on machines with plenty of
>>     main memory
>>     >> to
>>     >>>>> mount everything to ramdisk + the latest CPU architecture?
>>     >>>>>
>>     >>>>> Throwing hardware at the problem could help reduce the time
>>     of an
>>     >>>>> individual build, and using our own infrastructure would
>>     remove our
>>     >>>>> dependency on Apache's Travis account (with the obvious
>>     downside of
>>     >>>> having
>>     >>>>> to maintain the infrastructure)
>>     >>>>> We could use an open source travis alternative, to have a
>>     similar
>>     >>>>> experience and make the migration easy.
>>     >>>>>
>>     >>>>>
>>     >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>     <chesnay@apache.org <ma...@apache.org>>
>>     >>>> wrote:
>>     >>>>>>    From what I gathered, there's no special sauce that the
>>     Zeppelin
>>     >>>>>> project uses which actually integrates a user's Travis
>>     account into the
>>     >>>> PR.
>>     >>>>>> They just disabled Travis for PRs. And that's kind of it.
>>     >>>>>>
>>     >>>>>> Naturally we can do this (duh) and save the ASF a fair
>>     amount of
>>     >>>>>> resources, but there are downsides:
>>     >>>>>>
>>     >>>>>> The discoverability of the Travis check takes a nose-dive.
>>     Either we
>>     >>>>>> require every contributor to always, on every commit, also
>>     post a
>>     >> Travis
>>     >>>>>> build, or we have the reviewer sift through the
>>     contributor's account
>>     >> to
>>     >>>>>> find it.
>>     >>>>>>
>>     >>>>>> This is rather cumbersome. Additionally, it's also not
>>     equivalent to
>>     >>>>>> having a PR build.
>>     >>>>>>
>>     >>>>>> A normal branch build takes a branch as is and tests it. A
>>     PR build
>>     >>>>>> merges the branch into master, and then runs it. (Fun fact:
>>     This is
>>     >> why
>>     >>>>>> a PR with merge conflicts is not being run on Travis.)
>>     >>>>>>
>>     >>>>>> And ultimately, everyone can already make use of this
>>     approach anyway.
>>     >>>>>>
>>     >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>     >>>>>>> Hi Jeff,
>>     >>>>>>>
>>     >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a
>>     good idea to
>>     >>>>>>> leverage user's travis account.
>>     >>>>>>> In this way, we can have almost unlimited concurrent build
>>     jobs and
>>     >>>>>>> developers can restart build by themselves (currently only
>>     committers
>>     >>>>>>> can restart PR's build).
>>     >>>>>>>
>>     >>>>>>> But I'm still not very clear how to integrate user's
>>     travis build
>>     >> into
>>     >>>>>>> the Flink pull request's build automatically. Can you
>>     explain more in
>>     >>>>>>> detail?
>>     >>>>>>>
>>     >>>>>>> Another question: does travis only build branches for user
>>     account?
>>     >>>>>>> My concern is that builds for PRs will rebase user's
>>     commits against
>>     >>>>>>> current master branch.
>>     >>>>>>> This will help us to find problems before merge.  Builds
>>     for branches
>>     >>>>>>> will lose the impact of new commits in master.
>>     >>>>>>> How does Zeppelin solve this problem?
>>     >>>>>>>
>>     >>>>>>> Thanks again for sharing the idea.
>>     >>>>>>>
>>     >>>>>>> Regards,
>>     >>>>>>> Jark
>>     >>>>>>>
>>     >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>>     <ma...@gmail.com>
>>     >>>>>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>>> wrote:
>>     >>>>>>>
>>     >>>>>>>       Hi Folks,
>>     >>>>>>>
>>     >>>>>>>       Zeppelin met this kind of issue before; we solved it by
>>     >>>>>>>       delegating each contributor's PR build to their own
>>     >>>>>>>       Travis account (everyone can have 5 free slots for
>>     >>>>>>>       Travis builds).
>>     >>>>>>>       The Apache account's Travis build is only triggered
>>     >>>>>>>       when a PR is merged.
>>     >>>>>>>
>>     >>>>>>>
>>     >>>>>>>
>>     >>>>>>>       Kurt Young <ykt836@gmail.com
>>     <ma...@gmail.com> <mailto:ykt836@gmail.com
>>     <ma...@gmail.com>>>
>>     >>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>     >>>>>>>
>>     >>>>>>>       > (Forgot to cc George)
>>     >>>>>>>       >
>>     >>>>>>>       > Best,
>>     >>>>>>>       > Kurt
>>     >>>>>>>       >
>>     >>>>>>>       >
>>     >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>>     <ykt836@gmail.com <ma...@gmail.com>
>>     >>>>>>> <mailto:ykt836@gmail.com <ma...@gmail.com>>>
>>     wrote:
>>     >>>>>>>       >
>>     >>>>>>>       > > Hi Bowen,
>>     >>>>>>>       > >
>>     >>>>>>>       > > Thanks for bringing this up. We actually have
>>     discussed
>>     >> about
>>     >>>>>>>       this, and I
>>     >>>>>>>       > > think Till and George have
>>     >>>>>>>       > > already spent some time investigating it. I have
>>     cced both of
>>     >>>>>>>       them, and
>>     >>>>>>>       > > maybe they can share
>>     >>>>>>>       > > their findings.
>>     >>>>>>>       > >
>>     >>>>>>>       > > Best,
>>     >>>>>>>       > > Kurt
>>     >>>>>>>       > >
>>     >>>>>>>       > >
>>     >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>>     <imjark@gmail.com <ma...@gmail.com>
>>     >>>>>>> <mailto:imjark@gmail.com <ma...@gmail.com>>>
>>     wrote:
>>     >>>>>>>       > >
>>     >>>>>>>       > >> Hi Bowen,
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> Thanks for bringing this. We also suffered from
>>     the long
>>     >>>>>>>       build time.
>>     >>>>>>>       > >> I agree that we should focus on solving build
>>     capacity
>>     >>>>>>>       problem in the
>>     >>>>>>>       > >> thread.
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> My observation is there is only one build is
>>     running, all
>>     >> the
>>     >>>>>>>       others
>>     >>>>>>>       > >> (other
>>     >>>>>>>       > >> PRs, master) are pending.
>>     >>>>>>>       > >> The pricing plan[1] of travis shows it can support
>>     >> concurrent
>>     >>>>>>>       build
>>     >>>>>>>       > jobs.
>>     >>>>>>>       > >> But I don't know which plan we are using, might
>>     be the free
>>     >>>>>>>       plan for
>>     >>>>>>>       > open
>>     >>>>>>>       > >> source.
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> I cc-ed Chesnay who may have some experience on
>>     Travis.
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> Regards,
>>     >>>>>>>       > >> Jark
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> [1]: https://travis-ci.com/plans
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>>     >> bowenli86@gmail.com <ma...@gmail.com>
>>     >>>>>>> <mailto:bowenli86@gmail.com
>>     <ma...@gmail.com>>> wrote:
>>     >>>>>>>       > >>
>>     >>>>>>>       > >> > Hi Steven,
>>     >>>>>>>       > >> >
>>     >>>>>>>       > >> > I think you may not read what I wrote. The
>>     discussion is
>>     >>>> about
>>     >>>>>>>       > "unstable
>>     >>>>>>>       > >> > build **capacity**", in another word
>>     "unstable / lack of
>>     >>>> build
>>     >>>>>>>       > >> resources",
>>     >>>>>>>       > >> > not "unstable build".
>>     >>>>>>>       > >> >
>>     >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>     >>>>>>>       <stevenz3wu@gmail.com <ma...@gmail.com>
>>     <mailto:stevenz3wu@gmail.com <ma...@gmail.com>>>
>>     >>>>>>>       > wrote:
>>     >>>>>>>       > >> >
>>     >>>>>>>       > >> > > long and sometimes unstable build is
>>     definitely a pain
>>     >>>>>> point.
>>     >>>>>>>       > >> > >
>>     >>>>>>>       > >> > > I suspect the build failure here in
>>     >> flink-connector-kafka
>>     >>>>>>>       is not
>>     >>>>>>>       > >> related
>>     >>>>>>>       > >> > to
>>     >>>>>>>       > >> > > my change. but there is no easy re-run the
>>     build on
>>     >>>>>>>       travis UI.
>>     >>>>>>>       > Google
>>     >>>>>>>       > >> > > search showed a trick of close-and-open the
>>     PR will
>>     >>>>>>>       trigger rebuild.
>>     >>>>>>>       > >> but
>>     >>>>>>>       > >> > > that could add noises to the PR activities.
>>     >>>>>>>       > >> > >
>>     https://travis-ci.org/apache/flink/jobs/545555519
>>     >>>>>>>       > >> > >
>>     >>>>>>>       > >> > > travis-ci for my personal repo often failed
>>     with
>>     >>>>>>>       exceeding time
>>     >>>>>>>       > limit
>>     >>>>>>>       > >> > after
>>     >>>>>>>       > >> > > 4+ hours.
>>     >>>>>>>       > >> > > The job exceeded the maximum time limit for
>>     jobs, and
>>     >> has
>>     >>>>>>>       been
>>     >>>>>>>       > >> > terminated.
>>     >>>>>>>       > >> > >
>>     >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>     >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
>>     >>>>>>>       > wrote:
>>     >>>>>>>       > >> > >
>>     >>>>>>>       > >> > > >
>>     https://travis-ci.org/apache/flink/builds/549681530
>>     >>>>>>>       This build
>>     >>>>>>>       > >> > request
>>     >>>>>>>       > >> > > > has
>>     >>>>>>>       > >> > > > been sitting at **HEAD of the queue**
>>     since I first
>>     >> saw
>>     >>>>>>>       it at PST
>>     >>>>>>>       > >> > 10:30am
>>     >>>>>>>       > >> > > > (not sure how long it's been there before
>>     10:30am).
>>     >>>>>>>       It's PST
>>     >>>>>>>       > 4:12pm
>>     >>>>>>>       > >> now
>>     >>>>>>>       > >> > > and
>>     >>>>>>>       > >> > > > it hasn't started yet.
>>     >>>>>>>       > >> > > >
>>     >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>     >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
>>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
>>     >>>>>>>       > >> wrote:
>>     >>>>>>>       > >> > > >
>>     >>>>>>>       > >> > > > > Hi devs,
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > I've been experiencing the pain
>>     resulting from lack
>>     >>>>>>>       of stable
>>     >>>>>>>       > >> build
>>     >>>>>>>       > >> > > > > capacity on Travis for Flink PRs [1].
>>     >> Specifically, I
>>     >>>>>>>       noticed
>>     >>>>>>>       > >> often
>>     >>>>>>>       > >> > > that
>>     >>>>>>>       > >> > > > no
>>     >>>>>>>       > >> > > > > build in the queue is making any
>>     progress for
>>     >> hours,
>>     >>>> and
>>     >>>>>>>       > suddenly
>>     >>>>>>>       > >> 5
>>     >>>>>>>       > >> > or
>>     >>>>>>>       > >> > > 6
>>     >>>>>>>       > >> > > > > builds kick off all together after the
>>     long pause.
>>     >>>>>>>       I'm at PST
>>     >>>>>>>       > >> > (UTC-08)
>>     >>>>>>>       > >> > > > time
>>     >>>>>>>       > >> > > > > zone, and I've seen pause can be as
>>     long as 6 hours
>>     >>>>>>>       from PST 9am
>>     >>>>>>>       > >> to
>>     >>>>>>>       > >> > 3pm
>>     >>>>>>>       > >> > > > > (let alone the time needed to drain the
>>     queue
>>     >>>>>>>       afterwards).
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > I think this has greatly impacted our
>>     productivity.
>>     >>>> I've
>>     >>>>>>>       > >> experienced
>>     >>>>>>>       > >> > > that
>>     >>>>>>>       > >> > > > > PRs submitted in the early morning of
>>     PST time zone
>>     >>>>>>>       won't finish
>>     >>>>>>>       > >> > their
>>     >>>>>>>       > >> > > > > build until late night of the same day.
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > So my questions are:
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > - Has anyone else experienced the same
>>     problem or
>>     >>>>>>>       have similar
>>     >>>>>>>       > >> > > > observation
>>     >>>>>>>       > >> > > > > on TravisCI? (I suspect it has things
>>     to do with
>>     >> time
>>     >>>>>>>       zone)
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > - What pricing plan of TravisCI is
>>     Flink currently
>>     >>>>>>>       using? Is it
>>     >>>>>>>       > >> the
>>     >>>>>>>       > >> > > free
>>     >>>>>>>       > >> > > > > plan for open source projects? What are the
>>     >>>>>>>       guaranteed build
>>     >>>>>>>       > >> capacity
>>     >>>>>>>       > >> > > of
>>     >>>>>>>       > >> > > > > the current plan?
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > - If the current pricing plan (either
>>     free or paid)
>>     >>>>>> can't
>>     >>>>>>>       > provide
>>     >>>>>>>       > >> > > stable
>>     >>>>>>>       > >> > > > > build capacity, can we upgrade to a
>>     higher priced
>>     >>>>>>>       plan with
>>     >>>>>>>       > larger
>>     >>>>>>>       > >> > and
>>     >>>>>>>       > >> > > > more
>>     >>>>>>>       > >> > > > > stable build capacity?
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > BTW, another factor that contribute to the
>>     >>>>>>>       productivity problem
>>     >>>>>>>       > is
>>     >>>>>>>       > >> > that
>>     >>>>>>>       > >> > > > > our build is slow - we run full build
>>     for every PR
>>     >>>> and a
>>     >>>>>>>       > >> successful
>>     >>>>>>>       > >> > > full
>>     >>>>>>>       > >> > > > > build takes ~5h. We definitely have
>>     more options to
>>     >>>>>>>       solve it,
>>     >>>>>>>       > for
>>     >>>>>>>       > >> > > > instance,
>>     >>>>>>>       > >> > > > > modularize the build graphs and reuse
>>     artifacts
>>     >> from
>>     >>>> the
>>     >>>>>>>       > previous
>>     >>>>>>>       > >> > > build.
>>     >>>>>>>       > >> > > > > But I think that can be a big effort
>>     which is much
>>     >>>>>>>       harder to
>>     >>>>>>>       > >> > accomplish
>>     >>>>>>>       > >> > > > in
>>     >>>>>>>       > >> > > > > a short period of time and may deserve
>>     its own
>>     >>>> separate
>>     >>>>>>>       > >> discussion.
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > > [1]
>>     >> https://travis-ci.org/apache/flink/pull_requests
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > > >
>>     >>>>>>>       > >> > > >
>>     >>>>>>>       > >> > >
>>     >>>>>>>       > >> >
>>     >>>>>>>       > >>
>>     >>>>>>>       > >
>>     >>>>>>>       >
>>     >>>>>>>
>>     >>>>>>>
>>     >>>>>>>       --
>>     >>>>>>>       Best Regards
>>     >>>>>>>
>>     >>>>>>>       Jeff Zhang
>>     >>>>>>>
>>     >>
>>
>
>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
People really have to stop thinking that just because something works 
for us it is also a good solution.
Also, please remember that our builds run for 2h from start to finish, 
and not the 14 _minutes_ it takes for zeppelin.
We are dealing with an entirely different scale here, both in terms of 
build times and number of builds.

In this very thread people have been complaining about long queue times 
for their builds. Surprise, other Apache projects have been suffering 
the very same thing due to us not controlling our build times. While 
switching services (be it Jenkins, CircleCI or whatever) will possibly 
work for us (and these options are actually attractive, like CircleCI's 
proper support for build artifacts), it will also result in us likely 
negatively affecting other projects in significant ways.

Sure, the Jenkins setup has a good user experience for us, at the cost 
of blocking Jenkins workers for a _lot_ of time. Right now we have 25 
PRs in our queue; that's possibly 50h we'd consume of Jenkins 
resources, and the European contributors haven't even really started yet.
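
To make the arithmetic explicit, here is a rough sketch of how the 50h figure comes about and how long such a queue would take to drain. The function names and the "waves" model (every executor runs one full build at a time, all builds take equally long) are assumptions for illustration only; the executor count is a free parameter, not something we know about the Jenkins pool.

```python
import math


def worker_hours(pending_builds: int, hours_per_build: float) -> float:
    """Total executor time consumed by the queue, regardless of parallelism."""
    return pending_builds * hours_per_build


def drain_time(pending_builds: int, hours_per_build: float, executors: int) -> float:
    """Wall-clock hours until the queue is empty.

    Simplifying assumption: builds run in 'waves' of `executors` builds,
    and every build takes exactly `hours_per_build`.
    """
    if executors <= 0:
        raise ValueError("executors must be positive")
    waves = math.ceil(pending_builds / executors)
    return waves * hours_per_build


if __name__ == "__main__":
    # 25 queued PRs at roughly 2h per build
    print(worker_hours(25, 2))    # total executor hours consumed
    print(drain_time(25, 2, 5))   # wall-clock hours with a hypothetical 5 executors
```

Note that adding executors only shortens the wall-clock time; the total hours taken from the shared pool stay the same.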

FYI, the latest INFRA response from INFRA-18533:

"Our rough metrics shows that Flink used over 5800 hours of build time 
last month. That is equal to EIGHT servers running 24/7 for the ENTIRE 
MONTH. EIGHT. nonstop.
When we discovered this last night, we discussed it some and are going 
to tune down Flink to allow only five executors maximum. We cannot allow 
Flink to consume so much of a Foundation shared resource."
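
As a sanity check on the INFRA statement, the conversion from monthly build hours to servers running around the clock can be sketched like this. The 30-day month and the helper name are my assumptions; INFRA only gave the rough totals quoted above.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: INFRA did not state the month length used


def server_equivalents(build_hours: float, days: int = DAYS_PER_MONTH) -> float:
    """Number of servers running 24/7 for `days` days that `build_hours` amounts to."""
    if days <= 0:
        raise ValueError("days must be positive")
    return build_hours / (HOURS_PER_DAY * days)


if __name__ == "__main__":
    # "over 5800 hours of build time last month"
    print(f"{server_equivalents(5800):.2f} servers")
```

Under these assumptions, 5800 hours comes out at roughly eight servers, which matches the number INFRA quoted.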

So yes, we either
a) have to heavily reduce our CI usage or
b) fund our own, either maintaining it ourselves or donating to Apache.

On 02/07/2019 05:11, Bowen Li wrote:
> Judging by the git history of the Jenkins script, its core part was
> finished in March 2017 (with only two minor updates in 2017/2018), so
> it has been running for over two years now, and it seems the Zeppelin
> community has been quite happy with it. @Jeff Zhang
> <ma...@gmail.com> can you share your insights and user
> experience with the Jenkins+Travis approach?
>
> Things like:
>
> - has the approach completely solved the resource capacity problem for
> the Zeppelin community? is the Zeppelin community happy with the result?
> - is the whole configuration chain stable enough (e.g. uptime)?
> - how often do you need to maintain the Jenkins infra? how many people
> are usually involved in maintenance and bug fixes?
>
> To me, the downside of this approach seems to be mostly the maintenance:
> maintaining the script and the Jenkins infra.
>
> ** Having Our Own Travis-CI.com Account **
>
> Another alternative I've been thinking of is to have our own
> travis-ci.com account with paid dedicated resources. Note that
> travis-ci.org is the free version and travis-ci.com is the commercial
> version. We currently use a shared resource pool managed by the ASF
> INFRA team on travis-ci.org, but we have no control over it - we can't
> see how it's configured, how many resources are available, how
> resources are allocated among Apache projects, etc. The nice things
> about having an account on travis-ci.com are:
>
> - relatively low cost with much better resource guarantee than what we 
> currently have [1]: $249/month with 5 dedicated concurrency, 
> $489/month with 10 concurrency
> - low maintenance work compared to using Jenkins
> - (potentially) no migration cost according to Travis's doc [2] 
> (pending verification)
> - full control over the build capacity/configuration compared to using 
> ASF INFRA's pool
>
> I'd be surprised if we as such a vibrant community cannot find and 
> fund $249*12=$2988 a year in exchange for a much better developer 
> experience and much higher productivity.
>
> [1] https://travis-ci.com/plans
> [2] 
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>
> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <chesnay@apache.org 
> <ma...@apache.org>> wrote:
>
>     So yes, the Jenkins job keeps pulling the state from Travis until it
>     finishes.
>
>     Not sure I'm comfortable with the idea of using Jenkins workers
>     just to idle for several hours.
>
>     On 29/06/2019 14:56, Jeff Zhang wrote:
>     > Here's what zeppelin community did, we make a python script to
>     check the
>     > build status of pull request.
>     > Here's script:
>     > https://github.com/apache/zeppelin/blob/master/travis_check.py
>     >
>     > And this is the script we used in Jenkins build job.
>     >
>     > if [ -f "travis_check.py" ]; then
>     >    git log -n 1
>     >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
>     request.*from.*" | sed
>     > 's/.*GitHub pull request <a
>     > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>     >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>     >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>     >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>     >    #if [ -z $COMMIT ]; then
>     >    #  COMMIT=$(curl -s
>     https://api.github.com/repos/apache/zeppelin/pulls/$PR
>     > | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' '
>     | sed
>     > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v
>     "apache:" |
>     > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>     >    #fi
>     >
>     >    # get commit hash from PR
>     >    COMMIT=$(curl -s
>     https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>     > grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed
>     > 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v
>     "apache:" |
>     > sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>     >    sleep 30 # sleep few moment to wait travis starts the build
>     >    RET_CODE=0
>     >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>     >    if [ $RET_CODE -eq 2 ]; then # try with repository name when
>     travis-ci is
>     > not available in the account
>     >      RET_CODE=0
>     >      AUTHOR=$(curl -s
>     https://api.github.com/repos/apache/zeppelin/pulls/$PR
>     > | grep '"full_name":' | grep -v "apache/zeppelin" | sed
>     > 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>     >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>     >    fi
>     >
>     >    if [ $RET_CODE -eq 2 ]; then # fail with can't find build
>     information in
>     > the travis
>     >      set +x
>     >      echo "-----------------------------------------------------"
>     >      echo "Looks like travis-ci is not configured for your fork."
>     >      echo "Please setup by swich on 'zeppelin' repository at
>     > https://travis-ci.org/profile and travis-ci."
>     >      echo "And then make sure 'Build branch updates' option is
>     enabled in
>     > the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>     >      echo ""
>     >      echo "To trigger CI after setup, you will need ammend your
>     last commit
>     > with"
>     >      echo "git commit --amend"
>     >      echo "git push your-remote HEAD --force"
>     >      echo ""
>     >      echo "See
>     >
>     http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
>     > ."
>     >    fi
>     >
>     >    exit $RET_CODE
>     > else
>     >    set +x
>     >    echo "travis_check.py does not exists"
>     >    exit 1
>     > fi
>     >
>     > Chesnay Schepler <chesnay@apache.org
>     <ma...@apache.org>> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>     >
>     >> Does this imply that a Jenkins job is active as long as the
>     Travis build
>     >> runs?
>     >>
>     >> On 26/06/2019 21:28, Bowen Li wrote:
>     >>> Hi,
>     >>>
>     >>> @Dawid, I think the "long test running" as I mentioned in the
>     first
>     >> email,
>     >>> also as you guys said, belongs to "a big effort which is much
>     harder to
>     >>> accomplish in a short period of time and may deserve its own
>     separate
>     >>> discussion". Thus I didn't include it in what we can do in a
>     foreseeable
>     >>> short term.
>     >>>
>     >>> Besides, I don't think that's the ultimate reason for lack of
>     build
>     >>> resources. Even if the build is shortened to something like
>     2h, the
>     >>> problems of no build machine works about 6 or more hours in
>     PST daytime
>     >>> that I described will still happen, because no machine from
>     ASF INFRA's
>     >>> pool is allocated to Flink. As I have paid close attention to
>     the build
>     >>> queue in the past few weekdays, it's a pretty clear pattern now.
>     >>>
>     >>> **The ultimate root cause** for that is - we don't have any
>     **dedicated**
>     >>> build resources that we can stably rely on. I'm actually ok to
>     wait for a
>     >>> long time if there are build requests running, it means at
>     least we are
>     >>> making progress. But I'm not ok with no build resource. A
>     better place I
>     >>> think we should aim at in short term is to always have at
>     least a central
>     >>> pool (can be 3 or 5) of machines dedicated to build Flink at
>     any time, or
>     >>> maybe use users resources.
>     >>>
>     >>> @Chesnay @Robert I synced with Jeff offline that Zeppelin
>     community is
>     >>> using a Jenkins job to automatically build on users' travis
>     account and
>     >>> link the result back to github PR. I guess the Jenkins job
>     would fetch
>     >>> latest upstream master and build the PR against it. Jeff has filed
>     >> tickets
>     >>> to learn and get access to the Jenkins infra. It'll better to
>     fully
>     >>> understand it first before judging this approach.
>     >>>
>     >>> I also heard good things about CircleCI, and ASF INFRA seems
>     to have a
>     >> pool
>     >>> of build capacity there too. Can be an alternative to consider.
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>     >> dwysakowicz@apache.org <ma...@apache.org>>
>     >>> wrote:
>     >>>
>     >>>> Sorry to jump in late, but I think Bowen missed the most
>     important point
>     >>>> from Chesnay's previous message in the summary. The ultimate
>     reason for
>     >>>> all the problems is that the tests take close to 2 hours to
>     run already.
>     >>>> I fully support this claim: "Unless people start caring about
>     test times
>     >>>> before adding them, this issue cannot be solved"
>     >>>>
>     >>>> This is also another reason why using user's Travis account
>     won't help.
>     >>>> Every few weeks we reach the user's time limit for a single
>     profile.
>     >>>> This makes the user's builds simply fail, until we either
>     properly
>     >>>> decrease the time the tests take (which I am not sure we ever
>     did) or
>     >>>> postpone the problem by splitting into more profiles. (Note
>     that the ASF
>     >>>> Travis account has higher time limits)
>     >>>>
>     >>>> Best,
>     >>>>
>     >>>> Dawid
>     >>>>
>     >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>     >>>>> Do we know if using "the best" available hardware would
>     improve the
>     >> build
>     >>>>> times?
>     >>>>> Imagine we would run the build on machines with plenty of
>     main memory
>     >> to
>     >>>>> mount everything to ramdisk + the latest CPU architecture?
>     >>>>>
>     >>>>> Throwing hardware at the problem could help reduce the time
>     of an
>     >>>>> individual build, and using our own infrastructure would
>     remove our
>     >>>>> dependency on Apache's Travis account (with the obvious
>     downside of
>     >>>> having
>     >>>>> to maintain the infrastructure)
>     >>>>> We could use an open source travis alternative, to have a
>     similar
>     >>>>> experience and make the migration easy.
>     >>>>>
>     >>>>>
>     >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>     <chesnay@apache.org <ma...@apache.org>>
>     >>>> wrote:
>     >>>>>>    From what I gathered, there's no special sauce that the
>     Zeppelin
>     >>>>>> project uses which actually integrates a users Travis
>     account into the
>     >>>> PR.
>     >>>>>> They just disabled Travis for PRs. And that's kind of it.
>     >>>>>>
>     >>>>>> Naturally we can do this (duh) and save the ASF a fair amount
>     >>>>>> of resources, but there are downsides:
>     >>>>>>
>     >>>>>> The discoverability of the Travis check takes a nose-dive.
>     >>>>>> Either we require every contributor to always, on every
>     >>>>>> commit, also post a Travis build, or we have the reviewer
>     >>>>>> sift through the contributor's account to find it.
>     >>>>>>
>     >>>>>> This is rather cumbersome. Additionally, it's also not
>     equivalent to
>     >>>>>> having a PR build.
>     >>>>>>
>     >>>>>> A normal branch build takes a branch as is and tests it. A
>     >>>>>> PR build merges the branch into master, and then runs it.
>     >>>>>> (Fun fact: This is why a PR with merge conflicts is not
>     >>>>>> being run on Travis.)
>     >>>>>>
>     >>>>>> And ultimately, everyone can already make use of this
>     approach anyway.
>     >>>>>>
>     >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>     >>>>>>> Hi Jeff,
>     >>>>>>>
>     >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a
>     good idea to
>     >>>>>>> leverage user's travis account.
>     >>>>>>> In this way, we can have almost unlimited concurrent build
>     jobs and
>     >>>>>>> developers can restart build by themselves (currently only
>     committers
>     >>>>>>> can restart PR's build).
>     >>>>>>>
>     >>>>>>> But I'm still not very clear how to integrate user's
>     travis build
>     >> into
>     >>>>>>> the Flink pull request's build automatically. Can you
>     explain more in
>     >>>>>>> detail?
>     >>>>>>>
>     >>>>>>> Another question: does travis only build branches for user
>     account?
>     >>>>>>> My concern is that builds for PRs will rebase user's
>     commits against
>     >>>>>>> current master branch.
>     >>>>>>> This will help us to find problems before merge.  Builds
>     for branches
>     >>>>>>> will lose the impact of new commits in master.
>     >>>>>>> How does Zeppelin solve this problem?
>     >>>>>>>
>     >>>>>>> Thanks again for sharing the idea.
>     >>>>>>>
>     >>>>>>> Regards,
>     >>>>>>> Jark
>     >>>>>>>
>     >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>     <ma...@gmail.com>
>     >>>>>>> <mailto:zjffdu@gmail.com <ma...@gmail.com>>> wrote:
>     >>>>>>>
>     >>>>>>>       Hi Folks,
>     >>>>>>>
>     >>>>>>>       Zeppelin meet this kind of issue before, we solve it by
>     >> delegating
>     >>>>>>>       each
>     >>>>>>>       one's PR build to his travis account (Everyone can
>     have 5 free
>     >>>>>>>       slot for
>     >>>>>>>       travis build).
>     >>>>>>>       Apache account travis build is only triggered when
>     PR is merged.
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>       Kurt Young <ykt836@gmail.com
>     <ma...@gmail.com> <mailto:ykt836@gmail.com
>     <ma...@gmail.com>>>
>     >>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
>     >>>>>>>
>     >>>>>>>       > (Forgot to cc George)
>     >>>>>>>       >
>     >>>>>>>       > Best,
>     >>>>>>>       > Kurt
>     >>>>>>>       >
>     >>>>>>>       >
>     >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young
>     <ykt836@gmail.com <ma...@gmail.com>
>     >>>>>>>       <mailto:ykt836@gmail.com <ma...@gmail.com>>>
>     wrote:
>     >>>>>>>       >
>     >>>>>>>       > > Hi Bowen,
>     >>>>>>>       > >
>     >>>>>>>       > > Thanks for bringing this up. We actually have
>     discussed
>     >> about
>     >>>>>>>       this, and I
>     >>>>>>>       > > think Till and George have
>     >>>>>>>       > > already spend sometime investigating it. I have
>     cced both of
>     >>>>>>>       them, and
>     >>>>>>>       > > maybe they can share
>     >>>>>>>       > > their findings.
>     >>>>>>>       > >
>     >>>>>>>       > > Best,
>     >>>>>>>       > > Kurt
>     >>>>>>>       > >
>     >>>>>>>       > >
>     >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu
>     <imjark@gmail.com <ma...@gmail.com>
>     >>>>>>>       <mailto:imjark@gmail.com <ma...@gmail.com>>>
>     wrote:
>     >>>>>>>       > >
>     >>>>>>>       > >> Hi Bowen,
>     >>>>>>>       > >>
>     >>>>>>>       > >> Thanks for bringing this. We also suffered from
>     the long
>     >>>>>>>       build time.
>     >>>>>>>       > >> I agree that we should focus on solving build
>     capacity
>     >>>>>>>       problem in the
>     >>>>>>>       > >> thread.
>     >>>>>>>       > >>
>     >>>>>>>       > >> My observation is there is only one build is
>     running, all
>     >> the
>     >>>>>>>       others
>     >>>>>>>       > >> (other
>     >>>>>>>       > >> PRs, master) are pending.
>     >>>>>>>       > >> The pricing plan[1] of travis shows it can support
>     >> concurrent
>     >>>>>>>       build
>     >>>>>>>       > jobs.
>     >>>>>>>       > >> But I don't know which plan we are using, might
>     be the free
>     >>>>>>>       plan for
>     >>>>>>>       > open
>     >>>>>>>       > >> source.
>     >>>>>>>       > >>
>     >>>>>>>       > >> I cc-ed Chesnay who may have some experience on
>     Travis.
>     >>>>>>>       > >>
>     >>>>>>>       > >> Regards,
>     >>>>>>>       > >> Jark
>     >>>>>>>       > >>
>     >>>>>>>       > >> [1]: https://travis-ci.com/plans
>     >>>>>>>       > >>
>     >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>     >> bowenli86@gmail.com <ma...@gmail.com>
>     >>>>>>>       <mailto:bowenli86@gmail.com
>     <ma...@gmail.com>>> wrote:
>     >>>>>>>       > >>
>     >>>>>>>       > >> > Hi Steven,
>     >>>>>>>       > >> >
>     >>>>>>>       > >> > I think you may not read what I wrote. The
>     discussion is
>     >>>> about
>     >>>>>>>       > "unstable
>     >>>>>>>       > >> > build **capacity**", in another word
>     "unstable / lack of
>     >>>> build
>     >>>>>>>       > >> resources",
>     >>>>>>>       > >> > not "unstable build".
>     >>>>>>>       > >> >
>     >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>     >>>>>>>       <stevenz3wu@gmail.com <ma...@gmail.com>
>     <mailto:stevenz3wu@gmail.com <ma...@gmail.com>>>
>     >>>>>>>       > wrote:
>     >>>>>>>       > >> >
>     >>>>>>>       > >> > > long and sometimes unstable build is
>     definitely a pain
>     >>>>>> point.
>     >>>>>>>       > >> > >
>     >>>>>>>       > >> > > I suspect the build failure here in
>     >> flink-connector-kafka
>     >>>>>>>       is not
>     >>>>>>>       > >> related
>     >>>>>>>       > >> > to
>     >>>>>>>       > >> > > my change. but there is no easy re-run the
>     build on
>     >>>>>>>       travis UI.
>     >>>>>>>       > Google
>     >>>>>>>       > >> > > search showed a trick of close-and-open the
>     PR will
>     >>>>>>>       trigger rebuild.
>     >>>>>>>       > >> but
>     >>>>>>>       > >> > > that could add noises to the PR activities.
>     >>>>>>>       > >> > >
>     https://travis-ci.org/apache/flink/jobs/545555519
>     >>>>>>>       > >> > >
>     >>>>>>>       > >> > > travis-ci for my personal repo often failed
>     with
>     >>>>>>>       exceeding time
>     >>>>>>>       > limit
>     >>>>>>>       > >> > after
>     >>>>>>>       > >> > > 4+ hours.
>     >>>>>>>       > >> > > The job exceeded the maximum time limit for
>     jobs, and
>     >> has
>     >>>>>>>       been
>     >>>>>>>       > >> > terminated.
>     >>>>>>>       > >> > >
>     >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>     >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
>     >>>>>>>       > wrote:
>     >>>>>>>       > >> > >
>     >>>>>>>       > >> > > >
>     https://travis-ci.org/apache/flink/builds/549681530
>     >>>>>>>       This build
>     >>>>>>>       > >> > request
>     >>>>>>>       > >> > > > has
>     >>>>>>>       > >> > > > been sitting at **HEAD of the queue**
>     since I first
>     >> saw
>     >>>>>>>       it at PST
>     >>>>>>>       > >> > 10:30am
>     >>>>>>>       > >> > > > (not sure how long it's been there before
>     10:30am).
>     >>>>>>>       It's PST
>     >>>>>>>       > 4:12pm
>     >>>>>>>       > >> now
>     >>>>>>>       > >> > > and
>     >>>>>>>       > >> > > > it hasn't started yet.
>     >>>>>>>       > >> > > >
>     >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>     >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>
>     <mailto:bowenli86@gmail.com <ma...@gmail.com>>>
>     >>>>>>>       > >> wrote:
>     >>>>>>>       > >> > > >
>     >>>>>>>       > >> > > > > Hi devs,
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > I've been experiencing the pain
>     resulting from lack
>     >>>>>>>       of stable
>     >>>>>>>       > >> build
>     >>>>>>>       > >> > > > > capacity on Travis for Flink PRs [1].
>     >> Specifically, I
>     >>>>>>>       noticed
>     >>>>>>>       > >> often
>     >>>>>>>       > >> > > that
>     >>>>>>>       > >> > > > no
>     >>>>>>>       > >> > > > > build in the queue is making any
>     progress for
>     >> hours,
>     >>>> and
>     >>>>>>>       > suddenly
>     >>>>>>>       > >> 5
>     >>>>>>>       > >> > or
>     >>>>>>>       > >> > > 6
>     >>>>>>>       > >> > > > > builds kick off all together after the
>     long pause.
>     >>>>>>>       I'm at PST
>     >>>>>>>       > >> > (UTC-08)
>     >>>>>>>       > >> > > > time
>     >>>>>>>       > >> > > > > zone, and I've seen pause can be as
>     long as 6 hours
>     >>>>>>>       from PST 9am
>     >>>>>>>       > >> to
>     >>>>>>>       > >> > 3pm
>     >>>>>>>       > >> > > > > (let alone the time needed to drain the
>     queue
>     >>>>>>>       afterwards).
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > I think this has greatly impacted our
>     productivity.
>     >>>> I've
>     >>>>>>>       > >> experienced
>     >>>>>>>       > >> > > that
>     >>>>>>>       > >> > > > > PRs submitted in the early morning of
>     PST time zone
>     >>>>>>>       won't finish
>     >>>>>>>       > >> > their
>     >>>>>>>       > >> > > > > build until late night of the same day.
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > So my questions are:
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > - Has anyone else experienced the same
>     problem or
>     >>>>>>>       have similar
>     >>>>>>>       > >> > > > observation
>     >>>>>>>       > >> > > > > on TravisCI? (I suspect it has things
>     to do with
>     >> time
>     >>>>>>>       zone)
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > - What pricing plan of TravisCI is
>     Flink currently
>     >>>>>>>       using? Is it
>     >>>>>>>       > >> the
>     >>>>>>>       > >> > > free
>     >>>>>>>       > >> > > > > plan for open source projects? What are the
>     >>>>>>>       guaranteed build
>     >>>>>>>       > >> capacity
>     >>>>>>>       > >> > > of
>     >>>>>>>       > >> > > > > the current plan?
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > - If the current pricing plan (either
>     free or paid)
>     >>>>>> can't
>     >>>>>>>       > provide
>     >>>>>>>       > >> > > stable
>     >>>>>>>       > >> > > > > build capacity, can we upgrade to a
>     higher priced
>     >>>>>>>       plan with
>     >>>>>>>       > larger
>     >>>>>>>       > >> > and
>     >>>>>>>       > >> > > > more
>     >>>>>>>       > >> > > > > stable build capacity?
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > BTW, another factor that contribute to the
>     >>>>>>>       productivity problem
>     >>>>>>>       > is
>     >>>>>>>       > >> > that
>     >>>>>>>       > >> > > > > our build is slow - we run full build
>     for every PR
>     >>>> and a
>     >>>>>>>       > >> successful
>     >>>>>>>       > >> > > full
>     >>>>>>>       > >> > > > > build takes ~5h. We definitely have
>     more options to
>     >>>>>>>       solve it,
>     >>>>>>>       > for
>     >>>>>>>       > >> > > > instance,
>     >>>>>>>       > >> > > > > modularize the build graphs and reuse
>     artifacts
>     >> from
>     >>>> the
>     >>>>>>>       > previous
>     >>>>>>>       > >> > > build.
>     >>>>>>>       > >> > > > > But I think that can be a big effort
>     which is much
>     >>>>>>>       harder to
>     >>>>>>>       > >> > accomplish
>     >>>>>>>       > >> > > > in
>     >>>>>>>       > >> > > > > a short period of time and may deserve
>     its own
>     >>>> separate
>     >>>>>>>       > >> discussion.
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > > [1]
>     >> https://travis-ci.org/apache/flink/pull_requests
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > > >
>     >>>>>>>       > >> > > >
>     >>>>>>>       > >> > >
>     >>>>>>>       > >> >
>     >>>>>>>       > >>
>     >>>>>>>       > >
>     >>>>>>>       >
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>       --
>     >>>>>>>       Best Regards
>     >>>>>>>
>     >>>>>>>       Jeff Zhang
>     >>>>>>>
>     >>
>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
Looking at the git history of the Jenkins script, its core part was
finished in March 2017 (with only two minor updates in 2017/2018), so it
has been running for over two years now, and it seems the Zeppelin community
has been quite happy with it. @Jeff Zhang <zj...@gmail.com> can you share your
insights and user experience with the Jenkins+Travis approach?

Things like:

- Has the approach completely solved the resource capacity problem for the
Zeppelin community? Is the Zeppelin community happy with the result?
- Is the whole configuration chain stable enough (e.g. in terms of uptime)?
- How often do you need to maintain the Jenkins infra? How many people are
usually involved in maintenance and bug fixes?

To me, the main downside of this approach is maintenance - we would have to
maintain both the script and the Jenkins infra.

** Having Our Own Travis-CI.com Account **

Another alternative I've been thinking of is to have our own travis-ci.com
account with paid dedicated resources. Note that travis-ci.org is the free
version and travis-ci.com is the commercial version. We currently use a
shared resource pool managed by the ASF INFRA team on travis-ci.org, but we
have no control over it - we can't see how it's configured, how many
resources are available, how resources are allocated among Apache projects,
etc. The nice things about having an account on travis-ci.com are:

- relatively low cost with a much better resource guarantee than what we
currently have [1]: $249/month for 5 dedicated concurrent jobs, $489/month
for 10 concurrent jobs
- low maintenance work compared to using Jenkins
- (potentially) no migration cost according to Travis's doc [2] (pending
verification)
- full control over the build capacity/configuration compared to using ASF
INFRA's pool

I'd be surprised if we as such a vibrant community cannot find and fund
$249*12=$2988 a year in exchange for a much better developer experience and
much higher productivity.
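
As a rough back-of-the-envelope sketch of that arithmetic: the plan prices
below are the ones quoted above from [1]. The helper names, and the
simplification that one build occupies exactly one concurrency slot for its
whole duration, are assumptions made only for illustration and ignore how a
build is actually split into jobs.

```python
# Plan prices as quoted above: concurrent jobs -> USD per month.
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_usd: int) -> int:
    """Yearly price of a plan billed monthly."""
    return monthly_usd * 12


def max_builds_per_day(concurrency: int, build_hours: float) -> float:
    """Upper bound on builds per day if every slot is busy around the clock.

    Assumes (hypothetically) that one build holds one slot for build_hours.
    """
    return concurrency * 24 / build_hours


if __name__ == "__main__":
    for slots, monthly in PLANS.items():
        print(f"{slots} slots: ${annual_cost(monthly)} per year, "
              f"at most {max_builds_per_day(slots, 5.0):.0f} five-hour builds per day")
```

The point is only the order of magnitude: even the smaller plan would give a
predictable ceiling on daily builds, which is what the shared pool currently
cannot provide.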

[1] https://travis-ci.com/plans
[2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration

On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ch...@apache.org> wrote:

> So yes, the Jenkins job keeps polling the state from Travis until it
> finishes.
>
> Not sure I'm comfortable with the idea of using Jenkins workers just to
> idle for several hours.
>
> On 29/06/2019 14:56, Jeff Zhang wrote:
> > Here's what zeppelin community did, we make a python script to check the
> > build status of pull request.
> > Here's script:
> > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >
> > And this is the script we used in Jenkins build job.
> >
> > if [ -f "travis_check.py" ]; then
> >    git log -n 1
> >    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
> >      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >    #if [ -z $COMMIT ]; then
> >    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
> >    #    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >    #fi
> >
> >    # get commit hash from PR
> >    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
> >      sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >    sleep 30 # wait a moment for travis to start the build
> >    RET_CODE=0
> >    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >      RET_CODE=0
> >      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >        grep '"full_name":' | grep -v "apache/zeppelin" |
> >        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >    fi
> >
> >    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
> >      set +x
> >      echo "-----------------------------------------------------"
> >      echo "Looks like travis-ci is not configured for your fork."
> >      echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >      echo ""
> >      echo "To trigger CI after setup, you will need to amend your last commit with"
> >      echo "git commit --amend"
> >      echo "git push your-remote HEAD --force"
> >      echo ""
> >      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >    fi
> >
> >    exit $RET_CODE
> > else
> >    set +x
> >    echo "travis_check.py does not exist"
> >    exit 1
> > fi
> >
> > Chesnay Schepler <ch...@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >
> >> Does this imply that a Jenkins job is active as long as the Travis build
> >> runs?
> >>
> >> On 26/06/2019 21:28, Bowen Li wrote:
> >>> Hi,
> >>>
> >>> @Dawid, I think the "long test running" as I mentioned in the first
> >> email,
> >>> also as you guys said, belongs to "a big effort which is much harder to
> >>> accomplish in a short period of time and may deserve its own separate
> >>> discussion". Thus I didn't include it in what we can do in a
> foreseeable
> >>> short term.
> >>>
> >>> Besides, I don't think that's the ultimate reason for lack of build
> >>> resources. Even if the build is shortened to something like 2h, the
> >>> problems of no build machine works about 6 or more hours in PST daytime
> >>> that I described will still happen, because no machine from ASF INFRA's
> >>> pool is allocated to Flink. As I have paid close attention to the build
> >>> queue in the past few weekdays, it's a pretty clear pattern now.
> >>>
> >>> **The ultimate root cause** for that is - we don't have any
> **dedicated**
> >>> build resources that we can stably rely on. I'm actually ok to wait
> for a
> >>> long time if there are build requests running, it means at least we are
> >>> making progress. But I'm not ok with no build resource. A better place
> I
> >>> think we should aim at in short term is to always have at least a
> central
> >>> pool (can be 3 or 5) of machines dedicated to build Flink at any time,
> or
> >>> maybe use users resources.
> >>>
> >>> @Chesnay @Robert I synced with Jeff offline that Zeppelin community is
> >>> using a Jenkins job to automatically build on users' travis account and
> >>> link the result back to github PR. I guess the Jenkins job would fetch
> >>> latest upstream master and build the PR against it. Jeff has filed
> >> tickets
> >>> to learn and get access to the Jenkins infra. It'll better to fully
> >>> understand it first before judging this approach.
> >>>
> >>> I also heard good things about CircleCI, and ASF INFRA seems to have a
> >> pool
> >>> of build capacity there too. Can be an alternative to consider.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> >> dwysakowicz@apache.org>
> >>> wrote:
> >>>
> >>>> Sorry to jump in late, but I think Bowen missed the most important
> point
> >>>> from Chesnay's previous message in the summary. The ultimate reason
> for
> >>>> all the problems is that the tests take close to 2 hours to run
> already.
> >>>> I fully support this claim: "Unless people start caring about test
> times
> >>>> before adding them, this issue cannot be solved"
> >>>>
> >>>> This is also another reason why using user's Travis account won't
> help.
> >>>> Every few weeks we reach the user's time limit for a single profile.
> >>>> This makes the user's builds simply fail, until we either properly
> >>>> decrease the time the tests take (which I am not sure we ever did) or
> >>>> postpone the problem by splitting into more profiles. (Note that the
> ASF
> >>>> Travis account has higher time limits)
> >>>>
> >>>> Best,
> >>>>
> >>>> Dawid
> >>>>
> >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>> Do we know if using "the best" available hardware would improve the
> >> build
> >>>>> times?
> >>>>> Imagine we would run the build on machines with plenty of main memory
> >> to
> >>>>> mount everything to ramdisk + the latest CPU architecture?
> >>>>>
> >>>>> Throwing hardware at the problem could help reduce the time of an
> >>>>> individual build, and using our own infrastructure would remove our
> >>>>> dependency on Apache's Travis account (with the obvious downside of
> >>>> having
> >>>>> to maintain the infrastructure)
> >>>>> We could use an open source travis alternative, to have a similar
> >>>>> experience and make the migration easy.
> >>>>>
> >>>>>
> >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <chesnay@apache.org
> >
> >>>> wrote:
> >>>>>>    From what I gathered, there's no special sauce that the Zeppelin
> >>>>>> project uses which actually integrates a users Travis account into
> the
> >>>> PR.
> >>>>>> They just disabled Travis for PRs. And that's kind of it.
> >>>>>>
> >>>>>> Naturally we can do this (duh) and safe the ASF a fair amount of
> >>>>>> resources, but there are downsides:
> >>>>>>
> >>>>>> The discoverability of the Travis check takes a nose-dive. Either we
> >>>>>> require every contributor to always, an every commit, also post a
> >> Travis
> >>>>>> build, or we have the reviewer sift through the contributors account
> >> to
> >>>>>> find it.
> >>>>>>
> >>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
> >>>>>> having a PR build.
> >>>>>>
> >>>>>> A normal branch build takes a branch as is and tests it. A PR build
> >>>>>> merges the branch into master, and then runs it. (Fun fact: This is
> >> why
> >>>>>> a PR without merge conflicts is not being run on Travis.)
> >>>>>>
> >>>>>> And ultimately, everyone can already make use of this approach
> anyway.
> >>>>>>
> >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>>>> Hi Jeff,
> >>>>>>>
> >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea
> to
> >>>>>>> leverage user's travis account.
> >>>>>>> In this way, we can have almost unlimited concurrent build jobs and
> >>>>>>> developers can restart build by themselves (currently only
> committers
> >>>>>>> can restart PR's build).
> >>>>>>>
> >>>>>>> But I'm still not very clear how to integrate user's travis build
> >> into
> >>>>>>> the Flink pull request's build automatically. Can you explain more
> in
> >>>>>>> detail?
> >>>>>>>
> >>>>>>> Another question: does travis only build branches for user account?
> >>>>>>> My concern is that builds for PRs will rebase user's commits
> against
> >>>>>>> current master branch.
> >>>>>>> This will help us to find problems before merge.  Builds for
> branches
> >>>>>>> will lose the impact of new commits in master.
> >>>>>>> How does Zeppelin solve this problem?
> >>>>>>>
> >>>>>>> Thanks again for sharing the idea.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
> >>>>>>> <ma...@gmail.com>> wrote:
> >>>>>>>
> >>>>>>>       Hi Folks,
> >>>>>>>
> >>>>>>>       Zeppelin meet this kind of issue before, we solve it by
> >> delegating
> >>>>>>>       each
> >>>>>>>       one's PR build to his travis account (Everyone can have 5
> free
> >>>>>>>       slot for
> >>>>>>>       travis build).
> >>>>>>>       Apache account travis build is only triggered when PR is
> merged.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>       Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
> >>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>>>>
> >>>>>>>       > (Forgot to cc George)
> >>>>>>>       >
> >>>>>>>       > Best,
> >>>>>>>       > Kurt
> >>>>>>>       >
> >>>>>>>       >
> >>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <
> ykt836@gmail.com
> >>>>>>>       <ma...@gmail.com>> wrote:
> >>>>>>>       >
> >>>>>>>       > > Hi Bowen,
> >>>>>>>       > >
> >>>>>>>       > > Thanks for bringing this up. We actually have discussed
> >> about
> >>>>>>>       this, and I
> >>>>>>>       > > think Till and George have
> >>>>>>>       > > already spend sometime investigating it. I have cced
> both of
> >>>>>>>       them, and
> >>>>>>>       > > maybe they can share
> >>>>>>>       > > their findings.
> >>>>>>>       > >
> >>>>>>>       > > Best,
> >>>>>>>       > > Kurt
> >>>>>>>       > >
> >>>>>>>       > >
> >>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <
> imjark@gmail.com
> >>>>>>>       <ma...@gmail.com>> wrote:
> >>>>>>>       > >
> >>>>>>>       > >> Hi Bowen,
> >>>>>>>       > >>
> >>>>>>>       > >> Thanks for bringing this. We also suffered from the long
> >>>>>>>       build time.
> >>>>>>>       > >> I agree that we should focus on solving build capacity
> >>>>>>>       problem in the
> >>>>>>>       > >> thread.
> >>>>>>>       > >>
> >>>>>>>       > >> My observation is there is only one build is running,
> all
> >> the
> >>>>>>>       others
> >>>>>>>       > >> (other
> >>>>>>>       > >> PRs, master) are pending.
> >>>>>>>       > >> The pricing plan[1] of travis shows it can support
> >> concurrent
> >>>>>>>       build
> >>>>>>>       > jobs.
> >>>>>>>       > >> But I don't know which plan we are using, might be the
> free
> >>>>>>>       plan for
> >>>>>>>       > open
> >>>>>>>       > >> source.
> >>>>>>>       > >>
> >>>>>>>       > >> I cc-ed Chesnay who may have some experience on Travis.
> >>>>>>>       > >>
> >>>>>>>       > >> Regards,
> >>>>>>>       > >> Jark
> >>>>>>>       > >>
> >>>>>>>       > >> [1]: https://travis-ci.com/plans
> >>>>>>>       > >>
> >>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> >> bowenli86@gmail.com
> >>>>>>>       <ma...@gmail.com>> wrote:
> >>>>>>>       > >>
> >>>>>>>       > >> > Hi Steven,
> >>>>>>>       > >> >
> >>>>>>>       > >> > I think you may not read what I wrote. The discussion
> is
> >>>> about
> >>>>>>>       > "unstable
> >>>>>>>       > >> > build **capacity**", in another word "unstable / lack
> of
> >>>> build
> >>>>>>>       > >> resources",
> >>>>>>>       > >> > not "unstable build".
> >>>>>>>       > >> >
> >>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>>>>>>       <stevenz3wu@gmail.com <ma...@gmail.com>>
> >>>>>>>       > wrote:
> >>>>>>>       > >> >
> >>>>>>>       > >> > > long and sometimes unstable build is definitely a
> pain
> >>>>>> point.
> >>>>>>>       > >> > >
> >>>>>>>       > >> > > I suspect the build failure here in
> >> flink-connector-kafka
> >>>>>>>       is not
> >>>>>>>       > >> related
> >>>>>>>       > >> > to
> >>>>>>>       > >> > > my change. but there is no easy re-run the build on
> >>>>>>>       travis UI.
> >>>>>>>       > Google
> >>>>>>>       > >> > > search showed a trick of close-and-open the PR will
> >>>>>>>       trigger rebuild.
> >>>>>>>       > >> but
> >>>>>>>       > >> > > that could add noises to the PR activities.
> >>>>>>>       > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>>>       > >> > >
> >>>>>>>       > >> > > travis-ci for my personal repo often failed with
> >>>>>>>       exceeding time
> >>>>>>>       > limit
> >>>>>>>       > >> > after
> >>>>>>>       > >> > > 4+ hours.
> >>>>>>>       > >> > > The job exceeded the maximum time limit for jobs,
> and
> >> has
> >>>>>>>       been
> >>>>>>>       > >> > terminated.
> >>>>>>>       > >> > >
> >>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>>>       > wrote:
> >>>>>>>       > >> > >
> >>>>>>>       > >> > > >
> https://travis-ci.org/apache/flink/builds/549681530
> >>>>>>>       This build
> >>>>>>>       > >> > request
> >>>>>>>       > >> > > > has
> >>>>>>>       > >> > > > been sitting at **HEAD of the queue** since I
> first
> >> saw
> >>>>>>>       it at PST
> >>>>>>>       > >> > 10:30am
> >>>>>>>       > >> > > > (not sure how long it's been there before
> 10:30am).
> >>>>>>>       It's PST
> >>>>>>>       > 4:12pm
> >>>>>>>       > >> now
> >>>>>>>       > >> > > and
> >>>>>>>       > >> > > > it hasn't started yet.
> >>>>>>>       > >> > > >
> >>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>>>       > >> wrote:
> >>>>>>>       > >> > > >
> >>>>>>>       > >> > > > > Hi devs,
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > I've been experiencing the pain resulting from
> lack
> >>>>>>>       of stable
> >>>>>>>       > >> build
> >>>>>>>       > >> > > > > capacity on Travis for Flink PRs [1].
> >> Specifically, I
> >>>>>>>       noticed
> >>>>>>>       > >> often
> >>>>>>>       > >> > > that
> >>>>>>>       > >> > > > no
> >>>>>>>       > >> > > > > build in the queue is making any progress for
> >> hours,
> >>>> and
> >>>>>>>       > suddenly
> >>>>>>>       > >> 5
> >>>>>>>       > >> > or
> >>>>>>>       > >> > > 6
> >>>>>>>       > >> > > > > builds kick off all together after the long
> pause.
> >>>>>>>       I'm at PST
> >>>>>>>       > >> > (UTC-08)
> >>>>>>>       > >> > > > time
> >>>>>>>       > >> > > > > zone, and I've seen pause can be as long as 6
> hours
> >>>>>>>       from PST 9am
> >>>>>>>       > >> to
> >>>>>>>       > >> > 3pm
> >>>>>>>       > >> > > > > (let alone the time needed to drain the queue
> >>>>>>>       afterwards).
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > I think this has greatly impacted our
> productivity.
> >>>> I've
> >>>>>>>       > >> experienced
> >>>>>>>       > >> > > that
> >>>>>>>       > >> > > > > PRs submitted in the early morning of PST time
> zone
> >>>>>>>       won't finish
> >>>>>>>       > >> > their
> >>>>>>>       > >> > > > > build until late night of the same day.
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > So my questions are:
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > - Has anyone else experienced the same problem
> or
> >>>>>>>       have similar
> >>>>>>>       > >> > > > observation
> >>>>>>>       > >> > > > > on TravisCI? (I suspect it has things to do with
> >> time
> >>>>>>>       zone)
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > - What pricing plan of TravisCI is Flink
> currently
> >>>>>>>       using? Is it
> >>>>>>>       > >> the
> >>>>>>>       > >> > > free
> >>>>>>>       > >> > > > > plan for open source projects? What are the
> >>>>>>>       guaranteed build
> >>>>>>>       > >> capacity
> >>>>>>>       > >> > > of
> >>>>>>>       > >> > > > > the current plan?
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > - If the current pricing plan (either free or
> paid)
> >>>>>> can't
> >>>>>>>       > provide
> >>>>>>>       > >> > > stable
> >>>>>>>       > >> > > > > build capacity, can we upgrade to a higher
> priced
> >>>>>>>       plan with
> >>>>>>>       > larger
> >>>>>>>       > >> > and
> >>>>>>>       > >> > > > more
> >>>>>>>       > >> > > > > stable build capacity?
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > BTW, another factor that contribute to the
> >>>>>>>       productivity problem
> >>>>>>>       > is
> >>>>>>>       > >> > that
> >>>>>>>       > >> > > > > our build is slow - we run full build for every
> PR
> >>>> and a
> >>>>>>>       > >> successful
> >>>>>>>       > >> > > full
> >>>>>>>       > >> > > > > build takes ~5h. We definitely have more
> options to
> >>>>>>>       solve it,
> >>>>>>>       > for
> >>>>>>>       > >> > > > instance,
> >>>>>>>       > >> > > > > modularize the build graphs and reuse artifacts
> >> from
> >>>> the
> >>>>>>>       > previous
> >>>>>>>       > >> > > build.
> >>>>>>>       > >> > > > > But I think that can be a big effort which is
> much
> >>>>>>>       harder to
> >>>>>>>       > >> > accomplish
> >>>>>>>       > >> > > > in
> >>>>>>>       > >> > > > > a short period of time and may deserve its own
> >>>> separate
> >>>>>>>       > >> discussion.
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > > [1]
> >> https://travis-ci.org/apache/flink/pull_requests
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > > >
> >>>>>>>       > >> > > >
> >>>>>>>       > >> > >
> >>>>>>>       > >> >
> >>>>>>>       > >>
> >>>>>>>       > >
> >>>>>>>       >
> >>>>>>>
> >>>>>>>
> >>>>>>>       --
> >>>>>>>       Best Regards
> >>>>>>>
> >>>>>>>       Jeff Zhang
> >>>>>>>
> >>
>
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
So yes, the Jenkins job keeps polling the state from Travis until it
finishes.

Not sure I'm comfortable with the idea of using Jenkins workers just to
idle for several hours.
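
To make the concern concrete, a polling job of this kind looks roughly like
the sketch below. This is not the actual travis_check.py; the function name,
the state names, and the injected fetch_state callable are hypothetical and
stand in for whatever call reports the Travis build status. The worker that
runs this loop stays occupied for the whole wait, even though it only sleeps.

```python
import time
from typing import Callable

# Assumed names for finished builds; the real status values may differ.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(
    fetch_state: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_state until the build finishes or the deadline passes.

    Returns the terminal state, or "timeout" if the deadline is reached first.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        # The calling worker is blocked here for the whole polling interval.
        sleep(poll_seconds)
```

A hard timeout at least bounds how long a worker can be held, but it does not
change the fact that the worker does nothing useful while it waits.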

On 29/06/2019 14:56, Jeff Zhang wrote:
> Here's what zeppelin community did, we make a python script to check the
> build status of pull request.
> Here's script:
> https://github.com/apache/zeppelin/blob/master/travis_check.py
>
> And this is the script we used in Jenkins build job.
>
> if [ -f "travis_check.py" ]; then
>    git log -n 1
>    STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>      sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>    AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>    PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>    #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>    #if [ -z $COMMIT ]; then
>    #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>    #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>    #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
>    #    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>    #fi
>
>    # get commit hash from PR
>    COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>      grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>      sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
>      sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>    sleep 30 # wait a moment for travis to start the build
>    RET_CODE=0
>    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>    if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>      RET_CODE=0
>      AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>        grep '"full_name":' | grep -v "apache/zeppelin" |
>        sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>      python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>    fi
>
>    if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
>      set +x
>      echo "-----------------------------------------------------"
>      echo "Looks like travis-ci is not configured for your fork."
>      echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>      echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>      echo ""
>      echo "To trigger CI after setup, you will need to amend your last commit with"
>      echo "git commit --amend"
>      echo "git push your-remote HEAD --force"
>      echo ""
>      echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>    fi
>
>    exit $RET_CODE
> else
>    set +x
>    echo "travis_check.py does not exist"
>    exit 1
> fi
>
> Chesnay Schepler <ch...@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>
>> Does this imply that a Jenkins job is active as long as the Travis build
>> runs?
>>
>> On 26/06/2019 21:28, Bowen Li wrote:
>>> Hi,
>>>
>>> @Dawid, I think the "long test running" as I mentioned in the first
>> email,
>>> also as you guys said, belongs to "a big effort which is much harder to
>>> accomplish in a short period of time and may deserve its own separate
>>> discussion". Thus I didn't include it in what we can do in a foreseeable
>>> short term.
>>>
>>> Besides, I don't think that's the ultimate reason for lack of build
>>> resources. Even if the build is shortened to something like 2h, the
>>> problems of no build machine works about 6 or more hours in PST daytime
>>> that I described will still happen, because no machine from ASF INFRA's
>>> pool is allocated to Flink. As I have paid close attention to the build
>>> queue in the past few weekdays, it's a pretty clear pattern now.
>>>
>>> **The ultimate root cause** for that is - we don't have any **dedicated**
>>> build resources that we can stably rely on. I'm actually ok to wait for a
>>> long time if there are build requests running, it means at least we are
>>> making progress. But I'm not ok with no build resource. A better place I
>>> think we should aim at in short term is to always have at least a central
>>> pool (can be 3 or 5) of machines dedicated to build Flink at any time, or
>>> maybe use users resources.
>>>
>>> @Chesnay @Robert I synced with Jeff offline that Zeppelin community is
>>> using a Jenkins job to automatically build on users' travis account and
>>> link the result back to github PR. I guess the Jenkins job would fetch
>>> latest upstream master and build the PR against it. Jeff has filed
>> tickets
>>> to learn and get access to the Jenkins infra. It'll better to fully
>>> understand it first before judging this approach.
>>>
>>> I also heard good things about CircleCI, and ASF INFRA seems to have a
>> pool
>>> of build capacity there too. Can be an alternative to consider.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
>> dwysakowicz@apache.org>
>>> wrote:
>>>
>>>> Sorry to jump in late, but I think Bowen missed the most important point
>>>> from Chesnay's previous message in the summary. The ultimate reason for
>>>> all the problems is that the tests take close to 2 hours to run already.
>>>> I fully support this claim: "Unless people start caring about test times
>>>> before adding them, this issue cannot be solved"
>>>>
>>>> This is also another reason why using user's Travis account won't help.
>>>> Every few weeks we reach the user's time limit for a single profile.
>>>> This makes the user's builds simply fail, until we either properly
>>>> decrease the time the tests take (which I am not sure we ever did) or
>>>> postpone the problem by splitting into more profiles. (Note that the ASF
>>>> Travis account has higher time limits)
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>> Do we know if using "the best" available hardware would improve the
>> build
>>>>> times?
>>>>> Imagine we would run the build on machines with plenty of main memory
>> to
>>>>> mount everything to ramdisk + the latest CPU architecture?
>>>>>
>>>>> Throwing hardware at the problem could help reduce the time of an
>>>>> individual build, and using our own infrastructure would remove our
>>>>> dependency on Apache's Travis account (with the obvious downside of
>>>> having
>>>>> to maintain the infrastructure)
>>>>> We could use an open source travis alternative, to have a similar
>>>>> experience and make the migration easy.
>>>>>
>>>>>
>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org>
>>>> wrote:
>>>>>>    From what I gathered, there's no special sauce that the Zeppelin
>>>>>> project uses which actually integrates a users Travis account into the
>>>> PR.
>>>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>>>
>>>>>> Naturally we can do this (duh) and safe the ASF a fair amount of
>>>>>> resources, but there are downsides:
>>>>>>
>>>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>>>> require every contributor to always, an every commit, also post a
>> Travis
>>>>>> build, or we have the reviewer sift through the contributors account
>> to
>>>>>> find it.
>>>>>>
>>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>>>> having a PR build.
>>>>>>
>>>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>>>> merges the branch into master, and then runs it. (Fun fact: This is
>> why
>>>>>> a PR without merge conflicts is not being run on Travis.)
>>>>>>
>>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>>>
>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>>>> leverage user's travis account.
>>>>>>> In this way, we can have almost unlimited concurrent build jobs and
>>>>>>> developers can restart build by themselves (currently only committers
>>>>>>> can restart PR's build).
>>>>>>>
>>>>>>> But I'm still not very clear how to integrate user's travis build
>> into
>>>>>>> the Flink pull request's build automatically. Can you explain more in
>>>>>>> detail?
>>>>>>>
>>>>>>> Another question: does travis only build branches for user account?
>>>>>>> My concern is that builds for PRs will rebase user's commits against
>>>>>>> current master branch.
>>>>>>> This will help us to find problems before merge.  Builds for branches
>>>>>>> will lose the impact of new commits in master.
>>>>>>> How does Zeppelin solve this problem?
>>>>>>>
>>>>>>> Thanks again for sharing the idea.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>>>>>>> <ma...@gmail.com>> wrote:
>>>>>>>
>>>>>>>       Hi Folks,
>>>>>>>
>>>>>>>       Zeppelin meet this kind of issue before, we solve it by
>> delegating
>>>>>>>       each
>>>>>>>       one's PR build to his travis account (Everyone can have 5 free
>>>>>>>       slot for
>>>>>>>       travis build).
>>>>>>>       Apache account travis build is only triggered when PR is merged.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>       Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
>>>>>>>       wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>>>
>>>>>>>       > (Forgot to cc George)
>>>>>>>       >
>>>>>>>       > Best,
>>>>>>>       > Kurt
>>>>>>>       >
>>>>>>>       >
>>>>>>>       > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
>>>>>>>       <ma...@gmail.com>> wrote:
>>>>>>>       >
>>>>>>>       > > Hi Bowen,
>>>>>>>       > >
>>>>>>>       > > Thanks for bringing this up. We actually have discussed
>> about
>>>>>>>       this, and I
>>>>>>>       > > think Till and George have
>>>>>>>       > > already spend sometime investigating it. I have cced both of
>>>>>>>       them, and
>>>>>>>       > > maybe they can share
>>>>>>>       > > their findings.
>>>>>>>       > >
>>>>>>>       > > Best,
>>>>>>>       > > Kurt
>>>>>>>       > >
>>>>>>>       > >
>>>>>>>       > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
>>>>>>>       <ma...@gmail.com>> wrote:
>>>>>>>       > >
>>>>>>>       > >> Hi Bowen,
>>>>>>>       > >>
>>>>>>>       > >> Thanks for bringing this. We also suffered from the long
>>>>>>>       build time.
>>>>>>>       > >> I agree that we should focus on solving build capacity
>>>>>>>       problem in the
>>>>>>>       > >> thread.
>>>>>>>       > >>
>>>>>>>       > >> My observation is there is only one build is running, all
>> the
>>>>>>>       others
>>>>>>>       > >> (other
>>>>>>>       > >> PRs, master) are pending.
>>>>>>>       > >> The pricing plan[1] of travis shows it can support
>> concurrent
>>>>>>>       build
>>>>>>>       > jobs.
>>>>>>>       > >> But I don't know which plan we are using, might be the free
>>>>>>>       plan for
>>>>>>>       > open
>>>>>>>       > >> source.
>>>>>>>       > >>
>>>>>>>       > >> I cc-ed Chesnay who may have some experience on Travis.
>>>>>>>       > >>
>>>>>>>       > >> Regards,
>>>>>>>       > >> Jark
>>>>>>>       > >>
>>>>>>>       > >> [1]: https://travis-ci.com/plans
>>>>>>>       > >>
>>>>>>>       > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
>> bowenli86@gmail.com
>>>>>>>       <ma...@gmail.com>> wrote:
>>>>>>>       > >>
>>>>>>>       > >> > Hi Steven,
>>>>>>>       > >> >
>>>>>>>       > >> > I think you may not read what I wrote. The discussion is
>>>> about
>>>>>>>       > "unstable
>>>>>>>       > >> > build **capacity**", in another word "unstable / lack of
>>>> build
>>>>>>>       > >> resources",
>>>>>>>       > >> > not "unstable build".
>>>>>>>       > >> >
>>>>>>>       > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>>>>>>       <stevenz3wu@gmail.com <ma...@gmail.com>>
>>>>>>>       > wrote:
>>>>>>>       > >> >
>>>>>>>       > >> > > long and sometimes unstable build is definitely a pain
>>>>>> point.
>>>>>>>       > >> > >
>>>>>>>       > >> > > I suspect the build failure here in
>> flink-connector-kafka
>>>>>>>       is not
>>>>>>>       > >> related
>>>>>>>       > >> > to
>>>>>>>       > >> > > my change. but there is no easy re-run the build on
>>>>>>>       travis UI.
>>>>>>>       > Google
>>>>>>>       > >> > > search showed a trick of close-and-open the PR will
>>>>>>>       trigger rebuild.
>>>>>>>       > >> but
>>>>>>>       > >> > > that could add noises to the PR activities.
>>>>>>>       > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>       > >> > >
>>>>>>>       > >> > > travis-ci for my personal repo often failed with
>>>>>>>       exceeding time
>>>>>>>       > limit
>>>>>>>       > >> > after
>>>>>>>       > >> > > 4+ hours.
>>>>>>>       > >> > > The job exceeded the maximum time limit for jobs, and
>> has
>>>>>>>       been
>>>>>>>       > >> > terminated.
>>>>>>>       > >> > >
>>>>>>>       > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>>
>>>>>>>       > wrote:
>>>>>>>       > >> > >
>>>>>>>       > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>>>>>>>       This build
>>>>>>>       > >> > request
>>>>>>>       > >> > > > has
>>>>>>>       > >> > > > been sitting at **HEAD of the queue** since I first
>> saw
>>>>>>>       it at PST
>>>>>>>       > >> > 10:30am
>>>>>>>       > >> > > > (not sure how long it's been there before 10:30am).
>>>>>>>       It's PST
>>>>>>>       > 4:12pm
>>>>>>>       > >> now
>>>>>>>       > >> > > and
>>>>>>>       > >> > > > it hasn't started yet.
>>>>>>>       > >> > > >
>>>>>>>       > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>>>>>>       <bowenli86@gmail.com <ma...@gmail.com>>
>>>>>>>       > >> wrote:
>>>>>>>       > >> > > >
>>>>>>>       > >> > > > > Hi devs,
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > I've been experiencing the pain resulting from lack
>>>>>>>       of stable
>>>>>>>       > >> build
>>>>>>>       > >> > > > > capacity on Travis for Flink PRs [1].
>> Specifically, I
>>>>>>>       noticed
>>>>>>>       > >> often
>>>>>>>       > >> > > that
>>>>>>>       > >> > > > no
>>>>>>>       > >> > > > > build in the queue is making any progress for
>> hours,
>>>> and
>>>>>>>       > suddenly
>>>>>>>       > >> 5
>>>>>>>       > >> > or
>>>>>>>       > >> > > 6
>>>>>>>       > >> > > > > builds kick off all together after the long pause.
>>>>>>>       I'm at PST
>>>>>>>       > >> > (UTC-08)
>>>>>>>       > >> > > > time
>>>>>>>       > >> > > > > zone, and I've seen pause can be as long as 6 hours
>>>>>>>       from PST 9am
>>>>>>>       > >> to
>>>>>>>       > >> > 3pm
>>>>>>>       > >> > > > > (let alone the time needed to drain the queue
>>>>>>>       afterwards).
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > I think this has greatly impacted our productivity.
>>>> I've
>>>>>>>       > >> experienced
>>>>>>>       > >> > > that
>>>>>>>       > >> > > > > PRs submitted in the early morning of PST time zone
>>>>>>>       won't finish
>>>>>>>       > >> > their
>>>>>>>       > >> > > > > build until late night of the same day.
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > So my questions are:
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > - Has anyone else experienced the same problem or
>>>>>>>       have similar
>>>>>>>       > >> > > > observation
>>>>>>>       > >> > > > > on TravisCI? (I suspect it has things to do with
>> time
>>>>>>>       zone)
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > - What pricing plan of TravisCI is Flink currently
>>>>>>>       using? Is it
>>>>>>>       > >> the
>>>>>>>       > >> > > free
>>>>>>>       > >> > > > > plan for open source projects? What are the
>>>>>>>       guaranteed build
>>>>>>>       > >> capacity
>>>>>>>       > >> > > of
>>>>>>>       > >> > > > > the current plan?
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > - If the current pricing plan (either free or paid)
>>>>>> can't
>>>>>>>       > provide
>>>>>>>       > >> > > stable
>>>>>>>       > >> > > > > build capacity, can we upgrade to a higher priced
>>>>>>>       plan with
>>>>>>>       > larger
>>>>>>>       > >> > and
>>>>>>>       > >> > > > more
>>>>>>>       > >> > > > > stable build capacity?
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > BTW, another factor that contribute to the
>>>>>>>       productivity problem
>>>>>>>       > is
>>>>>>>       > >> > that
>>>>>>>       > >> > > > > our build is slow - we run full build for every PR
>>>> and a
>>>>>>>       > >> successful
>>>>>>>       > >> > > full
>>>>>>>       > >> > > > > build takes ~5h. We definitely have more options to
>>>>>>>       solve it,
>>>>>>>       > for
>>>>>>>       > >> > > > instance,
>>>>>>>       > >> > > > > modularize the build graphs and reuse artifacts
>> from
>>>> the
>>>>>>>       > previous
>>>>>>>       > >> > > build.
>>>>>>>       > >> > > > > But I think that can be a big effort which is much
>>>>>>>       harder to
>>>>>>>       > >> > accomplish
>>>>>>>       > >> > > > in
>>>>>>>       > >> > > > > a short period of time and may deserve its own
>>>> separate
>>>>>>>       > >> discussion.
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > > [1]
>> https://travis-ci.org/apache/flink/pull_requests
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > > >
>>>>>>>       > >> > > >
>>>>>>>       > >> > >
>>>>>>>       > >> >
>>>>>>>       > >>
>>>>>>>       > >
>>>>>>>       >
>>>>>>>
>>>>>>>
>>>>>>>       --
>>>>>>>       Best Regards
>>>>>>>
>>>>>>>       Jeff Zhang
>>>>>>>
>>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Jeff Zhang <zj...@gmail.com>.
Here's what the Zeppelin community did: we wrote a Python script that checks
the build status of a pull request.
Here's the script:
https://github.com/apache/zeppelin/blob/master/travis_check.py

And this is the script we used in the Jenkins build job:

if [ -f "travis_check.py" ]; then
  git log -n 1
  # Pull the PR link and the contributor link out of the Jenkins build page.
  STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
    sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
  AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
  PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
  #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
  #if [ -z $COMMIT ]; then
  #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
  #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
  #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
  #    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  #fi

  # get commit hash from PR
  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  sleep 30 # wait a moment for travis to start the build
  RET_CODE=0
  python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
    RET_CODE=0
    AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
      grep '"full_name":' | grep -v "apache/zeppelin" |
      sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  fi

  if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
    set +x
    echo "-----------------------------------------------------"
    echo "Looks like travis-ci is not configured for your fork."
    echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
    echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
    echo ""
    echo "To trigger CI after setup, you will need to amend your last commit with"
    echo "git commit --amend"
    echo "git push your-remote HEAD --force"
    echo ""
    echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
  fi

  exit $RET_CODE
else
  set +x
  echo "travis_check.py does not exist"
  exit 1
fi
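
The Jenkins script above only looks at the exit code of travis_check.py. As
a rough illustration of that contract, here is a minimal Python sketch. It is
not the actual travis_check.py: the record shape ("commit" and "state" keys),
the function name, and the meaning of exit codes 0 and 1 are assumptions made
for this example. Only exit code 2 ("can't find build information") is taken
from the script above.

```python
from typing import Iterable, Mapping, Optional

# Exit code 2 is the "no build information found" case handled by the Jenkins
# script; 0 and 1 are assumed here to mean "passed" and "failed".
EXIT_PASSED = 0
EXIT_FAILED = 1
EXIT_NOT_FOUND = 2


def exit_code_for_commit(
    builds: Iterable[Mapping[str, str]], commit: str
) -> Optional[int]:
    """Return an exit code for `commit`, or None if its build is unfinished.

    `builds` is a hypothetical, already-fetched list of build records from the
    contributor's Travis account; no network access happens in this sketch.
    """
    for build in builds:
        # Compare by prefix so that abbreviated commit hashes also match.
        if commit and build["commit"].startswith(commit):
            state = build["state"]
            if state == "passed":
                return EXIT_PASSED
            if state in ("failed", "errored", "canceled"):
                return EXIT_FAILED
            return None  # e.g. created or started: the caller should poll again
    return EXIT_NOT_FOUND


if __name__ == "__main__":
    sample = [
        {"commit": "a1b2c3d4", "state": "passed"},
        {"commit": "deadbeef", "state": "started"},
    ]
    print(exit_code_for_commit(sample, "a1b2c3d"))   # finished build
    print(exit_code_for_commit(sample, "deadbeef"))  # still running
    print(exit_code_for_commit(sample, "0000000"))   # unknown commit
```

A caller would keep polling while the function returns None, which is why the
Jenkins job has to stay alive for roughly as long as the Travis build runs.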

On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <ch...@apache.org> wrote:

> Does this imply that a Jenkins job is active as long as the Travis build
> runs?
>
> On 26/06/2019 21:28, Bowen Li wrote:
> > Hi,
> >
> > @Dawid, I think the "long test running" as I mentioned in the first
> email,
> > also as you guys said, belongs to "a big effort which is much harder to
> > accomplish in a short period of time and may deserve its own separate
> > discussion". Thus I didn't include it in what we can do in a foreseeable
> > short term.
> >
> > Besides, I don't think that's the ultimate reason for lack of build
> > resources. Even if the build is shortened to something like 2h, the
> > problems of no build machine works about 6 or more hours in PST daytime
> > that I described will still happen, because no machine from ASF INFRA's
> > pool is allocated to Flink. As I have paid close attention to the build
> > queue in the past few weekdays, it's a pretty clear pattern now.
> >
> > **The ultimate root cause** for that is - we don't have any **dedicated**
> > build resources that we can stably rely on. I'm actually ok to wait for a
> > long time if there are build requests running, it means at least we are
> > making progress. But I'm not ok with no build resource. A better place I
> > think we should aim at in short term is to always have at least a central
> > pool (can be 3 or 5) of machines dedicated to build Flink at any time, or
> > maybe use users resources.
> >
> > @Chesnay @Robert I synced with Jeff offline that Zeppelin community is
> > using a Jenkins job to automatically build on users' travis account and
> > link the result back to github PR. I guess the Jenkins job would fetch
> > latest upstream master and build the PR against it. Jeff has filed
> tickets
> > to learn and get access to the Jenkins infra. It'll better to fully
> > understand it first before judging this approach.
> >
> > I also heard good things about CircleCI, and ASF INFRA seems to have a
> pool
> > of build capacity there too. Can be an alternative to consider.
> >
> > On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> dwysakowicz@apache.org>
> > wrote:
> >
> >> Sorry to jump in late, but I think Bowen missed the most important point
> >> from Chesnay's previous message in the summary. The ultimate reason for
> >> all the problems is that the tests take close to 2 hours to run already.
> >> I fully support this claim: "Unless people start caring about test times
> >> before adding them, this issue cannot be solved"
> >>
> >> This is also another reason why using user's Travis account won't help.
> >> Every few weeks we reach the user's time limit for a single profile.
> >> This makes the user's builds simply fail, until we either properly
> >> decrease the time the tests take (which I am not sure we ever did) or
> >> postpone the problem by splitting into more profiles. (Note that the ASF
> >> Travis account has higher time limits)
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 26/06/2019 09:36, Robert Metzger wrote:
> >>> Do we know if using "the best" available hardware would improve the
> build
> >>> times?
> >>> Imagine we would run the build on machines with plenty of main memory
> to
> >>> mount everything to ramdisk + the latest CPU architecture?
> >>>
> >>> Throwing hardware at the problem could help reduce the time of an
> >>> individual build, and using our own infrastructure would remove our
> >>> dependency on Apache's Travis account (with the obvious downside of
> >> having
> >>> to maintain the infrastructure)
> >>> We could use an open source travis alternative, to have a similar
> >>> experience and make the migration easy.
> >>>
> >>>
> >>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org>
> >> wrote:
> >>>>   From what I gathered, there's no special sauce that the Zeppelin
> >>>> project uses which actually integrates a users Travis account into the
> >> PR.
> >>>> They just disabled Travis for PRs. And that's kind of it.
> >>>>
> >>>> Naturally we can do this (duh) and safe the ASF a fair amount of
> >>>> resources, but there are downsides:
> >>>>
> >>>> The discoverability of the Travis check takes a nose-dive. Either we
> >>>> require every contributor to always, an every commit, also post a
> Travis
> >>>> build, or we have the reviewer sift through the contributors account
> to
> >>>> find it.
> >>>>
> >>>> This is rather cumbersome. Additionally, it's also not equivalent to
> >>>> having a PR build.
> >>>>
> >>>> A normal branch build takes a branch as is and tests it. A PR build
> >>>> merges the branch into master, and then runs it. (Fun fact: This is
> why
> >>>> a PR without merge conflicts is not being run on Travis.)
> >>>>
> >>>> And ultimately, everyone can already make use of this approach anyway.
> >>>>
> >>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>> Hi Jeff,
> >>>>>
> >>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
> >>>>> leverage user's travis account.
> >>>>> In this way, we can have almost unlimited concurrent build jobs and
> >>>>> developers can restart build by themselves (currently only committers
> >>>>> can restart PR's build).
> >>>>>
> >>>>> But I'm still not very clear how to integrate user's travis build
> into
> >>>>> the Flink pull request's build automatically. Can you explain more in
> >>>>> detail?
> >>>>>
> >>>>> Another question: does travis only build branches for user account?
> >>>>> My concern is that builds for PRs will rebase user's commits against
> >>>>> current master branch.
> >>>>> This will help us to find problems before merge.  Builds for branches
> >>>>> will lose the impact of new commits in master.
> >>>>> How does Zeppelin solve this problem?
> >>>>>
> >>>>> Thanks again for sharing the idea.
> >>>>>
> >>>>> Regards,
> >>>>> Jark
> >>>>>
> >>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
> >>>>> <ma...@gmail.com>> wrote:
> >>>>>
> >>>>>      Hi Folks,
> >>>>>
> >>>>>      Zeppelin meet this kind of issue before, we solve it by
> delegating
> >>>>>      each
> >>>>>      one's PR build to his travis account (Everyone can have 5 free
> >>>>>      slot for
> >>>>>      travis build).
> >>>>>      Apache account travis build is only triggered when PR is merged.
> >>>>>
> >>>>>
> >>>>>
> >>>>>      Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
> >>>>>      wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>>>
> >>>>>      > (Forgot to cc George)
> >>>>>      >
> >>>>>      > Best,
> >>>>>      > Kurt
> >>>>>      >
> >>>>>      >
> >>>>>      > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
> >>>>>      <ma...@gmail.com>> wrote:
> >>>>>      >
> >>>>>      > > Hi Bowen,
> >>>>>      > >
> >>>>>      > > Thanks for bringing this up. We actually have discussed
> about
> >>>>>      this, and I
> >>>>>      > > think Till and George have
> >>>>>      > > already spend sometime investigating it. I have cced both of
> >>>>>      them, and
> >>>>>      > > maybe they can share
> >>>>>      > > their findings.
> >>>>>      > >
> >>>>>      > > Best,
> >>>>>      > > Kurt
> >>>>>      > >
> >>>>>      > >
> >>>>>      > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
> >>>>>      <ma...@gmail.com>> wrote:
> >>>>>      > >
> >>>>>      > >> Hi Bowen,
> >>>>>      > >>
> >>>>>      > >> Thanks for bringing this. We also suffered from the long
> >>>>>      build time.
> >>>>>      > >> I agree that we should focus on solving build capacity
> >>>>>      problem in the
> >>>>>      > >> thread.
> >>>>>      > >>
> >>>>>      > >> My observation is there is only one build is running, all
> the
> >>>>>      others
> >>>>>      > >> (other
> >>>>>      > >> PRs, master) are pending.
> >>>>>      > >> The pricing plan[1] of travis shows it can support
> concurrent
> >>>>>      build
> >>>>>      > jobs.
> >>>>>      > >> But I don't know which plan we are using, might be the free
> >>>>>      plan for
> >>>>>      > open
> >>>>>      > >> source.
> >>>>>      > >>
> >>>>>      > >> I cc-ed Chesnay who may have some experience on Travis.
> >>>>>      > >>
> >>>>>      > >> Regards,
> >>>>>      > >> Jark
> >>>>>      > >>
> >>>>>      > >> [1]: https://travis-ci.com/plans
> >>>>>      > >>
> >>>>>      > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <
> bowenli86@gmail.com
> >>>>>      <ma...@gmail.com>> wrote:
> >>>>>      > >>
> >>>>>      > >> > Hi Steven,
> >>>>>      > >> >
> >>>>>      > >> > I think you may not read what I wrote. The discussion is
> >> about
> >>>>>      > "unstable
> >>>>>      > >> > build **capacity**", in another word "unstable / lack of
> >> build
> >>>>>      > >> resources",
> >>>>>      > >> > not "unstable build".
> >>>>>      > >> >
> >>>>>      > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>>>>      <stevenz3wu@gmail.com <ma...@gmail.com>>
> >>>>>      > wrote:
> >>>>>      > >> >
> >>>>>      > >> > > long and sometimes unstable build is definitely a pain
> >>>> point.
> >>>>>      > >> > >
> >>>>>      > >> > > I suspect the build failure here in
> flink-connector-kafka
> >>>>>      is not
> >>>>>      > >> related
> >>>>>      > >> > to
> >>>>>      > >> > > my change. but there is no easy re-run the build on
> >>>>>      travis UI.
> >>>>>      > Google
> >>>>>      > >> > > search showed a trick of close-and-open the PR will
> >>>>>      trigger rebuild.
> >>>>>      > >> but
> >>>>>      > >> > > that could add noises to the PR activities.
> >>>>>      > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>      > >> > >
> >>>>>      > >> > > travis-ci for my personal repo often failed with
> >>>>>      exceeding time
> >>>>>      > limit
> >>>>>      > >> > after
> >>>>>      > >> > > 4+ hours.
> >>>>>      > >> > > The job exceeded the maximum time limit for jobs, and
> has
> >>>>>      been
> >>>>>      > >> > terminated.
> >>>>>      > >> > >
> >>>>>      > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>>>>      <bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>      > wrote:
> >>>>>      > >> > >
> >>>>>      > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >>>>>      This build
> >>>>>      > >> > request
> >>>>>      > >> > > > has
> >>>>>      > >> > > > been sitting at **HEAD of the queue** since I first
> saw
> >>>>>      it at PST
> >>>>>      > >> > 10:30am
> >>>>>      > >> > > > (not sure how long it's been there before 10:30am).
> >>>>>      It's PST
> >>>>>      > 4:12pm
> >>>>>      > >> now
> >>>>>      > >> > > and
> >>>>>      > >> > > > it hasn't started yet.
> >>>>>      > >> > > >
> >>>>>      > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>>>>      <bowenli86@gmail.com <ma...@gmail.com>>
> >>>>>      > >> wrote:
> >>>>>      > >> > > >
> >>>>>      > >> > > > > Hi devs,
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > I've been experiencing the pain resulting from lack
> >>>>>      of stable
> >>>>>      > >> build
> >>>>>      > >> > > > > capacity on Travis for Flink PRs [1].
> Specifically, I
> >>>>>      noticed
> >>>>>      > >> often
> >>>>>      > >> > > that
> >>>>>      > >> > > > no
> >>>>>      > >> > > > > build in the queue is making any progress for
> hours,
> >> and
> >>>>>      > suddenly
> >>>>>      > >> 5
> >>>>>      > >> > or
> >>>>>      > >> > > 6
> >>>>>      > >> > > > > builds kick off all together after the long pause.
> >>>>>      I'm at PST
> >>>>>      > >> > (UTC-08)
> >>>>>      > >> > > > time
> >>>>>      > >> > > > > zone, and I've seen pause can be as long as 6 hours
> >>>>>      from PST 9am
> >>>>>      > >> to
> >>>>>      > >> > 3pm
> >>>>>      > >> > > > > (let alone the time needed to drain the queue
> >>>>>      afterwards).
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > I think this has greatly impacted our productivity.
> >> I've
> >>>>>      > >> experienced
> >>>>>      > >> > > that
> >>>>>      > >> > > > > PRs submitted in the early morning of PST time zone
> >>>>>      won't finish
> >>>>>      > >> > their
> >>>>>      > >> > > > > build until late night of the same day.
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > So my questions are:
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > - Has anyone else experienced the same problem or
> >>>>>      have similar
> >>>>>      > >> > > > observation
> >>>>>      > >> > > > > on TravisCI? (I suspect it has things to do with
> time
> >>>>>      zone)
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > - What pricing plan of TravisCI is Flink currently
> >>>>>      using? Is it
> >>>>>      > >> the
> >>>>>      > >> > > free
> >>>>>      > >> > > > > plan for open source projects? What are the
> >>>>>      guaranteed build
> >>>>>      > >> capacity
> >>>>>      > >> > > of
> >>>>>      > >> > > > > the current plan?
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > - If the current pricing plan (either free or paid)
> >>>> can't
> >>>>>      > provide
> >>>>>      > >> > > stable
> >>>>>      > >> > > > > build capacity, can we upgrade to a higher priced
> >>>>>      plan with
> >>>>>      > larger
> >>>>>      > >> > and
> >>>>>      > >> > > > more
> >>>>>      > >> > > > > stable build capacity?
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > BTW, another factor that contribute to the
> >>>>>      productivity problem
> >>>>>      > is
> >>>>>      > >> > that
> >>>>>      > >> > > > > our build is slow - we run full build for every PR
> >> and a
> >>>>>      > >> successful
> >>>>>      > >> > > full
> >>>>>      > >> > > > > build takes ~5h. We definitely have more options to
> >>>>>      solve it,
> >>>>>      > for
> >>>>>      > >> > > > instance,
> >>>>>      > >> > > > > modularize the build graphs and reuse artifacts
> from
> >> the
> >>>>>      > previous
> >>>>>      > >> > > build.
> >>>>>      > >> > > > > But I think that can be a big effort which is much
> >>>>>      harder to
> >>>>>      > >> > accomplish
> >>>>>      > >> > > > in
> >>>>>      > >> > > > > a short period of time and may deserve its own
> >> separate
> >>>>>      > >> discussion.
> >>>>>      > >> > > > >
> >>>>>      > >> > > > > [1]
> https://travis-ci.org/apache/flink/pull_requests
> >>>>>      > >> > > > >
> >>>>>      > >> > > > >
> >>>>>      > >> > > >
> >>>>>      > >> > >
> >>>>>      > >> >
> >>>>>      > >>
> >>>>>      > >
> >>>>>      >
> >>>>>
> >>>>>
> >>>>>      --
> >>>>>      Best Regards
> >>>>>
> >>>>>      Jeff Zhang
> >>>>>
> >>
>
>

-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
Does this imply that a Jenkins job is active as long as the Travis build 
runs?

On 26/06/2019 21:28, Bowen Li wrote:
> Hi,
>
> @Dawid, I think the "long test running" as I mentioned in the first email,
> also as you guys said, belongs to "a big effort which is much harder to
> accomplish in a short period of time and may deserve its own separate
> discussion". Thus I didn't include it in what we can do in a foreseeable
> short term.
>
> Besides, I don't think that's the ultimate reason for lack of build
> resources. Even if the build is shortened to something like 2h, the
> problems of no build machine works about 6 or more hours in PST daytime
> that I described will still happen, because no machine from ASF INFRA's
> pool is allocated to Flink. As I have paid close attention to the build
> queue in the past few weekdays, it's a pretty clear pattern now.
>
> **The ultimate root cause** for that is - we don't have any **dedicated**
> build resources that we can stably rely on. I'm actually ok to wait for a
> long time if there are build requests running, it means at least we are
> making progress. But I'm not ok with no build resource. A better place I
> think we should aim at in short term is to always have at least a central
> pool (can be 3 or 5) of machines dedicated to build Flink at any time, or
> maybe use users resources.
>
> @Chesnay @Robert I synced with Jeff offline that Zeppelin community is
> using a Jenkins job to automatically build on users' travis account and
> link the result back to github PR. I guess the Jenkins job would fetch
> latest upstream master and build the PR against it. Jeff has filed tickets
> to learn and get access to the Jenkins infra. It'll better to fully
> understand it first before judging this approach.
>
> I also heard good things about CircleCI, and ASF INFRA seems to have a pool
> of build capacity there too. Can be an alternative to consider.
>
> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>> Sorry to jump in late, but I think Bowen missed the most important point
>> from Chesnay's previous message in the summary. The ultimate reason for
>> all the problems is that the tests take close to 2 hours to run already.
>> I fully support this claim: "Unless people start caring about test times
>> before adding them, this issue cannot be solved"
>>
>> This is also another reason why using user's Travis account won't help.
>> Every few weeks we reach the user's time limit for a single profile.
>> This makes the user's builds simply fail, until we either properly
>> decrease the time the tests take (which I am not sure we ever did) or
>> postpone the problem by splitting into more profiles. (Note that the ASF
>> Travis account has higher time limits)
>>
>> Best,
>>
>> Dawid
>>
>> On 26/06/2019 09:36, Robert Metzger wrote:
>>> Do we know if using "the best" available hardware would improve the build
>>> times?
>>> Imagine we would run the build on machines with plenty of main memory to
>>> mount everything to ramdisk + the latest CPU architecture?
>>>
>>> Throwing hardware at the problem could help reduce the time of an
>>> individual build, and using our own infrastructure would remove our
>>> dependency on Apache's Travis account (with the obvious downside of
>> having
>>> to maintain the infrastructure)
>>> We could use an open source travis alternative, to have a similar
>>> experience and make the migration easy.
>>>
>>>
>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org>
>> wrote:
>>>>   From what I gathered, there's no special sauce that the Zeppelin
>>>> project uses which actually integrates a users Travis account into the
>> PR.
>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>
>>>> Naturally we can do this (duh) and safe the ASF a fair amount of
>>>> resources, but there are downsides:
>>>>
>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>> require every contributor to always, an every commit, also post a Travis
>>>> build, or we have the reviewer sift through the contributors account to
>>>> find it.
>>>>
>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>> having a PR build.
>>>>
>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>> merges the branch into master, and then runs it. (Fun fact: This is why
>>>> a PR without merge conflicts is not being run on Travis.)
>>>>
>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>
>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>> leverage user's travis account.
>>>>> In this way, we can have almost unlimited concurrent build jobs and
>>>>> developers can restart build by themselves (currently only committers
>>>>> can restart PR's build).
>>>>>
>>>>> But I'm still not very clear how to integrate user's travis build into
>>>>> the Flink pull request's build automatically. Can you explain more in
>>>>> detail?
>>>>>
>>>>> Another question: does travis only build branches for user account?
>>>>> My concern is that builds for PRs will rebase user's commits against
>>>>> current master branch.
>>>>> This will help us to find problems before merge.  Builds for branches
>>>>> will lose the impact of new commits in master.
>>>>> How does Zeppelin solve this problem?
>>>>>
>>>>> Thanks again for sharing the idea.
>>>>>
>>>>> Regards,
>>>>> Jark
>>>>>
>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>>>>> <ma...@gmail.com>> wrote:
>>>>>
>>>>>      Hi Folks,
>>>>>
>>>>>      Zeppelin met this kind of issue before; we solved it by delegating
>>>>>      each contributor's PR build to their own Travis account (everyone
>>>>>      gets 5 free slots for Travis builds).
>>>>>      The Apache account's Travis build is only triggered when a PR is merged.
>>>>>
>>>>>
>>>>>
>>>>>      Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
>>>>>      wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>
>>>>>      > (Forgot to cc George)
>>>>>      >
>>>>>      > Best,
>>>>>      > Kurt
>>>>>      >
>>>>>      >
>>>>>      > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
>>>>>      <ma...@gmail.com>> wrote:
>>>>>      >
>>>>>      > > Hi Bowen,
>>>>>      > >
>>>>>      > > Thanks for bringing this up. We have actually discussed this,
>>>>>      > > and I think Till and George have already spent some time
>>>>>      > > investigating it. I have cc'ed both of them, and maybe they can
>>>>>      > > share their findings.
>>>>>      > >
>>>>>      > > Best,
>>>>>      > > Kurt
>>>>>      > >
>>>>>      > >
>>>>>      > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
>>>>>      <ma...@gmail.com>> wrote:
>>>>>      > >
>>>>>      > >> Hi Bowen,
>>>>>      > >>
>>>>>      > >> Thanks for bringing this. We also suffered from the long
>>>>>      build time.
>>>>>      > >> I agree that we should focus on solving build capacity
>>>>>      problem in the
>>>>>      > >> thread.
>>>>>      > >>
>>>>>      > >> My observation is there is only one build is running, all the
>>>>>      others
>>>>>      > >> (other
>>>>>      > >> PRs, master) are pending.
>>>>>      > >> The pricing plan[1] of travis shows it can support concurrent
>>>>>      build
>>>>>      > jobs.
>>>>>      > >> But I don't know which plan we are using, might be the free
>>>>>      plan for
>>>>>      > open
>>>>>      > >> source.
>>>>>      > >>
>>>>>      > >> I cc-ed Chesnay who may have some experience on Travis.
>>>>>      > >>
>>>>>      > >> Regards,
>>>>>      > >> Jark
>>>>>      > >>
>>>>>      > >> [1]: https://travis-ci.com/plans
>>>>>      > >>
>>>>>      > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
>>>>>      <ma...@gmail.com>> wrote:
>>>>>      > >>
>>>>>      > >> > Hi Steven,
>>>>>      > >> >
>>>>>      > >> > I think you may not read what I wrote. The discussion is
>> about
>>>>>      > "unstable
>>>>>      > >> > build **capacity**", in another word "unstable / lack of
>> build
>>>>>      > >> resources",
>>>>>      > >> > not "unstable build".
>>>>>      > >> >
>>>>>      > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>>>>      <stevenz3wu@gmail.com <ma...@gmail.com>>
>>>>>      > wrote:
>>>>>      > >> >
>>>>>      > >> > > long and sometimes unstable build is definitely a pain
>>>> point.
>>>>>      > >> > >
>>>>>      > >> > > I suspect the build failure here in flink-connector-kafka
>>>>>      is not
>>>>>      > >> related
>>>>>      > >> > to
>>>>>      > >> > > my change. but there is no easy re-run the build on
>>>>>      travis UI.
>>>>>      > Google
>>>>>      > >> > > search showed a trick of close-and-open the PR will
>>>>>      trigger rebuild.
>>>>>      > >> but
>>>>>      > >> > > that could add noises to the PR activities.
>>>>>      > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>>>      > >> > >
>>>>>      > >> > > travis-ci for my personal repo often failed with
>>>>>      exceeding time
>>>>>      > limit
>>>>>      > >> > after
>>>>>      > >> > > 4+ hours.
>>>>>      > >> > > The job exceeded the maximum time limit for jobs, and has
>>>>>      been
>>>>>      > >> > terminated.
>>>>>      > >> > >
>>>>>      > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>>>>      <bowenli86@gmail.com <ma...@gmail.com>>
>>>>>      > wrote:
>>>>>      > >> > >
>>>>>      > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>>>>>      This build
>>>>>      > >> > request
>>>>>      > >> > > > has
>>>>>      > >> > > > been sitting at **HEAD of the queue** since I first saw
>>>>>      it at PST
>>>>>      > >> > 10:30am
>>>>>      > >> > > > (not sure how long it's been there before 10:30am).
>>>>>      It's PST
>>>>>      > 4:12pm
>>>>>      > >> now
>>>>>      > >> > > and
>>>>>      > >> > > > it hasn't started yet.
>>>>>      > >> > > >
>>>>>      > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>>>>      <bowenli86@gmail.com <ma...@gmail.com>>
>>>>>      > >> wrote:
>>>>>      > >> > > >
>>>>>      > >> > > > > Hi devs,
>>>>>      > >> > > > >
>>>>>      > >> > > > > I've been experiencing the pain resulting from lack
>>>>>      of stable
>>>>>      > >> build
>>>>>      > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
>>>>>      noticed
>>>>>      > >> often
>>>>>      > >> > > that
>>>>>      > >> > > > no
>>>>>      > >> > > > > build in the queue is making any progress for hours,
>> and
>>>>>      > suddenly
>>>>>      > >> 5
>>>>>      > >> > or
>>>>>      > >> > > 6
>>>>>      > >> > > > > builds kick off all together after the long pause.
>>>>>      I'm at PST
>>>>>      > >> > (UTC-08)
>>>>>      > >> > > > time
>>>>>      > >> > > > > zone, and I've seen pause can be as long as 6 hours
>>>>>      from PST 9am
>>>>>      > >> to
>>>>>      > >> > 3pm
>>>>>      > >> > > > > (let alone the time needed to drain the queue
>>>>>      afterwards).
>>>>>      > >> > > > >
>>>>>      > >> > > > > I think this has greatly impacted our productivity.
>> I've
>>>>>      > >> experienced
>>>>>      > >> > > that
>>>>>      > >> > > > > PRs submitted in the early morning of PST time zone
>>>>>      won't finish
>>>>>      > >> > their
>>>>>      > >> > > > > build until late night of the same day.
>>>>>      > >> > > > >
>>>>>      > >> > > > > So my questions are:
>>>>>      > >> > > > >
>>>>>      > >> > > > > - Has anyone else experienced the same problem or
>>>>>      have similar
>>>>>      > >> > > > observation
>>>>>      > >> > > > > on TravisCI? (I suspect it has things to do with time
>>>>>      zone)
>>>>>      > >> > > > >
>>>>>      > >> > > > > - What pricing plan of TravisCI is Flink currently
>>>>>      using? Is it
>>>>>      > >> the
>>>>>      > >> > > free
>>>>>      > >> > > > > plan for open source projects? What are the
>>>>>      guaranteed build
>>>>>      > >> capacity
>>>>>      > >> > > of
>>>>>      > >> > > > > the current plan?
>>>>>      > >> > > > >
>>>>>      > >> > > > > - If the current pricing plan (either free or paid)
>>>> can't
>>>>>      > provide
>>>>>      > >> > > stable
>>>>>      > >> > > > > build capacity, can we upgrade to a higher priced
>>>>>      plan with
>>>>>      > larger
>>>>>      > >> > and
>>>>>      > >> > > > more
>>>>>      > >> > > > > stable build capacity?
>>>>>      > >> > > > >
>>>>>      > >> > > > > BTW, another factor that contribute to the
>>>>>      productivity problem
>>>>>      > is
>>>>>      > >> > that
>>>>>      > >> > > > > our build is slow - we run full build for every PR
>> and a
>>>>>      > >> successful
>>>>>      > >> > > full
>>>>>      > >> > > > > build takes ~5h. We definitely have more options to
>>>>>      solve it,
>>>>>      > for
>>>>>      > >> > > > instance,
>>>>>      > >> > > > > modularize the build graphs and reuse artifacts from
>> the
>>>>>      > previous
>>>>>      > >> > > build.
>>>>>      > >> > > > > But I think that can be a big effort which is much
>>>>>      harder to
>>>>>      > >> > accomplish
>>>>>      > >> > > > in
>>>>>      > >> > > > > a short period of time and may deserve its own
>> separate
>>>>>      > >> discussion.
>>>>>      > >> > > > >
>>>>>      > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>      > >> > > > >
>>>>>      > >> > > > >
>>>>>      > >> > > >
>>>>>      > >> > >
>>>>>      > >> >
>>>>>      > >>
>>>>>      > >
>>>>>      >
>>>>>
>>>>>
>>>>>      --
>>>>>      Best Regards
>>>>>
>>>>>      Jeff Zhang
>>>>>
>>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
See https://issues.apache.org/jira/browse/INFRA-18533 for the overall
degradation of Travis capacity.

On 26/06/2019 21:50, Bowen wrote:
> Just to elaborate a bit more on why a slow build is OK but having no resources is not: say I submit a build request at 9am PST, no other requests exist, and mine is at the head of the queue. Currently, it still cannot get built until 4 or 5pm.
>
>
>
>> On Jun 26, 2019, at 12:28, Bowen Li <bo...@gmail.com> wrote:
>>
>> Hi,
>>
>> @Dawid, I think the "long test running" problem that I mentioned in the first email, as you also said, belongs to "a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion". Thus I didn't include it in what we can do in the foreseeable short term.
>>
>> Besides, I don't think that's the ultimate reason for the lack of build resources. Even if the build is shortened to something like 2h, the problem I described, where no build machine does any work for 6 or more hours during PST daytime, will still happen, because no machine from ASF INFRA's pool is allocated to Flink. I have paid close attention to the build queue over the past few weekdays, and the pattern is pretty clear now.
>>
>> **The ultimate root cause** is that we don't have any **dedicated** build resources that we can reliably count on. I'm actually OK with waiting a long time if build requests are running, since it means we are at least making progress. But I'm not OK with having no build resources. In the short term, I think we should aim to always have a central pool of machines (3 or 5, say) dedicated to building Flink at any time, or maybe use users' resources.
>>
>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is using a Jenkins job to automatically build on users' Travis accounts and link the result back to the GitHub PR. I guess the Jenkins job fetches the latest upstream master and builds the PR against it. Jeff has filed tickets to learn about and get access to the Jenkins infra. It'd be better to fully understand it before judging this approach.
>>
>> I also heard good things about CircleCI, and ASF INFRA seems to have a pool of build capacity there too. It could be an alternative to consider.
>>
>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dw...@apache.org> wrote:
>>> Sorry to jump in late, but I think Bowen missed the most important point
>>> from Chesnay's previous message in the summary. The ultimate reason for
>>> all the problems is that the tests take close to 2 hours to run already.
>>> I fully support this claim: "Unless people start caring about test times
>>> before adding them, this issue cannot be solved"
>>>
>>> This is also another reason why using user's Travis account won't help.
>>> Every few weeks we reach the user's time limit for a single profile.
>>> This makes the user's builds simply fail, until we either properly
>>> decrease the time the tests take (which I am not sure we ever did) or
>>> postpone the problem by splitting into more profiles. (Note that the ASF
>>> Travis account has higher time limits)
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>> Do we know if using "the best" available hardware would improve the build
>>>> times?
>>>> Imagine we would run the build on machines with plenty of main memory to
>>>> mount everything to ramdisk + the latest CPU architecture?
>>>>
>>>> Throwing hardware at the problem could help reduce the time of an
>>>> individual build, and using our own infrastructure would remove our
>>>> dependency on Apache's Travis account (with the obvious downside of having
>>>> to maintain the infrastructure)
>>>> We could use an open source travis alternative, to have a similar
>>>> experience and make the migration easy.
>>>>
>>>>
>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org> wrote:
>>>>
>>>>>   From what I gathered, there's no special sauce that the Zeppelin
>>>>> project uses which actually integrates a user's Travis account into the PR.
>>>>>
>>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>>
>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>>>>> resources, but there are downsides:
>>>>>
>>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>>> require every contributor to always, on every commit, also post a Travis
>>>>> build, or we have the reviewer sift through the contributor's account to
>>>>> find it.
>>>>>
>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>>> having a PR build.
>>>>>
>>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>>> merges the branch into master, and then runs it. (Fun fact: This is why
>>>>> a PR with merge conflicts is not being run on Travis.)
>>>>>
>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>>
>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>>> leverage user's travis account.
>>>>>> In this way, we can have almost unlimited concurrent build jobs and
>>>>>> developers can restart build by themselves (currently only committers
>>>>>> can restart PR's build).
>>>>>>
>>>>>> But I'm still not very clear how to integrate user's travis build into
>>>>>> the Flink pull request's build automatically. Can you explain more in
>>>>>> detail?
>>>>>>
>>>>>> Another question: does travis only build branches for user account?
>>>>>> My concern is that builds for PRs will rebase user's commits against
>>>>>> current master branch.
>>>>>> This will help us to find problems before merge.  Builds for branches
>>>>>> will lose the impact of new commits in master.
>>>>>> How does Zeppelin solve this problem?
>>>>>>
>>>>>> Thanks again for sharing the idea.
>>>>>>
>>>>>> Regards,
>>>>>> Jark
>>>>>>
>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>>>>>> <ma...@gmail.com>> wrote:
>>>>>>
>>>>>>      Hi Folks,
>>>>>>
>>>>>>      Zeppelin met this kind of issue before; we solved it by delegating
>>>>>>      each contributor's PR build to their own Travis account (everyone
>>>>>>      gets 5 free slots for Travis builds).
>>>>>>      The Apache account's Travis build is only triggered when a PR is merged.
>>>>>>
>>>>>>
>>>>>>
>>>>>>      Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
>>>>>>      wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>>
>>>>>>      > (Forgot to cc George)
>>>>>>      >
>>>>>>      > Best,
>>>>>>      > Kurt
>>>>>>      >
>>>>>>      >
>>>>>>      > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
>>>>>>      <ma...@gmail.com>> wrote:
>>>>>>      >
>>>>>>      > > Hi Bowen,
>>>>>>      > >
>>>>>>      > > Thanks for bringing this up. We have actually discussed this,
>>>>>>      > > and I think Till and George have already spent some time
>>>>>>      > > investigating it. I have cc'ed both of them, and maybe they can
>>>>>>      > > share their findings.
>>>>>>      > >
>>>>>>      > > Best,
>>>>>>      > > Kurt
>>>>>>      > >
>>>>>>      > >
>>>>>>      > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
>>>>>>      <ma...@gmail.com>> wrote:
>>>>>>      > >
>>>>>>      > >> Hi Bowen,
>>>>>>      > >>
>>>>>>      > >> Thanks for bringing this. We also suffered from the long
>>>>>>      build time.
>>>>>>      > >> I agree that we should focus on solving build capacity
>>>>>>      problem in the
>>>>>>      > >> thread.
>>>>>>      > >>
>>>>>>      > >> My observation is there is only one build is running, all the
>>>>>>      others
>>>>>>      > >> (other
>>>>>>      > >> PRs, master) are pending.
>>>>>>      > >> The pricing plan[1] of travis shows it can support concurrent
>>>>>>      build
>>>>>>      > jobs.
>>>>>>      > >> But I don't know which plan we are using, might be the free
>>>>>>      plan for
>>>>>>      > open
>>>>>>      > >> source.
>>>>>>      > >>
>>>>>>      > >> I cc-ed Chesnay who may have some experience on Travis.
>>>>>>      > >>
>>>>>>      > >> Regards,
>>>>>>      > >> Jark
>>>>>>      > >>
>>>>>>      > >> [1]: https://travis-ci.com/plans
>>>>>>      > >>
>>>>>>      > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
>>>>>>      <ma...@gmail.com>> wrote:
>>>>>>      > >>
>>>>>>      > >> > Hi Steven,
>>>>>>      > >> >
>>>>>>      > >> > I think you may not read what I wrote. The discussion is about
>>>>>>      > "unstable
>>>>>>      > >> > build **capacity**", in another word "unstable / lack of build
>>>>>>      > >> resources",
>>>>>>      > >> > not "unstable build".
>>>>>>      > >> >
>>>>>>      > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>>>>>      <stevenz3wu@gmail.com <ma...@gmail.com>>
>>>>>>      > wrote:
>>>>>>      > >> >
>>>>>>      > >> > > long and sometimes unstable build is definitely a pain
>>>>> point.
>>>>>>      > >> > >
>>>>>>      > >> > > I suspect the build failure here in flink-connector-kafka
>>>>>>      is not
>>>>>>      > >> related
>>>>>>      > >> > to
>>>>>>      > >> > > my change. but there is no easy re-run the build on
>>>>>>      travis UI.
>>>>>>      > Google
>>>>>>      > >> > > search showed a trick of close-and-open the PR will
>>>>>>      trigger rebuild.
>>>>>>      > >> but
>>>>>>      > >> > > that could add noises to the PR activities.
>>>>>>      > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>      > >> > >
>>>>>>      > >> > > travis-ci for my personal repo often failed with
>>>>>>      exceeding time
>>>>>>      > limit
>>>>>>      > >> > after
>>>>>>      > >> > > 4+ hours.
>>>>>>      > >> > > The job exceeded the maximum time limit for jobs, and has
>>>>>>      been
>>>>>>      > >> > terminated.
>>>>>>      > >> > >
>>>>>>      > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>>>>>      <bowenli86@gmail.com <ma...@gmail.com>>
>>>>>>      > wrote:
>>>>>>      > >> > >
>>>>>>      > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>>>>>>      This build
>>>>>>      > >> > request
>>>>>>      > >> > > > has
>>>>>>      > >> > > > been sitting at **HEAD of the queue** since I first saw
>>>>>>      it at PST
>>>>>>      > >> > 10:30am
>>>>>>      > >> > > > (not sure how long it's been there before 10:30am).
>>>>>>      It's PST
>>>>>>      > 4:12pm
>>>>>>      > >> now
>>>>>>      > >> > > and
>>>>>>      > >> > > > it hasn't started yet.
>>>>>>      > >> > > >
>>>>>>      > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>>>>>      <bowenli86@gmail.com <ma...@gmail.com>>
>>>>>>      > >> wrote:
>>>>>>      > >> > > >
>>>>>>      > >> > > > > Hi devs,
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > I've been experiencing the pain resulting from lack
>>>>>>      of stable
>>>>>>      > >> build
>>>>>>      > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
>>>>>>      noticed
>>>>>>      > >> often
>>>>>>      > >> > > that
>>>>>>      > >> > > > no
>>>>>>      > >> > > > > build in the queue is making any progress for hours, and
>>>>>>      > suddenly
>>>>>>      > >> 5
>>>>>>      > >> > or
>>>>>>      > >> > > 6
>>>>>>      > >> > > > > builds kick off all together after the long pause.
>>>>>>      I'm at PST
>>>>>>      > >> > (UTC-08)
>>>>>>      > >> > > > time
>>>>>>      > >> > > > > zone, and I've seen pause can be as long as 6 hours
>>>>>>      from PST 9am
>>>>>>      > >> to
>>>>>>      > >> > 3pm
>>>>>>      > >> > > > > (let alone the time needed to drain the queue
>>>>>>      afterwards).
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > I think this has greatly impacted our productivity. I've
>>>>>>      > >> experienced
>>>>>>      > >> > > that
>>>>>>      > >> > > > > PRs submitted in the early morning of PST time zone
>>>>>>      won't finish
>>>>>>      > >> > their
>>>>>>      > >> > > > > build until late night of the same day.
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > So my questions are:
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > - Has anyone else experienced the same problem or
>>>>>>      have similar
>>>>>>      > >> > > > observation
>>>>>>      > >> > > > > on TravisCI? (I suspect it has things to do with time
>>>>>>      zone)
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > - What pricing plan of TravisCI is Flink currently
>>>>>>      using? Is it
>>>>>>      > >> the
>>>>>>      > >> > > free
>>>>>>      > >> > > > > plan for open source projects? What are the
>>>>>>      guaranteed build
>>>>>>      > >> capacity
>>>>>>      > >> > > of
>>>>>>      > >> > > > > the current plan?
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > - If the current pricing plan (either free or paid)
>>>>> can't
>>>>>>      > provide
>>>>>>      > >> > > stable
>>>>>>      > >> > > > > build capacity, can we upgrade to a higher priced
>>>>>>      plan with
>>>>>>      > larger
>>>>>>      > >> > and
>>>>>>      > >> > > > more
>>>>>>      > >> > > > > stable build capacity?
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > BTW, another factor that contribute to the
>>>>>>      productivity problem
>>>>>>      > is
>>>>>>      > >> > that
>>>>>>      > >> > > > > our build is slow - we run full build for every PR and a
>>>>>>      > >> successful
>>>>>>      > >> > > full
>>>>>>      > >> > > > > build takes ~5h. We definitely have more options to
>>>>>>      solve it,
>>>>>>      > for
>>>>>>      > >> > > > instance,
>>>>>>      > >> > > > > modularize the build graphs and reuse artifacts from the
>>>>>>      > previous
>>>>>>      > >> > > build.
>>>>>>      > >> > > > > But I think that can be a big effort which is much
>>>>>>      harder to
>>>>>>      > >> > accomplish
>>>>>>      > >> > > > in
>>>>>>      > >> > > > > a short period of time and may deserve its own separate
>>>>>>      > >> discussion.
>>>>>>      > >> > > > >
>>>>>>      > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>      > >> > > > >
>>>>>>      > >> > > > >
>>>>>>      > >> > > >
>>>>>>      > >> > >
>>>>>>      > >> >
>>>>>>      > >>
>>>>>>      > >
>>>>>>      >
>>>>>>
>>>>>>
>>>>>>      --
>>>>>>      Best Regards
>>>>>>
>>>>>>      Jeff Zhang
>>>>>>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen <bo...@gmail.com>.
Just to elaborate a bit more on why a slow build is OK but having no resources is not: say I submit a build request at 9am PST, no other requests exist, and mine is at the head of the queue. Currently, it still cannot get built until 4 or 5pm.



> On Jun 26, 2019, at 12:28, Bowen Li <bo...@gmail.com> wrote:
> 
> Hi,
> 
> @Dawid, I think the "long test running" problem that I mentioned in the first email, as you also said, belongs to "a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion". Thus I didn't include it in what we can do in the foreseeable short term.
> 
> Besides, I don't think that's the ultimate reason for the lack of build resources. Even if the build is shortened to something like 2h, the problem I described, where no build machine does any work for 6 or more hours during PST daytime, will still happen, because no machine from ASF INFRA's pool is allocated to Flink. I have paid close attention to the build queue over the past few weekdays, and the pattern is pretty clear now.
> 
> **The ultimate root cause** is that we don't have any **dedicated** build resources that we can reliably count on. I'm actually OK with waiting a long time if build requests are running, since it means we are at least making progress. But I'm not OK with having no build resources. In the short term, I think we should aim to always have a central pool of machines (3 or 5, say) dedicated to building Flink at any time, or maybe use users' resources.
> 
> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is using a Jenkins job to automatically build on users' Travis accounts and link the result back to the GitHub PR. I guess the Jenkins job fetches the latest upstream master and builds the PR against it. Jeff has filed tickets to learn about and get access to the Jenkins infra. It'd be better to fully understand it before judging this approach.
> 
> I also heard good things about CircleCI, and ASF INFRA seems to have a pool of build capacity there too. It could be an alternative to consider.
> 
>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dw...@apache.org> wrote:
>> Sorry to jump in late, but I think Bowen missed the most important point
>> from Chesnay's previous message in the summary. The ultimate reason for
>> all the problems is that the tests take close to 2 hours to run already.
>> I fully support this claim: "Unless people start caring about test times
>> before adding them, this issue cannot be solved"
>> 
>> This is also another reason why using user's Travis account won't help.
>> Every few weeks we reach the user's time limit for a single profile.
>> This makes the user's builds simply fail, until we either properly
>> decrease the time the tests take (which I am not sure we ever did) or
>> postpone the problem by splitting into more profiles. (Note that the ASF
>> Travis account has higher time limits)
>> 
>> Best,
>> 
>> Dawid
>> 
>> On 26/06/2019 09:36, Robert Metzger wrote:
>> > Do we know if using "the best" available hardware would improve the build
>> > times?
>> > Imagine we would run the build on machines with plenty of main memory to
>> > mount everything to ramdisk + the latest CPU architecture?
>> >
>> > Throwing hardware at the problem could help reduce the time of an
>> > individual build, and using our own infrastructure would remove our
>> > dependency on Apache's Travis account (with the obvious downside of having
>> > to maintain the infrastructure)
>> > We could use an open source travis alternative, to have a similar
>> > experience and make the migration easy.
>> >
>> >
>> > On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org> wrote:
>> >
>> >>  From what I gathered, there's no special sauce that the Zeppelin
>> >> project uses which actually integrates a user's Travis account into the PR.
>> >>
>> >> They just disabled Travis for PRs. And that's kind of it.
>> >>
>> >> Naturally we can do this (duh) and save the ASF a fair amount of
>> >> resources, but there are downsides:
>> >>
>> >> The discoverability of the Travis check takes a nose-dive. Either we
>> >> require every contributor to always, on every commit, also post a Travis
>> >> build, or we have the reviewer sift through the contributor's account to
>> >> find it.
>> >>
>> >> This is rather cumbersome. Additionally, it's also not equivalent to
>> >> having a PR build.
>> >>
>> >> A normal branch build takes a branch as is and tests it. A PR build
>> >> merges the branch into master, and then runs it. (Fun fact: This is why
>> >> a PR with merge conflicts is not being run on Travis.)
>> >>
>> >> And ultimately, everyone can already make use of this approach anyway.
>> >>
>> >> On 25/06/2019 08:02, Jark Wu wrote:
>> >>> Hi Jeff,
>> >>>
>> >>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>> >>> leverage user's travis account.
>> >>> In this way, we can have almost unlimited concurrent build jobs and
>> >>> developers can restart build by themselves (currently only committers
>> >>> can restart PR's build).
>> >>>
>> >>> But I'm still not very clear how to integrate user's travis build into
>> >>> the Flink pull request's build automatically. Can you explain more in
>> >>> detail?
>> >>>
>> >>> Another question: does travis only build branches for user account?
>> >>> My concern is that builds for PRs will rebase user's commits against
>> >>> current master branch.
>> >>> This will help us to find problems before merge.  Builds for branches
>> >>> will lose the impact of new commits in master.
>> >>> How does Zeppelin solve this problem?
>> >>>
>> >>> Thanks again for sharing the idea.
>> >>>
>> >>> Regards,
>> >>> Jark
>> >>>
>> >>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>> >>> <ma...@gmail.com>> wrote:
>> >>>
>> >>>     Hi Folks,
>> >>>
>> >>>     Zeppelin met this kind of issue before; we solved it by delegating
>> >>>     each contributor's PR build to their own Travis account (everyone
>> >>>     gets 5 free slots for Travis builds).
>> >>>     The Apache account's Travis build is only triggered when a PR is merged.
>> >>>
>> >>>
>> >>>
>> >>>     Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
>> >>>     wrote on Tue, Jun 25, 2019 at 10:16 AM:
>> >>>
>> >>>     > (Forgot to cc George)
>> >>>     >
>> >>>     > Best,
>> >>>     > Kurt
>> >>>     >
>> >>>     >
>> >>>     > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
>> >>>     <ma...@gmail.com>> wrote:
>> >>>     >
>> >>>     > > Hi Bowen,
>> >>>     > >
>> >>>     > > Thanks for bringing this up. We have actually discussed this,
>> >>>     > > and I think Till and George have already spent some time
>> >>>     > > investigating it. I have cc'ed both of them, and maybe they can
>> >>>     > > share their findings.
>> >>>     > >
>> >>>     > > Best,
>> >>>     > > Kurt
>> >>>     > >
>> >>>     > >
>> >>>     > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
>> >>>     <ma...@gmail.com>> wrote:
>> >>>     > >
>> >>>     > >> Hi Bowen,
>> >>>     > >>
>> >>>     > >> Thanks for bringing this. We also suffered from the long
>> >>>     build time.
>> >>>     > >> I agree that we should focus on solving build capacity
>> >>>     problem in the
>> >>>     > >> thread.
>> >>>     > >>
>> >>>     > >> My observation is there is only one build is running, all the
>> >>>     others
>> >>>     > >> (other
>> >>>     > >> PRs, master) are pending.
>> >>>     > >> The pricing plan[1] of travis shows it can support concurrent
>> >>>     build
>> >>>     > jobs.
>> >>>     > >> But I don't know which plan we are using, might be the free
>> >>>     plan for
>> >>>     > open
>> >>>     > >> source.
>> >>>     > >>
>> >>>     > >> I cc-ed Chesnay who may have some experience on Travis.
>> >>>     > >>
>> >>>     > >> Regards,
>> >>>     > >> Jark
>> >>>     > >>
>> >>>     > >> [1]: https://travis-ci.com/plans
>> >>>     > >>
>> >>>     > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
>> >>>     <ma...@gmail.com>> wrote:
>> >>>     > >>
>> >>>     > >> > Hi Steven,
>> >>>     > >> >
>> >>>     > >> > I think you may not read what I wrote. The discussion is about
>> >>>     > "unstable
>> >>>     > >> > build **capacity**", in another word "unstable / lack of build
>> >>>     > >> resources",
>> >>>     > >> > not "unstable build".
>> >>>     > >> >
>> >>>     > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>> >>>     <stevenz3wu@gmail.com <ma...@gmail.com>>
>> >>>     > wrote:
>> >>>     > >> >
>> >>>     > >> > > long and sometimes unstable build is definitely a pain
>> >> point.
>> >>>     > >> > >
>> >>>     > >> > > I suspect the build failure here in flink-connector-kafka
>> >>>     is not
>> >>>     > >> related
>> >>>     > >> > to
>> >>>     > >> > > my change. but there is no easy re-run the build on
>> >>>     travis UI.
>> >>>     > Google
>> >>>     > >> > > search showed a trick of close-and-open the PR will
>> >>>     trigger rebuild.
>> >>>     > >> but
>> >>>     > >> > > that could add noises to the PR activities.
>> >>>     > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>> >>>     > >> > >
>> >>>     > >> > > travis-ci for my personal repo often failed with
>> >>>     exceeding time
>> >>>     > limit
>> >>>     > >> > after
>> >>>     > >> > > 4+ hours.
>> >>>     > >> > > The job exceeded the maximum time limit for jobs, and has
>> >>>     been
>> >>>     > >> > terminated.
>> >>>     > >> > >
>> >>>     > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>> >>>     <bowenli86@gmail.com <ma...@gmail.com>>
>> >>>     > wrote:
>> >>>     > >> > >
>> >>>     > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>> >>>     This build
>> >>>     > >> > request
>> >>>     > >> > > > has
>> >>>     > >> > > > been sitting at **HEAD of the queue** since I first saw
>> >>>     it at PST
>> >>>     > >> > 10:30am
>> >>>     > >> > > > (not sure how long it's been there before 10:30am).
>> >>>     It's PST
>> >>>     > 4:12pm
>> >>>     > >> now
>> >>>     > >> > > and
>> >>>     > >> > > > it hasn't started yet.
>> >>>     > >> > > >
>> >>>     > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>> >>>     <bowenli86@gmail.com <ma...@gmail.com>>
>> >>>     > >> wrote:
>> >>>     > >> > > >
>> >>>     > >> > > > > Hi devs,
>> >>>     > >> > > > >
>> >>>     > >> > > > > I've been experiencing the pain resulting from lack
>> >>>     of stable
>> >>>     > >> build
>> >>>     > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
>> >>>     noticed
>> >>>     > >> often
>> >>>     > >> > > that
>> >>>     > >> > > > no
>> >>>     > >> > > > > build in the queue is making any progress for hours, and
>> >>>     > suddenly
>> >>>     > >> 5
>> >>>     > >> > or
>> >>>     > >> > > 6
>> >>>     > >> > > > > builds kick off all together after the long pause.
>> >>>     I'm at PST
>> >>>     > >> > (UTC-08)
>> >>>     > >> > > > time
>> >>>     > >> > > > > zone, and I've seen pause can be as long as 6 hours
>> >>>     from PST 9am
>> >>>     > >> to
>> >>>     > >> > 3pm
>> >>>     > >> > > > > (let alone the time needed to drain the queue
>> >>>     afterwards).
>> >>>     > >> > > > >
>> >>>     > >> > > > > I think this has greatly impacted our productivity. I've
>> >>>     > >> experienced
>> >>>     > >> > > that
>> >>>     > >> > > > > PRs submitted in the early morning of PST time zone
>> >>>     won't finish
>> >>>     > >> > their
>> >>>     > >> > > > > build until late night of the same day.
>> >>>     > >> > > > >
>> >>>     > >> > > > > So my questions are:
>> >>>     > >> > > > >
>> >>>     > >> > > > > - Has anyone else experienced the same problem or
>> >>>     have similar
>> >>>     > >> > > > observation
>> >>>     > >> > > > > on TravisCI? (I suspect it has things to do with time
>> >>>     zone)
>> >>>     > >> > > > >
>> >>>     > >> > > > > - What pricing plan of TravisCI is Flink currently
>> >>>     using? Is it
>> >>>     > >> the
>> >>>     > >> > > free
>> >>>     > >> > > > > plan for open source projects? What are the
>> >>>     guaranteed build
>> >>>     > >> capacity
>> >>>     > >> > > of
>> >>>     > >> > > > > the current plan?
>> >>>     > >> > > > >
>> >>>     > >> > > > > - If the current pricing plan (either free or paid)
>> >> can't
>> >>>     > provide
>> >>>     > >> > > stable
>> >>>     > >> > > > > build capacity, can we upgrade to a higher priced
>> >>>     plan with
>> >>>     > larger
>> >>>     > >> > and
>> >>>     > >> > > > more
>> >>>     > >> > > > > stable build capacity?
>> >>>     > >> > > > >
>> >>>     > >> > > > > BTW, another factor that contribute to the
>> >>>     productivity problem
>> >>>     > is
>> >>>     > >> > that
>> >>>     > >> > > > > our build is slow - we run full build for every PR and a
>> >>>     > >> successful
>> >>>     > >> > > full
>> >>>     > >> > > > > build takes ~5h. We definitely have more options to
>> >>>     solve it,
>> >>>     > for
>> >>>     > >> > > > instance,
>> >>>     > >> > > > > modularize the build graphs and reuse artifacts from the
>> >>>     > previous
>> >>>     > >> > > build.
>> >>>     > >> > > > > But I think that can be a big effort which is much
>> >>>     harder to
>> >>>     > >> > accomplish
>> >>>     > >> > > > in
>> >>>     > >> > > > > a short period of time and may deserve its own separate
>> >>>     > >> discussion.
>> >>>     > >> > > > >
>> >>>     > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>> >>>     > >> > > > >
>> >>>     > >> > > > >
>> >>>     > >> > > >
>> >>>     > >> > >
>> >>>     > >> >
>> >>>     > >>
>> >>>     > >
>> >>>     >
>> >>>
>> >>>
>> >>>     --
>> >>>     Best Regards
>> >>>
>> >>>     Jeff Zhang
>> >>>
>> >>
>> 

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
Hi,

@Dawid, I think the long test running time, which I mentioned in the first
email and which you also pointed out, belongs to "a big effort which is much
harder to accomplish in a short period of time and may deserve its own
separate discussion". That's why I didn't include it in what we can do in
the foreseeable short term.

Besides, I don't think that's the ultimate reason for the lack of build
resources. Even if the build is shortened to something like 2h, the problem
I described, where no build machine does any work for 6 or more hours
during PST daytime, will still happen, because no machine from ASF INFRA's
pool is allocated to Flink. I have paid close attention to the build queue
over the past few weekdays, and the pattern is pretty clear now.
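
As a rough illustration of the argument above, here is a back-of-the-envelope
sketch in Python. The 6-hour stall starting at 9am and the 5h vs. 2h build
times come from this thread; the function name `finish_hour` and the
single-executor model are assumptions made purely for illustration and are
not how Travis actually schedules jobs.

```python
def finish_hour(submit_hour, build_hours, stall_start, stall_hours, queue_ahead=0):
    """Hour of day at which a build finishes, given one stall window.

    Simplified, hypothetical model: a single executor that is unavailable
    during [stall_start, stall_start + stall_hours) and that must first run
    `queue_ahead` builds of the same duration before ours. Builds that are
    already running when the stall begins are not modelled.
    """
    stall_end = stall_start + stall_hours
    # The build cannot start while no executor is allocated.
    start = stall_end if stall_start <= submit_hour < stall_end else submit_hour
    # Builds already waiting in the queue run before ours.
    start += queue_ahead * build_hours
    return start + build_hours


# A PR submitted at 9am, right when a 6-hour stall begins.
for hours in (5, 2):
    done = finish_hour(submit_hour=9, build_hours=hours, stall_start=9, stall_hours=6)
    print(f"{hours}h build finishes at hour {done}")
# 5h build -> hour 20, 2h build -> hour 17
```

In this toy model a faster build saves 3 hours, but the 6 hours of waiting
stay exactly the same, and each build queued ahead adds its own duration on
top.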

**The ultimate root cause** is that we don't have any **dedicated** build
resources that we can stably rely on. I'm actually OK with waiting a long
time if build requests are running, since that means we are at least making
progress. But I'm not OK with having no build resources at all. What I think
we should aim for in the short term is to always have at least a central
pool (can be 3 or 5) of machines dedicated to building Flink at any time, or
maybe to use users' resources.

@Chesnay @Robert I synced with Jeff offline: the Zeppelin community is
using a Jenkins job to automatically build on users' Travis accounts and
link the results back to the GitHub PR. I guess the Jenkins job fetches the
latest upstream master and builds the PR against it. Jeff has filed tickets
to learn about and get access to the Jenkins infra. It'd be better to fully
understand it first before judging this approach.

I also heard good things about CircleCI, and ASF INFRA seems to have a pool
of build capacity there too. It could be an alternative to consider.

On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Sorry to jump in late, but I think Bowen missed the most important point
> from Chesnay's previous message in the summary. The ultimate reason for
> all the problems is that the tests take close to 2 hours to run already.
> I fully support this claim: "Unless people start caring about test times
> before adding them, this issue cannot be solved"
>
> This is also another reason why using user's Travis account won't help.
> Every few weeks we reach the user's time limit for a single profile.
> This makes the user's builds simply fail, until we either properly
> decrease the time the tests take (which I am not sure we ever did) or
> postpone the problem by splitting into more profiles. (Note that the ASF
> Travis account has higher time limits)
>
> Best,
>
> Dawid
>
> On 26/06/2019 09:36, Robert Metzger wrote:
> > Do we know if using "the best" available hardware would improve the build
> > times?
> > Imagine we would run the build on machines with plenty of main memory to
> > mount everything to ramdisk + the latest CPU architecture?
> >
> > Throwing hardware at the problem could help reduce the time of an
> > individual build, and using our own infrastructure would remove our
> > dependency on Apache's Travis account (with the obvious downside of
> having
> > to maintain the infrastructure)
> > We could use an open source travis alternative, to have a similar
> > experience and make the migration easy.
> >
> >
> > On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org>
> wrote:
> >
> >>  From what I gathered, there's no special sauce that the Zeppelin
> >> project uses which actually integrates a users Travis account into the
> PR.
> >>
> >> They just disabled Travis for PRs. And that's kind of it.
> >>
> >> Naturally we can do this (duh) and save the ASF a fair amount of
> >> resources, but there are downsides:
> >>
> >> The discoverability of the Travis check takes a nose-dive. Either we
> >> require every contributor to always, on every commit, also post a Travis
> >> build, or we have the reviewer sift through the contributors account to
> >> find it.
> >>
> >> This is rather cumbersome. Additionally, it's also not equivalent to
> >> having a PR build.
> >>
> >> A normal branch build takes a branch as is and tests it. A PR build
> >> merges the branch into master, and then runs it. (Fun fact: This is why
> >> a PR with merge conflicts is not being run on Travis.)
> >>
> >> And ultimately, everyone can already make use of this approach anyway.
> >>
> >> On 25/06/2019 08:02, Jark Wu wrote:
> >>> Hi Jeff,
> >>>
> >>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
> >>> leverage user's travis account.
> >>> In this way, we can have almost unlimited concurrent build jobs and
> >>> developers can restart build by themselves (currently only committers
> >>> can restart PR's build).
> >>>
> >>> But I'm still not very clear how to integrate user's travis build into
> >>> the Flink pull request's build automatically. Can you explain more in
> >>> detail?
> >>>
> >>> Another question: does travis only build branches for user account?
> >>> My concern is that builds for PRs will rebase user's commits against
> >>> current master branch.
> >>> This will help us to find problems before merge.  Builds for branches
> >>> will lose the impact of new commits in master.
> >>> How does Zeppelin solve this problem?
> >>>
> >>> Thanks again for sharing the idea.
> >>>
> >>> Regards,
> >>> Jark
> >>>
> >>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
> >>> <ma...@gmail.com>> wrote:
> >>>
> >>>     Hi Folks,
> >>>
> >>>     Zeppelin meet this kind of issue before, we solve it by delegating
> >>>     each
> >>>     one's PR build to his travis account (Everyone can have 5 free
> >>>     slot for
> >>>     travis build).
> >>>     Apache account travis build is only triggered when PR is merged.
> >>>
> >>>
> >>>
> >>>     Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
> >>>     wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>>
> >>>     > (Forgot to cc George)
> >>>     >
> >>>     > Best,
> >>>     > Kurt
> >>>     >
> >>>     >
> >>>     > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
> >>>     <ma...@gmail.com>> wrote:
> >>>     >
> >>>     > > Hi Bowen,
> >>>     > >
> >>>     > > Thanks for bringing this up. We actually have discussed about
> >>>     this, and I
> >>>     > > think Till and George have
> >>>     > > already spend sometime investigating it. I have cced both of
> >>>     them, and
> >>>     > > maybe they can share
> >>>     > > their findings.
> >>>     > >
> >>>     > > Best,
> >>>     > > Kurt
> >>>     > >
> >>>     > >
> >>>     > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
> >>>     <ma...@gmail.com>> wrote:
> >>>     > >
> >>>     > >> Hi Bowen,
> >>>     > >>
> >>>     > >> Thanks for bringing this. We also suffered from the long
> >>>     build time.
> >>>     > >> I agree that we should focus on solving build capacity
> >>>     problem in the
> >>>     > >> thread.
> >>>     > >>
> >>>     > >> My observation is there is only one build is running, all the
> >>>     others
> >>>     > >> (other
> >>>     > >> PRs, master) are pending.
> >>>     > >> The pricing plan[1] of travis shows it can support concurrent
> >>>     build
> >>>     > jobs.
> >>>     > >> But I don't know which plan we are using, might be the free
> >>>     plan for
> >>>     > open
> >>>     > >> source.
> >>>     > >>
> >>>     > >> I cc-ed Chesnay who may have some experience on Travis.
> >>>     > >>
> >>>     > >> Regards,
> >>>     > >> Jark
> >>>     > >>
> >>>     > >> [1]: https://travis-ci.com/plans
> >>>     > >>
> >>>     > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
> >>>     <ma...@gmail.com>> wrote:
> >>>     > >>
> >>>     > >> > Hi Steven,
> >>>     > >> >
> >>>     > >> > I think you may not read what I wrote. The discussion is
> about
> >>>     > "unstable
> >>>     > >> > build **capacity**", in another word "unstable / lack of
> build
> >>>     > >> resources",
> >>>     > >> > not "unstable build".
> >>>     > >> >
> >>>     > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>>     <stevenz3wu@gmail.com <ma...@gmail.com>>
> >>>     > wrote:
> >>>     > >> >
> >>>     > >> > > long and sometimes unstable build is definitely a pain
> >> point.
> >>>     > >> > >
> >>>     > >> > > I suspect the build failure here in flink-connector-kafka
> >>>     is not
> >>>     > >> related
> >>>     > >> > to
> >>>     > >> > > my change. but there is no easy re-run the build on
> >>>     travis UI.
> >>>     > Google
> >>>     > >> > > search showed a trick of close-and-open the PR will
> >>>     trigger rebuild.
> >>>     > >> but
> >>>     > >> > > that could add noises to the PR activities.
> >>>     > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>>     > >> > >
> >>>     > >> > > travis-ci for my personal repo often failed with
> >>>     exceeding time
> >>>     > limit
> >>>     > >> > after
> >>>     > >> > > 4+ hours.
> >>>     > >> > > The job exceeded the maximum time limit for jobs, and has
> >>>     been
> >>>     > >> > terminated.
> >>>     > >> > >
> >>>     > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>>     <bowenli86@gmail.com <ma...@gmail.com>>
> >>>     > wrote:
> >>>     > >> > >
> >>>     > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >>>     This build
> >>>     > >> > request
> >>>     > >> > > > has
> >>>     > >> > > > been sitting at **HEAD of the queue** since I first saw
> >>>     it at PST
> >>>     > >> > 10:30am
> >>>     > >> > > > (not sure how long it's been there before 10:30am).
> >>>     It's PST
> >>>     > 4:12pm
> >>>     > >> now
> >>>     > >> > > and
> >>>     > >> > > > it hasn't started yet.
> >>>     > >> > > >
> >>>     > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>>     <bowenli86@gmail.com <ma...@gmail.com>>
> >>>     > >> wrote:
> >>>     > >> > > >
> >>>     > >> > > > > Hi devs,
> >>>     > >> > > > >
> >>>     > >> > > > > I've been experiencing the pain resulting from lack
> >>>     of stable
> >>>     > >> build
> >>>     > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
> >>>     noticed
> >>>     > >> often
> >>>     > >> > > that
> >>>     > >> > > > no
> >>>     > >> > > > > build in the queue is making any progress for hours,
> and
> >>>     > suddenly
> >>>     > >> 5
> >>>     > >> > or
> >>>     > >> > > 6
> >>>     > >> > > > > builds kick off all together after the long pause.
> >>>     I'm at PST
> >>>     > >> > (UTC-08)
> >>>     > >> > > > time
> >>>     > >> > > > > zone, and I've seen pause can be as long as 6 hours
> >>>     from PST 9am
> >>>     > >> to
> >>>     > >> > 3pm
> >>>     > >> > > > > (let alone the time needed to drain the queue
> >>>     afterwards).
> >>>     > >> > > > >
> >>>     > >> > > > > I think this has greatly impacted our productivity.
> I've
> >>>     > >> experienced
> >>>     > >> > > that
> >>>     > >> > > > > PRs submitted in the early morning of PST time zone
> >>>     won't finish
> >>>     > >> > their
> >>>     > >> > > > > build until late night of the same day.
> >>>     > >> > > > >
> >>>     > >> > > > > So my questions are:
> >>>     > >> > > > >
> >>>     > >> > > > > - Has anyone else experienced the same problem or
> >>>     have similar
> >>>     > >> > > > observation
> >>>     > >> > > > > on TravisCI? (I suspect it has things to do with time
> >>>     zone)
> >>>     > >> > > > >
> >>>     > >> > > > > - What pricing plan of TravisCI is Flink currently
> >>>     using? Is it
> >>>     > >> the
> >>>     > >> > > free
> >>>     > >> > > > > plan for open source projects? What are the
> >>>     guaranteed build
> >>>     > >> capacity
> >>>     > >> > > of
> >>>     > >> > > > > the current plan?
> >>>     > >> > > > >
> >>>     > >> > > > > - If the current pricing plan (either free or paid)
> >> can't
> >>>     > provide
> >>>     > >> > > stable
> >>>     > >> > > > > build capacity, can we upgrade to a higher priced
> >>>     plan with
> >>>     > larger
> >>>     > >> > and
> >>>     > >> > > > more
> >>>     > >> > > > > stable build capacity?
> >>>     > >> > > > >
> >>>     > >> > > > > BTW, another factor that contribute to the
> >>>     productivity problem
> >>>     > is
> >>>     > >> > that
> >>>     > >> > > > > our build is slow - we run full build for every PR
> and a
> >>>     > >> successful
> >>>     > >> > > full
> >>>     > >> > > > > build takes ~5h. We definitely have more options to
> >>>     solve it,
> >>>     > for
> >>>     > >> > > > instance,
> >>>     > >> > > > > modularize the build graphs and reuse artifacts from
> the
> >>>     > previous
> >>>     > >> > > build.
> >>>     > >> > > > > But I think that can be a big effort which is much
> >>>     harder to
> >>>     > >> > accomplish
> >>>     > >> > > > in
> >>>     > >> > > > > a short period of time and may deserve its own
> separate
> >>>     > >> discussion.
> >>>     > >> > > > >
> >>>     > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> >>>     > >> > > > >
> >>>     > >> > > > >
> >>>     > >> > > >
> >>>     > >> > >
> >>>     > >> >
> >>>     > >>
> >>>     > >
> >>>     >
> >>>
> >>>
> >>>     --
> >>>     Best Regards
> >>>
> >>>     Jeff Zhang
> >>>
> >>
>
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Dawid Wysakowicz <dw...@apache.org>.
Sorry to jump in late, but I think Bowen missed the most important point
from Chesnay's previous message in the summary. The ultimate reason for
all the problems is that the tests take close to 2 hours to run already.
I fully support this claim: "Unless people start caring about test times
before adding them, this issue cannot be solved"

This is also another reason why using users' Travis accounts won't help.
Every few weeks we reach the user account's time limit for a single profile.
This makes the users' builds simply fail until we either properly
decrease the time the tests take (which I am not sure we ever did) or
postpone the problem by splitting into more profiles. (Note that the ASF
Travis account has higher time limits.)
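
To show what "splitting into more profiles" amounts to, here is a minimal
sketch in Python. The module names, their durations, and the 50-minute limit
are made up for illustration, and `split_into_profiles` is a hypothetical
helper, not something that exists in the Flink build.

```python
def split_into_profiles(module_minutes, limit_minutes):
    """Greedily pack modules into profiles so each stays within the limit.

    `module_minutes` maps a (hypothetical) module name to its test duration.
    Uses first-fit decreasing; raises if one module alone exceeds the limit.
    """
    profiles = []  # each entry: [total_minutes, [module names]]
    # Place the longest modules first; ties are broken by name.
    ordered = sorted(module_minutes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, minutes in ordered:
        if minutes > limit_minutes:
            raise ValueError(f"{name} alone exceeds the limit")
        for profile in profiles:
            if profile[0] + minutes <= limit_minutes:
                profile[0] += minutes
                profile[1].append(name)
                break
        else:
            # No existing profile has room, so open a new one.
            profiles.append([minutes, [name]])
    return [names for _, names in profiles]


durations = {"core": 40, "runtime": 35, "table": 30, "connectors": 25, "libraries": 15}
print(split_into_profiles(durations, limit_minutes=50))
# [['core'], ['runtime', 'libraries'], ['table'], ['connectors']]
```

Every time the tests of a module grow, the packing needs more profiles, and
once a single module exceeds the limit no split helps at all. That is why
this only postpones the problem.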

Best,

Dawid

On 26/06/2019 09:36, Robert Metzger wrote:
> Do we know if using "the best" available hardware would improve the build
> times?
> Imagine we would run the build on machines with plenty of main memory to
> mount everything to ramdisk + the latest CPU architecture?
>
> Throwing hardware at the problem could help reduce the time of an
> individual build, and using our own infrastructure would remove our
> dependency on Apache's Travis account (with the obvious downside of having
> to maintain the infrastructure)
> We could use an open source travis alternative, to have a similar
> experience and make the migration easy.
>
>
> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org> wrote:
>
>>  From what I gathered, there's no special sauce that the Zeppelin
>> project uses which actually integrates a users Travis account into the PR.
>>
>> They just disabled Travis for PRs. And that's kind of it.
>>
>> Naturally we can do this (duh) and save the ASF a fair amount of
>> resources, but there are downsides:
>>
>> The discoverability of the Travis check takes a nose-dive. Either we
>> require every contributor to always, on every commit, also post a Travis
>> build, or we have the reviewer sift through the contributors account to
>> find it.
>>
>> This is rather cumbersome. Additionally, it's also not equivalent to
>> having a PR build.
>>
>> A normal branch build takes a branch as is and tests it. A PR build
>> merges the branch into master, and then runs it. (Fun fact: This is why
>> a PR with merge conflicts is not being run on Travis.)
>>
>> And ultimately, everyone can already make use of this approach anyway.
>>
>> On 25/06/2019 08:02, Jark Wu wrote:
>>> Hi Jeff,
>>>
>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>> leverage user's travis account.
>>> In this way, we can have almost unlimited concurrent build jobs and
>>> developers can restart build by themselves (currently only committers
>>> can restart PR's build).
>>>
>>> But I'm still not very clear how to integrate user's travis build into
>>> the Flink pull request's build automatically. Can you explain more in
>>> detail?
>>>
>>> Another question: does travis only build branches for user account?
>>> My concern is that builds for PRs will rebase user's commits against
>>> current master branch.
>>> This will help us to find problems before merge.  Builds for branches
>>> will lose the impact of new commits in master.
>>> How does Zeppelin solve this problem?
>>>
>>> Thanks again for sharing the idea.
>>>
>>> Regards,
>>> Jark
>>>
>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
>>> <ma...@gmail.com>> wrote:
>>>
>>>     Hi Folks,
>>>
>>>     Zeppelin meet this kind of issue before, we solve it by delegating
>>>     each
>>>     one's PR build to his travis account (Everyone can have 5 free
>>>     slot for
>>>     travis build).
>>>     Apache account travis build is only triggered when PR is merged.
>>>
>>>
>>>
>>>     Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
>>>     wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>
>>>     > (Forgot to cc George)
>>>     >
>>>     > Best,
>>>     > Kurt
>>>     >
>>>     >
>>>     > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
>>>     <ma...@gmail.com>> wrote:
>>>     >
>>>     > > Hi Bowen,
>>>     > >
>>>     > > Thanks for bringing this up. We actually have discussed about
>>>     this, and I
>>>     > > think Till and George have
>>>     > > already spend sometime investigating it. I have cced both of
>>>     them, and
>>>     > > maybe they can share
>>>     > > their findings.
>>>     > >
>>>     > > Best,
>>>     > > Kurt
>>>     > >
>>>     > >
>>>     > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
>>>     <ma...@gmail.com>> wrote:
>>>     > >
>>>     > >> Hi Bowen,
>>>     > >>
>>>     > >> Thanks for bringing this. We also suffered from the long
>>>     build time.
>>>     > >> I agree that we should focus on solving build capacity
>>>     problem in the
>>>     > >> thread.
>>>     > >>
>>>     > >> My observation is there is only one build is running, all the
>>>     others
>>>     > >> (other
>>>     > >> PRs, master) are pending.
>>>     > >> The pricing plan[1] of travis shows it can support concurrent
>>>     build
>>>     > jobs.
>>>     > >> But I don't know which plan we are using, might be the free
>>>     plan for
>>>     > open
>>>     > >> source.
>>>     > >>
>>>     > >> I cc-ed Chesnay who may have some experience on Travis.
>>>     > >>
>>>     > >> Regards,
>>>     > >> Jark
>>>     > >>
>>>     > >> [1]: https://travis-ci.com/plans
>>>     > >>
>>>     > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
>>>     <ma...@gmail.com>> wrote:
>>>     > >>
>>>     > >> > Hi Steven,
>>>     > >> >
>>>     > >> > I think you may not read what I wrote. The discussion is about
>>>     > "unstable
>>>     > >> > build **capacity**", in another word "unstable / lack of build
>>>     > >> resources",
>>>     > >> > not "unstable build".
>>>     > >> >
>>>     > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>>>     <stevenz3wu@gmail.com <ma...@gmail.com>>
>>>     > wrote:
>>>     > >> >
>>>     > >> > > long and sometimes unstable build is definitely a pain
>> point.
>>>     > >> > >
>>>     > >> > > I suspect the build failure here in flink-connector-kafka
>>>     is not
>>>     > >> related
>>>     > >> > to
>>>     > >> > > my change. but there is no easy re-run the build on
>>>     travis UI.
>>>     > Google
>>>     > >> > > search showed a trick of close-and-open the PR will
>>>     trigger rebuild.
>>>     > >> but
>>>     > >> > > that could add noises to the PR activities.
>>>     > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>     > >> > >
>>>     > >> > > travis-ci for my personal repo often failed with
>>>     exceeding time
>>>     > limit
>>>     > >> > after
>>>     > >> > > 4+ hours.
>>>     > >> > > The job exceeded the maximum time limit for jobs, and has
>>>     been
>>>     > >> > terminated.
>>>     > >> > >
>>>     > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>>>     <bowenli86@gmail.com <ma...@gmail.com>>
>>>     > wrote:
>>>     > >> > >
>>>     > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>>>     This build
>>>     > >> > request
>>>     > >> > > > has
>>>     > >> > > > been sitting at **HEAD of the queue** since I first saw
>>>     it at PST
>>>     > >> > 10:30am
>>>     > >> > > > (not sure how long it's been there before 10:30am).
>>>     It's PST
>>>     > 4:12pm
>>>     > >> now
>>>     > >> > > and
>>>     > >> > > > it hasn't started yet.
>>>     > >> > > >
>>>     > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>>>     <bowenli86@gmail.com <ma...@gmail.com>>
>>>     > >> wrote:
>>>     > >> > > >
>>>     > >> > > > > Hi devs,
>>>     > >> > > > >
>>>     > >> > > > > I've been experiencing the pain resulting from lack
>>>     of stable
>>>     > >> build
>>>     > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
>>>     noticed
>>>     > >> often
>>>     > >> > > that
>>>     > >> > > > no
>>>     > >> > > > > build in the queue is making any progress for hours, and
>>>     > suddenly
>>>     > >> 5
>>>     > >> > or
>>>     > >> > > 6
>>>     > >> > > > > builds kick off all together after the long pause.
>>>     I'm at PST
>>>     > >> > (UTC-08)
>>>     > >> > > > time
>>>     > >> > > > > zone, and I've seen pause can be as long as 6 hours
>>>     from PST 9am
>>>     > >> to
>>>     > >> > 3pm
>>>     > >> > > > > (let alone the time needed to drain the queue
>>>     afterwards).
>>>     > >> > > > >
>>>     > >> > > > > I think this has greatly impacted our productivity. I've
>>>     > >> experienced
>>>     > >> > > that
>>>     > >> > > > > PRs submitted in the early morning of PST time zone
>>>     won't finish
>>>     > >> > their
>>>     > >> > > > > build until late night of the same day.
>>>     > >> > > > >
>>>     > >> > > > > So my questions are:
>>>     > >> > > > >
>>>     > >> > > > > - Has anyone else experienced the same problem or
>>>     have similar
>>>     > >> > > > observation
>>>     > >> > > > > on TravisCI? (I suspect it has things to do with time
>>>     zone)
>>>     > >> > > > >
>>>     > >> > > > > - What pricing plan of TravisCI is Flink currently
>>>     using? Is it
>>>     > >> the
>>>     > >> > > free
>>>     > >> > > > > plan for open source projects? What are the
>>>     guaranteed build
>>>     > >> capacity
>>>     > >> > > of
>>>     > >> > > > > the current plan?
>>>     > >> > > > >
>>>     > >> > > > > - If the current pricing plan (either free or paid)
>> can't
>>>     > provide
>>>     > >> > > stable
>>>     > >> > > > > build capacity, can we upgrade to a higher priced
>>>     plan with
>>>     > larger
>>>     > >> > and
>>>     > >> > > > more
>>>     > >> > > > > stable build capacity?
>>>     > >> > > > >
>>>     > >> > > > > BTW, another factor that contribute to the
>>>     productivity problem
>>>     > is
>>>     > >> > that
>>>     > >> > > > > our build is slow - we run full build for every PR and a
>>>     > >> successful
>>>     > >> > > full
>>>     > >> > > > > build takes ~5h. We definitely have more options to
>>>     solve it,
>>>     > for
>>>     > >> > > > instance,
>>>     > >> > > > > modularize the build graphs and reuse artifacts from the
>>>     > previous
>>>     > >> > > build.
>>>     > >> > > > > But I think that can be a big effort which is much
>>>     harder to
>>>     > >> > accomplish
>>>     > >> > > > in
>>>     > >> > > > > a short period of time and may deserve its own separate
>>>     > >> discussion.
>>>     > >> > > > >
>>>     > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>     > >> > > > >
>>>     > >> > > > >
>>>     > >> > > >
>>>     > >> > >
>>>     > >> >
>>>     > >>
>>>     > >
>>>     >
>>>
>>>
>>>     --
>>>     Best Regards
>>>
>>>     Jeff Zhang
>>>
>>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Robert Metzger <rm...@apache.org>.
Do we know if using "the best" available hardware would improve the build
times?
Imagine we ran the build on machines with the latest CPU architecture and
plenty of main memory, so that everything could be mounted on a ramdisk.

Throwing hardware at the problem could help reduce the time of an
individual build, and using our own infrastructure would remove our
dependency on Apache's Travis account (with the obvious downside of having
to maintain the infrastructure).
We could use an open source Travis alternative to get a similar
experience and keep the migration easy.


On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ch...@apache.org> wrote:

>  From what I gathered, there's no special sauce that the Zeppelin
> project uses which actually integrates a users Travis account into the PR.
>
> They just disabled Travis for PRs. And that's kind of it.
>
> Naturally we can do this (duh) and save the ASF a fair amount of
> resources, but there are downsides:
>
> The discoverability of the Travis check takes a nose-dive. Either we
> require every contributor to always, on every commit, also post a Travis
> build, or we have the reviewer sift through the contributors account to
> find it.
>
> This is rather cumbersome. Additionally, it's also not equivalent to
> having a PR build.
>
> A normal branch build takes a branch as is and tests it. A PR build
> merges the branch into master, and then runs it. (Fun fact: This is why
> a PR with merge conflicts is not being run on Travis.)
>
> And ultimately, everyone can already make use of this approach anyway.
>
> On 25/06/2019 08:02, Jark Wu wrote:
> > Hi Jeff,
> >
> > Thanks for sharing the Zeppelin approach. I think it's a good idea to
> > leverage user's travis account.
> > In this way, we can have almost unlimited concurrent build jobs and
> > developers can restart build by themselves (currently only committers
> > can restart PR's build).
> >
> > But I'm still not very clear how to integrate user's travis build into
> > the Flink pull request's build automatically. Can you explain more in
> > detail?
> >
> > Another question: does travis only build branches for user account?
> > My concern is that builds for PRs will rebase user's commits against
> > current master branch.
> > This will help us to find problems before merge.  Builds for branches
> > will lose the impact of new commits in master.
> > How does Zeppelin solve this problem?
> >
> > Thanks again for sharing the idea.
> >
> > Regards,
> > Jark
> >
> > On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >     Hi Folks,
> >
> >     Zeppelin meet this kind of issue before, we solve it by delegating
> >     each
> >     one's PR build to his travis account (Everyone can have 5 free
> >     slot for
> >     travis build).
> >     Apache account travis build is only triggered when PR is merged.
> >
> >
> >
> >     Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
> >     wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >
> >     > (Forgot to cc George)
> >     >
> >     > Best,
> >     > Kurt
> >     >
> >     >
> >     > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
> >     <ma...@gmail.com>> wrote:
> >     >
> >     > > Hi Bowen,
> >     > >
> >     > > Thanks for bringing this up. We actually have discussed about
> >     this, and I
> >     > > think Till and George have
> >     > > already spend sometime investigating it. I have cced both of
> >     them, and
> >     > > maybe they can share
> >     > > their findings.
> >     > >
> >     > > Best,
> >     > > Kurt
> >     > >
> >     > >
> >     > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
> >     <ma...@gmail.com>> wrote:
> >     > >
> >     > >> Hi Bowen,
> >     > >>
> >     > >> Thanks for bringing this. We also suffered from the long
> >     build time.
> >     > >> I agree that we should focus on solving build capacity
> >     problem in the
> >     > >> thread.
> >     > >>
> >     > >> My observation is there is only one build is running, all the
> >     others
> >     > >> (other
> >     > >> PRs, master) are pending.
> >     > >> The pricing plan[1] of travis shows it can support concurrent
> >     build
> >     > jobs.
> >     > >> But I don't know which plan we are using, might be the free
> >     plan for
> >     > open
> >     > >> source.
> >     > >>
> >     > >> I cc-ed Chesnay who may have some experience on Travis.
> >     > >>
> >     > >> Regards,
> >     > >> Jark
> >     > >>
> >     > >> [1]: https://travis-ci.com/plans
> >     > >>
> >     > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
> >     <ma...@gmail.com>> wrote:
> >     > >>
> >     > >> > Hi Steven,
> >     > >> >
> >     > >> > I think you may not read what I wrote. The discussion is about
> >     > "unstable
> >     > >> > build **capacity**", in another word "unstable / lack of build
> >     > >> resources",
> >     > >> > not "unstable build".
> >     > >> >
> >     > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >     <stevenz3wu@gmail.com <ma...@gmail.com>>
> >     > wrote:
> >     > >> >
> >     > >> > > long and sometimes unstable build is definitely a pain
> point.
> >     > >> > >
> >     > >> > > I suspect the build failure here in flink-connector-kafka
> >     is not
> >     > >> related
> >     > >> > to
> >     > >> > > my change. but there is no easy re-run the build on
> >     travis UI.
> >     > Google
> >     > >> > > search showed a trick of close-and-open the PR will
> >     trigger rebuild.
> >     > >> but
> >     > >> > > that could add noises to the PR activities.
> >     > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >     > >> > >
> >     > >> > > travis-ci for my personal repo often failed with
> >     exceeding time
> >     > limit
> >     > >> > after
> >     > >> > > 4+ hours.
> >     > >> > > The job exceeded the maximum time limit for jobs, and has
> >     been
> >     > >> > terminated.
> >     > >> > >
> >     > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >     <bowenli86@gmail.com <ma...@gmail.com>>
> >     > wrote:
> >     > >> > >
> >     > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >     This build
> >     > >> > request
> >     > >> > > > has
> >     > >> > > > been sitting at **HEAD of the queue** since I first saw
> >     it at PST
> >     > >> > 10:30am
> >     > >> > > > (not sure how long it's been there before 10:30am).
> >     It's PST
> >     > 4:12pm
> >     > >> now
> >     > >> > > and
> >     > >> > > > it hasn't started yet.
> >     > >> > > >
> >     > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >     <bowenli86@gmail.com <ma...@gmail.com>>
> >     > >> wrote:
> >     > >> > > >
> >     > >> > > > > Hi devs,
> >     > >> > > > >
> >     > >> > > > > I've been experiencing the pain resulting from lack
> >     of stable
> >     > >> build
> >     > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
> >     noticed
> >     > >> often
> >     > >> > > that
> >     > >> > > > no
> >     > >> > > > > build in the queue is making any progress for hours, and
> >     > suddenly
> >     > >> 5
> >     > >> > or
> >     > >> > > 6
> >     > >> > > > > builds kick off all together after the long pause.
> >     I'm at PST
> >     > >> > (UTC-08)
> >     > >> > > > time
> >     > >> > > > > zone, and I've seen pause can be as long as 6 hours
> >     from PST 9am
> >     > >> to
> >     > >> > 3pm
> >     > >> > > > > (let alone the time needed to drain the queue
> >     afterwards).
> >     > >> > > > >
> >     > >> > > > > I think this has greatly impacted our productivity. I've
> >     > >> experienced
> >     > >> > > that
> >     > >> > > > > PRs submitted in the early morning of PST time zone
> >     won't finish
> >     > >> > their
> >     > >> > > > > build until late night of the same day.
> >     > >> > > > >
> >     > >> > > > > So my questions are:
> >     > >> > > > >
> >     > >> > > > > - Has anyone else experienced the same problem or
> >     have similar
> >     > >> > > > observation
> >     > >> > > > > on TravisCI? (I suspect it has things to do with time
> >     zone)
> >     > >> > > > >
> >     > >> > > > > - What pricing plan of TravisCI is Flink currently
> >     using? Is it
> >     > >> the
> >     > >> > > free
> >     > >> > > > > plan for open source projects? What are the
> >     guaranteed build
> >     > >> capacity
> >     > >> > > of
> >     > >> > > > > the current plan?
> >     > >> > > > >
> >     > >> > > > > - If the current pricing plan (either free or paid)
> can't
> >     > provide
> >     > >> > > stable
> >     > >> > > > > build capacity, can we upgrade to a higher priced
> >     plan with
> >     > larger
> >     > >> > and
> >     > >> > > > more
> >     > >> > > > > stable build capacity?
> >     > >> > > > >
> >     > >> > > > > BTW, another factor that contribute to the
> >     productivity problem
> >     > is
> >     > >> > that
> >     > >> > > > > our build is slow - we run full build for every PR and a
> >     > >> successful
> >     > >> > > full
> >     > >> > > > > build takes ~5h. We definitely have more options to
> >     solve it,
> >     > for
> >     > >> > > > instance,
> >     > >> > > > > modularize the build graphs and reuse artifacts from the
> >     > previous
> >     > >> > > build.
> >     > >> > > > > But I think that can be a big effort which is much
> >     harder to
> >     > >> > accomplish
> >     > >> > > > in
> >     > >> > > > > a short period of time and may deserve its own separate
> >     > >> discussion.
> >     > >> > > > >
> >     > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> >     > >> > > > >
> >     > >> > > > >
> >     > >> > > >
> >     > >> > >
> >     > >> >
> >     > >>
> >     > >
> >     >
> >
> >
> >     --
> >     Best Regards
> >
> >     Jeff Zhang
> >
>
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
From what I gathered, there's no special sauce that the Zeppelin
project uses which actually integrates a user's Travis account into the PR.

They just disabled Travis for PRs. And that's kind of it.

Naturally we can do this (duh) and save the ASF a fair amount of
resources, but there are downsides:

The discoverability of the Travis check takes a nose-dive. Either we
require every contributor to always, on every commit, also post a Travis
build, or we have the reviewer sift through the contributor's account to
find it.

This is rather cumbersome. Additionally, it's also not equivalent to 
having a PR build.

A normal branch build takes a branch as is and tests it. A PR build
merges the branch into master, and then tests the result. (Fun fact: This
is why a PR with merge conflicts is not being run on Travis.)
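
To make the difference concrete, here is a minimal Python sketch. It is a
toy model under stated assumptions, not how Travis or git work internally:
a "tree" is a dict of path -> content, the merge is a naive per-file
three-way merge, and all names (three_way_merge, callers_match_api,
branch_build, pr_build) are hypothetical and exist only for illustration.

```python
class MergeConflict(Exception):
    """Both sides changed the same file in different ways."""


def three_way_merge(base, ours, theirs):
    """Merge two file trees (dicts of path -> content) that share a base."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree
            result = o
        elif o == b:      # only "theirs" changed the file
            result = t
        elif t == b:      # only "ours" changed the file
            result = o
        else:             # both changed it differently
            raise MergeConflict(path)
        if result is not None:
            merged[path] = result
    return merged


def callers_match_api(tree):
    """Toy test suite: every caller must use the name that "api" defines."""
    return all(name == tree["api"] for path, name in tree.items() if path != "api")


def branch_build(branch, check):
    # A branch build tests the branch exactly as it is.
    return check(branch)


def pr_build(base, master, branch, check):
    # A PR build first merges the branch into current master, then tests.
    return check(three_way_merge(base, master, branch))


# Hypothetical scenario: master renamed the API after the branch was cut,
# while the branch added a new caller that still uses the old name.
base = {"api": "old_name", "caller": "old_name"}
master = {"api": "new_name", "caller": "new_name"}
branch = {"api": "old_name", "caller": "old_name", "new_caller": "old_name"}

print(branch_build(branch, callers_match_api))            # branch alone
print(pr_build(base, master, branch, callers_match_api))  # merged with master
```

In this toy scenario the branch build passes while the PR build fails,
because only the merged tree contains both the rename from master and the
new caller from the branch. If both sides had changed the same file
differently, three_way_merge would raise MergeConflict and there would be
nothing to test at all.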

And ultimately, everyone can already make use of this approach anyway.

On 25/06/2019 08:02, Jark Wu wrote:
> Hi Jeff,
>
> Thanks for sharing the Zeppelin approach. I think it's a good idea to 
> leverage user's travis account.
> In this way, we can have almost unlimited concurrent build jobs and 
> developers can restart build by themselves (currently only committers 
> can restart PR's build).
>
> But I'm still not very clear how to integrate user's travis build into 
> the Flink pull request's build automatically. Can you explain more in 
> detail?
>
> Another question: does travis only build branches for user account?
> My concern is that builds for PRs will rebase user's commits against 
> current master branch.
> This will help us to find problems before merge.  Builds for branches 
> will lose the impact of new commits in master.
> How does Zeppelin solve this problem?
>
> Thanks again for sharing the idea.
>
> Regards,
> Jark
>
> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjffdu@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Hi Folks,
>
>     Zeppelin meet this kind of issue before, we solve it by delegating
>     each
>     one's PR build to his travis account (Everyone can have 5 free
>     slot for
>     travis build).
>     Apache account travis build is only triggered when PR is merged.
>
>
>
>     Kurt Young <ykt836@gmail.com <ma...@gmail.com>>
>     wrote on Tue, Jun 25, 2019 at 10:16 AM:
>
>     > (Forgot to cc George)
>     >
>     > Best,
>     > Kurt
>     >
>     >
>     > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt836@gmail.com
>     <ma...@gmail.com>> wrote:
>     >
>     > > Hi Bowen,
>     > >
>     > > Thanks for bringing this up. We actually have discussed about
>     this, and I
>     > > think Till and George have
>     > > already spend sometime investigating it. I have cced both of
>     them, and
>     > > maybe they can share
>     > > their findings.
>     > >
>     > > Best,
>     > > Kurt
>     > >
>     > >
>     > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imjark@gmail.com
>     <ma...@gmail.com>> wrote:
>     > >
>     > >> Hi Bowen,
>     > >>
>     > >> Thanks for bringing this. We also suffered from the long
>     build time.
>     > >> I agree that we should focus on solving build capacity
>     problem in the
>     > >> thread.
>     > >>
>     > >> My observation is there is only one build is running, all the
>     others
>     > >> (other
>     > >> PRs, master) are pending.
>     > >> The pricing plan[1] of travis shows it can support concurrent
>     build
>     > jobs.
>     > >> But I don't know which plan we are using, might be the free
>     plan for
>     > open
>     > >> source.
>     > >>
>     > >> I cc-ed Chesnay who may have some experience on Travis.
>     > >>
>     > >> Regards,
>     > >> Jark
>     > >>
>     > >> [1]: https://travis-ci.com/plans
>     > >>
>     > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenli86@gmail.com
>     <ma...@gmail.com>> wrote:
>     > >>
>     > >> > Hi Steven,
>     > >> >
>     > >> > I think you may not read what I wrote. The discussion is about
>     > "unstable
>     > >> > build **capacity**", in another word "unstable / lack of build
>     > >> resources",
>     > >> > not "unstable build".
>     > >> >
>     > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
>     <stevenz3wu@gmail.com <ma...@gmail.com>>
>     > wrote:
>     > >> >
>     > >> > > long and sometimes unstable build is definitely a pain point.
>     > >> > >
>     > >> > > I suspect the build failure here in flink-connector-kafka
>     is not
>     > >> related
>     > >> > to
>     > >> > > my change. but there is no easy re-run the build on
>     travis UI.
>     > Google
>     > >> > > search showed a trick of close-and-open the PR will
>     trigger rebuild.
>     > >> but
>     > >> > > that could add noises to the PR activities.
>     > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>     > >> > >
>     > >> > > travis-ci for my personal repo often failed with
>     exceeding time
>     > limit
>     > >> > after
>     > >> > > 4+ hours.
>     > >> > > The job exceeded the maximum time limit for jobs, and has
>     been
>     > >> > terminated.
>     > >> > >
>     > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
>     <bowenli86@gmail.com <ma...@gmail.com>>
>     > wrote:
>     > >> > >
>     > >> > > > https://travis-ci.org/apache/flink/builds/549681530
>     This build
>     > >> > request
>     > >> > > > has
>     > >> > > > been sitting at **HEAD of the queue** since I first saw
>     it at PST
>     > >> > 10:30am
>     > >> > > > (not sure how long it's been there before 10:30am).
>     It's PST
>     > 4:12pm
>     > >> now
>     > >> > > and
>     > >> > > > it hasn't started yet.
>     > >> > > >
>     > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
>     <bowenli86@gmail.com <ma...@gmail.com>>
>     > >> wrote:
>     > >> > > >
>     > >> > > > > Hi devs,
>     > >> > > > >
>     > >> > > > > I've been experiencing the pain resulting from lack
>     of stable
>     > >> build
>     > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I
>     noticed
>     > >> often
>     > >> > > that
>     > >> > > > no
>     > >> > > > > build in the queue is making any progress for hours, and
>     > suddenly
>     > >> 5
>     > >> > or
>     > >> > > 6
>     > >> > > > > builds kick off all together after the long pause.
>     I'm at PST
>     > >> > (UTC-08)
>     > >> > > > time
>     > >> > > > > zone, and I've seen pause can be as long as 6 hours
>     from PST 9am
>     > >> to
>     > >> > 3pm
>     > >> > > > > (let alone the time needed to drain the queue
>     afterwards).
>     > >> > > > >
>     > >> > > > > I think this has greatly impacted our productivity. I've
>     > >> experienced
>     > >> > > that
>     > >> > > > > PRs submitted in the early morning of PST time zone
>     won't finish
>     > >> > their
>     > >> > > > > build until late night of the same day.
>     > >> > > > >
>     > >> > > > > So my questions are:
>     > >> > > > >
>     > >> > > > > - Has anyone else experienced the same problem or
>     have similar
>     > >> > > > observation
>     > >> > > > > on TravisCI? (I suspect it has things to do with time
>     zone)
>     > >> > > > >
>     > >> > > > > - What pricing plan of TravisCI is Flink currently
>     using? Is it
>     > >> the
>     > >> > > free
>     > >> > > > > plan for open source projects? What are the
>     guaranteed build
>     > >> capacity
>     > >> > > of
>     > >> > > > > the current plan?
>     > >> > > > >
>     > >> > > > > - If the current pricing plan (either free or paid) can't
>     > provide
>     > >> > > stable
>     > >> > > > > build capacity, can we upgrade to a higher priced
>     plan with
>     > larger
>     > >> > and
>     > >> > > > more
>     > >> > > > > stable build capacity?
>     > >> > > > >
>     > >> > > > > BTW, another factor that contribute to the
>     productivity problem
>     > is
>     > >> > that
>     > >> > > > > our build is slow - we run full build for every PR and a
>     > >> successful
>     > >> > > full
>     > >> > > > > build takes ~5h. We definitely have more options to
>     solve it,
>     > for
>     > >> > > > instance,
>     > >> > > > > modularize the build graphs and reuse artifacts from the
>     > previous
>     > >> > > build.
>     > >> > > > > But I think that can be a big effort which is much
>     harder to
>     > >> > accomplish
>     > >> > > > in
>     > >> > > > > a short period of time and may deserve its own separate
>     > >> discussion.
>     > >> > > > >
>     > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>     > >> > > > >
>     > >> > > > >
>     > >> > > >
>     > >> > >
>     > >> >
>     > >>
>     > >
>     >
>
>
>     -- 
>     Best Regards
>
>     Jeff Zhang
>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Jark Wu <im...@gmail.com>.
Hi Jeff,

Thanks for sharing the Zeppelin approach. I think it's a good idea to
leverage users' Travis accounts.
This way, we can have almost unlimited concurrent build jobs, and
developers can restart builds by themselves (currently only committers can
restart a PR's build).

But it's still not clear to me how to integrate a user's Travis build into
the Flink pull request's build automatically. Can you explain in more detail?

Another question: does Travis only build branches for user accounts?
My concern is that builds for PRs rebase the user's commits against the
current master branch.
This helps us find problems before merging. Builds for branches will
lose the impact of new commits in master.
How does Zeppelin solve this problem?

Thanks again for sharing the idea.

Regards,
Jark

On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zj...@gmail.com> wrote:

> Hi Folks,
>
> Zeppelin meet this kind of issue before, we solve it by delegating each
> one's PR build to his travis account (Everyone can have 5 free slot for
> travis build).
> Apache account travis build is only triggered when PR is merged.
>
>
>
> Kurt Young <yk...@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>
> > (Forgot to cc George)
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <yk...@gmail.com> wrote:
> >
> > > Hi Bowen,
> > >
> > > Thanks for bringing this up. We actually have discussed about this,
> and I
> > > think Till and George have
> > > already spend sometime investigating it. I have cced both of them, and
> > > maybe they can share
> > > their findings.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <im...@gmail.com> wrote:
> > >
> > >> Hi Bowen,
> > >>
> > >> Thanks for bringing this. We also suffered from the long build time.
> > >> I agree that we should focus on solving build capacity problem in the
> > >> thread.
> > >>
> > >> My observation is there is only one build is running, all the others
> > >> (other
> > >> PRs, master) are pending.
> > >> The pricing plan[1] of travis shows it can support concurrent build
> > jobs.
> > >> But I don't know which plan we are using, might be the free plan for
> > open
> > >> source.
> > >>
> > >> I cc-ed Chesnay who may have some experience on Travis.
> > >>
> > >> Regards,
> > >> Jark
> > >>
> > >> [1]: https://travis-ci.com/plans
> > >>
> > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bo...@gmail.com> wrote:
> > >>
> > >> > Hi Steven,
> > >> >
> > >> > I think you may not read what I wrote. The discussion is about
> > "unstable
> > >> > build **capacity**", in another word "unstable / lack of build
> > >> resources",
> > >> > not "unstable build".
> > >> >
> > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <st...@gmail.com>
> > wrote:
> > >> >
> > >> > > long and sometimes unstable build is definitely a pain point.
> > >> > >
> > >> > > I suspect the build failure here in flink-connector-kafka is not
> > >> related
> > >> > to
> > >> > > my change. but there is no easy re-run the build on travis UI.
> > Google
> > >> > > search showed a trick of close-and-open the PR will trigger
> rebuild.
> > >> but
> > >> > > that could add noises to the PR activities.
> > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> > >> > >
> > >> > > travis-ci for my personal repo often failed with exceeding time
> > limit
> > >> > after
> > >> > > 4+ hours.
> > >> > > The job exceeded the maximum time limit for jobs, and has been
> > >> > terminated.
> > >> > >
> > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bo...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > https://travis-ci.org/apache/flink/builds/549681530  This build
> > >> > request
> > >> > > > has
> > >> > > > been sitting at **HEAD of the queue** since I first saw it at
> PST
> > >> > 10:30am
> > >> > > > (not sure how long it's been there before 10:30am). It's PST
> > 4:12pm
> > >> now
> > >> > > and
> > >> > > > it hasn't started yet.
> > >> > > >
> > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bo...@gmail.com>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi devs,
> > >> > > > >
> > >> > > > > I've been experiencing the pain resulting from lack of stable
> > >> build
> > >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed
> > >> often
> > >> > > that
> > >> > > > no
> > >> > > > > build in the queue is making any progress for hours, and
> > suddenly
> > >> 5
> > >> > or
> > >> > > 6
> > >> > > > > builds kick off all together after the long pause. I'm at PST
> > >> > (UTC-08)
> > >> > > > time
> > >> > > > > zone, and I've seen pause can be as long as 6 hours from PST
> 9am
> > >> to
> > >> > 3pm
> > >> > > > > (let alone the time needed to drain the queue afterwards).
> > >> > > > >
> > >> > > > > I think this has greatly impacted our productivity. I've
> > >> experienced
> > >> > > that
> > >> > > > > PRs submitted in the early morning of PST time zone won't
> finish
> > >> > their
> > >> > > > > build until late night of the same day.
> > >> > > > >
> > >> > > > > So my questions are:
> > >> > > > >
> > >> > > > > - Has anyone else experienced the same problem or have similar
> > >> > > > observation
> > >> > > > > on TravisCI? (I suspect it has things to do with time zone)
> > >> > > > >
> > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is
> it
> > >> the
> > >> > > free
> > >> > > > > plan for open source projects? What are the guaranteed build
> > >> capacity
> > >> > > of
> > >> > > > > the current plan?
> > >> > > > >
> > >> > > > > - If the current pricing plan (either free or paid) can't
> > provide
> > >> > > stable
> > >> > > > > build capacity, can we upgrade to a higher priced plan with
> > larger
> > >> > and
> > >> > > > more
> > >> > > > > stable build capacity?
> > >> > > > >
> > >> > > > > BTW, another factor that contribute to the productivity
> problem
> > is
> > >> > that
> > >> > > > > our build is slow - we run full build for every PR and a
> > >> successful
> > >> > > full
> > >> > > > > build takes ~5h. We definitely have more options to solve it,
> > for
> > >> > > > instance,
> > >> > > > > modularize the build graphs and reuse artifacts from the
> > previous
> > >> > > build.
> > >> > > > > But I think that can be a big effort which is much harder to
> > >> > accomplish
> > >> > > > in
> > >> > > > > a short period of time and may deserve its own separate
> > >> discussion.
> > >> > > > >
> > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Folks,

Zeppelin ran into this kind of issue before. We solved it by delegating each
contributor's PR build to their own Travis account (everyone gets 5 free
slots for Travis builds).
The Apache account's Travis build is only triggered when a PR is merged.



Kurt Young <yk...@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:

> (Forgot to cc George)
>
> Best,
> Kurt
>
>
> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <yk...@gmail.com> wrote:
>
> > Hi Bowen,
> >
> > Thanks for bringing this up. We actually have discussed about this, and I
> > think Till and George have
> > already spend sometime investigating it. I have cced both of them, and
> > maybe they can share
> > their findings.
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <im...@gmail.com> wrote:
> >
> >> Hi Bowen,
> >>
> >> Thanks for bringing this. We also suffered from the long build time.
> >> I agree that we should focus on solving build capacity problem in the
> >> thread.
> >>
> >> My observation is there is only one build is running, all the others
> >> (other
> >> PRs, master) are pending.
> >> The pricing plan[1] of travis shows it can support concurrent build
> jobs.
> >> But I don't know which plan we are using, might be the free plan for
> open
> >> source.
> >>
> >> I cc-ed Chesnay who may have some experience on Travis.
> >>
> >> Regards,
> >> Jark
> >>
> >> [1]: https://travis-ci.com/plans
> >>
> >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bo...@gmail.com> wrote:
> >>
> >> > Hi Steven,
> >> >
> >> > I think you may not read what I wrote. The discussion is about
> "unstable
> >> > build **capacity**", in another word "unstable / lack of build
> >> resources",
> >> > not "unstable build".
> >> >
> >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <st...@gmail.com>
> wrote:
> >> >
> >> > > long and sometimes unstable build is definitely a pain point.
> >> > >
> >> > > I suspect the build failure here in flink-connector-kafka is not
> >> related
> >> > to
> >> > > my change. but there is no easy re-run the build on travis UI.
> Google
> >> > > search showed a trick of close-and-open the PR will trigger rebuild.
> >> but
> >> > > that could add noises to the PR activities.
> >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >> > >
> >> > > travis-ci for my personal repo often failed with exceeding time
> limit
> >> > after
> >> > > 4+ hours.
> >> > > The job exceeded the maximum time limit for jobs, and has been
> >> > terminated.
> >> > >
> >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bo...@gmail.com>
> wrote:
> >> > >
> >> > > > https://travis-ci.org/apache/flink/builds/549681530  This build
> >> > request
> >> > > > has
> >> > > > been sitting at **HEAD of the queue** since I first saw it at PST
> >> > 10:30am
> >> > > > (not sure how long it's been there before 10:30am). It's PST
> 4:12pm
> >> now
> >> > > and
> >> > > > it hasn't started yet.
> >> > > >
> >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bo...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Hi devs,
> >> > > > >
> >> > > > > I've been experiencing the pain resulting from lack of stable
> >> build
> >> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed
> >> often
> >> > > that
> >> > > > no
> >> > > > > build in the queue is making any progress for hours, and
> suddenly
> >> 5
> >> > or
> >> > > 6
> >> > > > > builds kick off all together after the long pause. I'm at PST
> >> > (UTC-08)
> >> > > > time
> >> > > > > zone, and I've seen pause can be as long as 6 hours from PST 9am
> >> to
> >> > 3pm
> >> > > > > (let alone the time needed to drain the queue afterwards).
> >> > > > >
> >> > > > > I think this has greatly impacted our productivity. I've
> >> experienced
> >> > > that
> >> > > > > PRs submitted in the early morning of PST time zone won't finish
> >> > their
> >> > > > > build until late night of the same day.
> >> > > > >
> >> > > > > So my questions are:
> >> > > > >
> >> > > > > - Has anyone else experienced the same problem or have similar
> >> > > > observation
> >> > > > > on TravisCI? (I suspect it has things to do with time zone)
> >> > > > >
> >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it
> >> the
> >> > > free
> >> > > > > plan for open source projects? What are the guaranteed build
> >> capacity
> >> > > of
> >> > > > > the current plan?
> >> > > > >
> >> > > > > - If the current pricing plan (either free or paid) can't
> provide
> >> > > stable
> >> > > > > build capacity, can we upgrade to a higher priced plan with
> larger
> >> > and
> >> > > > more
> >> > > > > stable build capacity?
> >> > > > >
> >> > > > > BTW, another factor that contribute to the productivity problem
> is
> >> > that
> >> > > > > our build is slow - we run full build for every PR and a
> >> successful
> >> > > full
> >> > > > > build takes ~5h. We definitely have more options to solve it,
> for
> >> > > > instance,
> >> > > > > modularize the build graphs and reuse artifacts from the
> previous
> >> > > build.
> >> > > > > But I think that can be a big effort which is much harder to
> >> > accomplish
> >> > > > in
> >> > > > > a short period of time and may deserve its own separate
> >> discussion.
> >> > > > >
> >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>


-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Kurt Young <yk...@gmail.com>.
(Forgot to cc George)

Best,
Kurt


On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <yk...@gmail.com> wrote:

> Hi Bowen,
>
> Thanks for bringing this up. We actually have discussed about this, and I
> think Till and George have
> already spend sometime investigating it. I have cced both of them, and
> maybe they can share
> their findings.
>
> Best,
> Kurt
>
>
> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <im...@gmail.com> wrote:
>
>> Hi Bowen,
>>
>> Thanks for bringing this. We also suffered from the long build time.
>> I agree that we should focus on solving build capacity problem in the
>> thread.
>>
>> My observation is there is only one build is running, all the others
>> (other
>> PRs, master) are pending.
>> The pricing plan[1] of travis shows it can support concurrent build jobs.
>> But I don't know which plan we are using, might be the free plan for open
>> source.
>>
>> I cc-ed Chesnay who may have some experience on Travis.
>>
>> Regards,
>> Jark
>>
>> [1]: https://travis-ci.com/plans
>>
>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bo...@gmail.com> wrote:
>>
>> > Hi Steven,
>> >
>> > I think you may not read what I wrote. The discussion is about "unstable
>> > build **capacity**", in another word "unstable / lack of build
>> resources",
>> > not "unstable build".
>> >
>> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <st...@gmail.com> wrote:
>> >
>> > > long and sometimes unstable build is definitely a pain point.
>> > >
>> > > I suspect the build failure here in flink-connector-kafka is not
>> related
>> > to
>> > > my change. but there is no easy re-run the build on travis UI. Google
>> > > search showed a trick of close-and-open the PR will trigger rebuild.
>> but
>> > > that could add noises to the PR activities.
>> > > https://travis-ci.org/apache/flink/jobs/545555519
>> > >
>> > > travis-ci for my personal repo often failed with exceeding time limit
>> > after
>> > > 4+ hours.
>> > > The job exceeded the maximum time limit for jobs, and has been
>> > terminated.
>> > >
>> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bo...@gmail.com> wrote:
>> > >
>> > > > https://travis-ci.org/apache/flink/builds/549681530  This build
>> > request
>> > > > has
>> > > > been sitting at **HEAD of the queue** since I first saw it at PST
>> > 10:30am
>> > > > (not sure how long it's been there before 10:30am). It's PST 4:12pm
>> now
>> > > and
>> > > > it hasn't started yet.
>> > > >
>> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bo...@gmail.com>
>> wrote:
>> > > >
>> > > > > Hi devs,
>> > > > >
>> > > > > I've been experiencing the pain resulting from lack of stable
>> build
>> > > > > capacity on Travis for Flink PRs [1]. Specifically, I noticed
>> often
>> > > that
>> > > > no
>> > > > > build in the queue is making any progress for hours, and suddenly
>> 5
>> > or
>> > > 6
>> > > > > builds kick off all together after the long pause. I'm at PST
>> > (UTC-08)
>> > > > time
>> > > > > zone, and I've seen pause can be as long as 6 hours from PST 9am
>> to
>> > 3pm
>> > > > > (let alone the time needed to drain the queue afterwards).
>> > > > >
>> > > > > I think this has greatly impacted our productivity. I've
>> experienced
>> > > that
>> > > > > PRs submitted in the early morning of PST time zone won't finish
>> > their
>> > > > > build until late night of the same day.
>> > > > >
>> > > > > So my questions are:
>> > > > >
>> > > > > - Has anyone else experienced the same problem or have similar
>> > > > observation
>> > > > > on TravisCI? (I suspect it has things to do with time zone)
>> > > > >
>> > > > > - What pricing plan of TravisCI is Flink currently using? Is it
>> the
>> > > free
>> > > > > plan for open source projects? What are the guaranteed build
>> capacity
>> > > of
>> > > > > the current plan?
>> > > > >
>> > > > > - If the current pricing plan (either free or paid) can't provide
>> > > stable
>> > > > > build capacity, can we upgrade to a higher priced plan with larger
>> > and
>> > > > more
>> > > > > stable build capacity?
>> > > > >
>> > > > > BTW, another factor that contribute to the productivity problem is
>> > that
>> > > > > our build is slow - we run full build for every PR and a
>> successful
>> > > full
>> > > > > build takes ~5h. We definitely have more options to solve it, for
>> > > > instance,
>> > > > > modularize the build graphs and reuse artifacts from the previous
>> > > build.
>> > > > > But I think that can be a big effort which is much harder to
>> > accomplish
>> > > > in
>> > > > > a short period of time and may deserve its own separate
>> discussion.
>> > > > >
>> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Kurt Young <yk...@gmail.com>.
Hi Bowen,

Thanks for bringing this up. We have actually discussed this before, and I
think Till and George have already spent some time investigating it. I have
cc'ed both of them; maybe they can share their findings.

Best,
Kurt



Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Jark Wu <im...@gmail.com>.
Hi Bowen,

Thanks for bringing this up. We have also suffered from the long build times.
I agree that we should focus on solving the build capacity problem in this
thread.

My observation is that only one build is running at a time, while all the
others (other PRs, master) are pending.
The Travis pricing plans [1] show that concurrent build jobs can be supported,
but I don't know which plan we are using; it might be the free plan for open
source.

I have cc'ed Chesnay, who may have some experience with Travis.

Regards,
Jark

[1]: https://travis-ci.com/plans
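
To illustrate why the number of concurrent builds matters so much here, the
following is a rough sketch of a FIFO build queue. It assumes every build
takes the ~5h quoted in this thread as wall-clock time and that builds are
started strictly in order; the function name and parameters are hypothetical,
and real Travis scheduling is certainly more complicated than this.

```python
import heapq


def finish_times(num_builds, build_hours, concurrency):
    """Simulate a FIFO build queue.

    Returns, for each queued build, the number of hours from now until it
    finishes, assuming `concurrency` builds can run at the same time.
    """
    # Each entry is the time at which one executor slot becomes free.
    slots = [0.0] * concurrency
    heapq.heapify(slots)
    results = []
    for _ in range(num_builds):
        start = heapq.heappop(slots)   # earliest free slot
        end = start + build_hours
        results.append(end)
        heapq.heappush(slots, end)
    return results


# Six queued builds of ~5h each (assumed numbers, taken from the thread).
one_at_a_time = finish_times(6, 5.0, 1)
three_at_a_time = finish_times(6, 5.0, 3)
print(one_at_a_time[-1], three_at_a_time[-1])
```

Under these assumptions, with a single build running at a time the last of
six queued builds finishes 30 hours after the queue starts moving, versus 10
hours with three concurrent builds.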


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
Hi Steven,

I think you may not have read what I wrote. The discussion is about "unstable
build **capacity**", in other words "unstable / lack of build resources",
not "unstable build".


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Steven Wu <st...@gmail.com>.
A long and sometimes unstable build is definitely a pain point.

I suspect the build failure here in flink-connector-kafka is not related to
my change, but there is no easy way to re-run the build from the Travis UI. A
Google search turned up a trick: closing and reopening the PR triggers a
rebuild. But that could add noise to the PR activity.
https://travis-ci.org/apache/flink/jobs/545555519

Travis CI builds for my personal repo often fail after 4+ hours with:
"The job exceeded the maximum time limit for jobs, and has been terminated."


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
https://travis-ci.org/apache/flink/builds/549681530  This build request has
been sitting at the **HEAD of the queue** since I first saw it at 10:30am PST
(not sure how long it had been there before 10:30am). It's 4:12pm PST now and
it hasn't started yet.


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Bowen Li <bo...@gmail.com>.
I want to summarize Chesnay's points for everyone reading this thread: 1) the
build resources Flink is currently using belong to ASF INFRA, and 2) we are
waiting on ASF INFRA's response on whether we can donate/sponsor extra
build resources for Flink.

I think it would be super helpful to pay for and secure dedicated build
resources for Flink. If that doesn't work, I agree with Jark that the Zeppelin
approach Jeff described sounds promising.

Jeff, can you answer Jark's questions above and share what the Zeppelin
community's practices look like?

Cheers,
Bowen


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

Posted by Chesnay Schepler <ch...@apache.org>.
On 24/06/2019 23:48, Bowen Li wrote:
> - Has anyone else experienced the same problem or have similar observation
> on TravisCI? (I suspect it has things to do with time zone)
In Europe we have the same problem.
>
> - What pricing plan of TravisCI is Flink currently using? Is it the free
> plan for open source projects? What are the guaranteed build capacity of
> the current plan?
Flink is using the Travis resources provided by the ASF, which afaik the
ASF is paying for.

Note that in the past the Flink project was already approached by INFRA
because we were using too many Travis resources,
so this is _not_ as simple as asking for more.
>
> - If the current pricing plan (either free or paid) can't provide stable
> build capacity, can we upgrade to a higher priced plan with larger and more
> stable build capacity?
We are currently investigating whether companies could donate/sponsor
Travis CI resources to the ASF to increase the build capacity;
we are currently waiting for an answer from INFRA.
>
> BTW, another factor that contribute to the productivity problem is that our
> build is slow - we run full build for every PR and a successful full build
> takes ~5h. We definitely have more options to solve it, for instance,
> modularize the build graphs and reuse artifacts from the previous build.
> But I think that can be a big effort which is much harder to accomplish in
> a short period of time and may deserve its own separate discussion.
We are already doing that. The vast majority of the build time is
simply due to tests taking way too long, not compilation.
The tests for the Kafka connector alone exceed a single profile, as do
those for the Table API.
Unless people start caring about test times before adding tests, this
issue cannot be solved.

Of course, this discussion isn't new; I already raised it the last 2
times we approached the Travis limits, with little to no effect to be seen.

At this point I'm sure someone out there is thinking "well, just don't
run Kafka tests for every PR. Like, check the diff or something",
and yes, sure, that _might_ work. But to this day, despite numerous
people suggesting it, I still haven't seen a single person actually try
implementing it.

The problem with these kinds of approaches is that they tend to be
brittle as hell, result in subtle behaviors if they have bugs, and
overall make the CI significantly more complicated by introducing
various edge cases.

Our current CI is, relatively speaking, straightforward and consistent.
And as it stands we can't afford elaborate schemes because I just don't
have the time to maintain them.
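
For readers wondering what the "check the diff" idea could look like in
practice, here is a minimal sketch. It is purely illustrative and not
something the Flink CI does: the directory-to-test-group mapping, the group
names, and the function name are all hypothetical. It also shows the kind of
edge cases mentioned above: any path that is not explicitly mapped, and an
empty diff, both have to fall back to running everything.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from source directories to test groups.
# These names are made up for illustration and do not mirror real CI profiles.
MODULE_TEST_GROUPS = {
    "flink-connectors/flink-connector-kafka": {"kafka"},
    "flink-table": {"table"},
    "docs": set(),  # documentation-only changes need no test group
}

# Hypothetical full set of test groups.
ALL_GROUPS = {"core", "kafka", "table"}


def select_test_groups(changed_files):
    """Return the set of test groups to run for the given changed paths."""
    if not changed_files:
        # Nothing to inspect: be conservative and run everything.
        return set(ALL_GROUPS)

    selected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        best = None
        # Pick the longest mapped directory prefix that contains this file.
        for prefix, groups in MODULE_TEST_GROUPS.items():
            prefix_parts = PurePosixPath(prefix).parts
            if parts[:len(prefix_parts)] == prefix_parts:
                if best is None or len(prefix_parts) > len(best[0]):
                    best = (prefix_parts, groups)
        if best is None:
            # Unmapped path (e.g. a core module other modules depend on):
            # fall back to the full test run.
            return set(ALL_GROUPS)
        selected |= best[1]
    return selected


print(select_test_groups(["flink-connectors/flink-connector-kafka/pom.xml"]))
print(select_test_groups(["flink-runtime/src/main/java/Foo.java"]))
```

Even this toy version has to hard-code knowledge about which modules depend
on which, and that mapping would need to be kept in sync with the build by
hand.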