Posted to dev@beam.apache.org by Pablo Estrada <pa...@google.com> on 2019/04/01 00:53:40 UTC

Re: Build blocking on

Hi Michael,
I wrote that test and much of that code. I'm quite sorry about the trouble.
The test should use mocks and not hang when it's missing GCP dependencies.
That sounds like a bug in the test. We can deactivate it while I figure out
what's going wrong.
Best
-P.
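
For illustration, here is a minimal sketch of what "use mocks and not hang"
could look like. It is not the actual Beam test: remove_temp_tables,
get_credentials and the RUN_GCP_TESTS environment variable are hypothetical
names chosen for this example. The idea is that the credential lookup is
injected, so a unit test can pass in a mock and never reach the OAuth
local-webserver flow, while tests that need a real project are skipped unless
they are explicitly enabled.

```python
import os
import unittest
from unittest import mock


def remove_temp_tables(table_names, get_credentials):
    """Hypothetical cleanup step; the credential lookup is injected."""
    credentials = get_credentials()
    # A real implementation would call the BigQuery API here.
    return [(name, credentials) for name in table_names]


class RemoveTempTablesTest(unittest.TestCase):

    def test_with_mocked_credentials(self):
        # The mock replaces the lookup that could otherwise wait for a browser.
        fake_lookup = mock.Mock(return_value="fake-credentials")
        result = remove_temp_tables(["tmp_1", "tmp_2"], fake_lookup)
        fake_lookup.assert_called_once_with()
        self.assertEqual(
            result,
            [("tmp_1", "fake-credentials"), ("tmp_2", "fake-credentials")])

    @unittest.skipUnless(
        os.environ.get("RUN_GCP_TESTS") == "1",
        "needs real GCP credentials; set RUN_GCP_TESTS=1 to opt in")
    def test_against_real_project(self):
        # Placeholder: a real integration test would go here.
        pass
```

Saved to a file, this can be run with "python -m unittest <file>". With an
opt-in switch like this, CI would have to set the variable explicitly so that
the integration test is not skipped there by accident.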

On Sat, Mar 30, 2019, 2:55 PM Michael Luckey <ad...@gmail.com> wrote:

> After digging a bit deeper, I was able to verify that those tests block
> on authorization to GCP.
>
> It seems that, as I do not have any credentials set, the underlying oauth2
> code falls back to some local mode. This appears to start a web server on
> port 8080 and wait there forever. Accessing that port forwards to Google,
> but that also fails miserably.
>
> Running
>
> python setup.py nosetests --tests
>>  apache_beam.io.gcp.bigquery_file_loads_test:TestBigQueryFileLoads.test_records_traverse_transform_with_mocks
>
>
> and hitting 'Ctrl-C' after it got stuck results in the following output:
>
> 'KeyboardInterrupt [while running
>> \'WriteToBigQuery/BigQueryBatchFileLoads/RemoveTempTables/Delete\']\n------------
>> Your browser has been opened to visit:
>>
>> https://accounts.google.com/o/oauth2/v2/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery+https%3A%
>> If your browser is on a different machine then exit and re-run this
>> application with the command-line parameter
>>   --noauth_local_webserver
>> Failed to find "code" in the query parameters of the redirect.
>> Invalid authorization: Try running with --noauth_local_webserver.
>
>
> I am a bit lost here on how to proceed.
>
>
> On Tue, Mar 26, 2019 at 11:48 PM Michael Luckey <ad...@gmail.com>
> wrote:
>
>>
>>
>> On Tue, Mar 26, 2019 at 11:18 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> I believe what happens is that testPy2Gcp actually runs integration
>>> tests that try to connect to GCP.
>>>
>>
>> Actually I was hoping for an explanation like this. Any suggestion how I
>> could confirm that on my side?
>>
>>
>>> Without having GCP cluster and configuration on your machine I'd expect
>>> these tests to fail.
>>>
>>
>> Hmm... here I am actually unsure what would be the best way to handle such
>> cases.
>>
>> If I understand correctly, we currently skip some tests which do not meet
>> expectations, kind of 'can not run on your arch' thingies... So I am
>> undecided whether I'd prefer those tests to be skipped if the GCP
>> configuration is missing.
>>
>> pro
>> * dev is still able to run the tests (whichever task they are associated
>> with) without having to separate the failures out. For instance, this
>> 'testPy2Gcp' task does actually execute 'some tests' - which might already be
>> covered by some other calls... But I definitely do not like the idea of
>> putting the burden on the developer to track which tasks/tests can be
>> executed on a local machine. Unless this distinction is really coarse - and
>> pre/postcommit is something I really would like to be able to run locally...
>>
>>
>> con
>> * we definitely need to make sure those tests are not accidentally
>> skipped on CI servers.
>>
>>
>>>
>>> I'd say we should remove testPy2Gcp task from "build" task and
>>> explicitly keep it as integration test.
>>>
>>> --Mikhail
>>>
>>>
>>> On Tue, Mar 26, 2019 at 3:12 PM Michael Luckey <ad...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 26, 2019 at 10:29 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> Luckey, I couldn't recreate your issue, but I still haven't done a
>>>>> full build.
>>>>> I created a new GCE VM using the ubuntu-1804-bionic-v20190212a
>>>>> image (n1-standard-4 machine type).
>>>>>
>>>>> Ran the following:
>>>>> sudo apt-get update
>>>>> sudo apt-get install python-pip
>>>>> sudo apt-get install python-virtualenv
>>>>> git clone https://github.com/apache/beam.git
>>>>> cd beam
>>>>> ./gradlew :beam-sdks-python:testPy2Gcp
>>>>> [failed: no JAVA_HOME]
>>>>> sudo apt-get install openjdk-8-jdk
>>>>> ./gradlew :beam-sdks-python:testPy2Gcp
>>>>>
>>>>> Got: BUILD SUCCESSFUL in 7m 52s
>>>>>
>>>>
>>>> Nice. Thanks a lot for your help here.
>>>>
>>>> If I understand correctly, this VM is already located within GCP. Could
>>>> it already have some setup which would still need to be done on 'my' VM?
>>>> For instance, I was wondering whether that test tries 'to call home', but
>>>> as I am (unfortunately ;) no Googler and do not have any GCP-specific
>>>> setup, it fails here but does not time out? This is just a guess; I did
>>>> not yet look into the actual implementation.
>>>>
>>>> Which I seemingly need to do here :(
>>>>
>>>>
>>>>> Then I tried:
>>>>> ./gradlew build
>>>>>
>>>>> And ran out of disk space. :) (beam/ is taking 4.5G and the VM boot
>>>>> disk is 10G total)
>>>>>
>>>>
>>>> Ouch :D
>>>>
>>>>
>>>>>
>>>>> On Tue, Mar 26, 2019 at 1:35 PM Robert Burke <ro...@frantil.com>
>>>>> wrote:
>>>>>
>>>>>> Michael, your concern is reasonable, especially with the experience
>>>>>> with python, though that does help me bootstrap this work. :)
>>>>>>
>>>>>> The go tools provide caching and avoid redoing work if the source
>>>>>> files haven't changed. This applies most particularly for `go build` and
>>>>>> `go test`. As long as the go code isn't changing at every invocation, this
>>>>>> should be fine. I'm not aware of the same being the case for the usual
>>>>>> python tools.
>>>>>>
>>>>>>  The real trick is ensuring a valid and consistent environment for
>>>>>> the go code.
>>>>>>
>>>>>> The environment question becomes easier for everyone by moving to go
>>>>>> modules, which were designed to provide these kinds of consistent builds.
>>>>>> It also avoids needing a GOPATH set. Any directory is permitted, as long as
>>>>>> the go.mod is present.
>>>>>>
>>>>>> (The Go SDK doesn't yet use go modules, so go.mod and go.sum aren't
>>>>>> yet in the repo.)
>>>>>>
>>>>>> The main blocker I see is updating the Jenkins machines to have the
>>>>>> latest version of Go (1.12) instead of 1.10, which doesn't support modules.
>>>>>> This only blocks a final submission, rather than the work fortunately.
>>>>>>
>>>>>> On Tue, Mar 26, 2019, 1:08 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> "rm -r ~/.gradle/go/repo/" worked for me (there was more than one
>>>>>>> package with issues).
>>>>>>> My ~/.bashrc has
>>>>>>>   export GOPATH=$HOME/go
>>>>>>> so maybe that's making the difference in my setup.
>>>>>>>
>>>>>>> On Tue, Mar 26, 2019 at 11:28 AM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Can this be addressed by having "clean" remove all state that
>>>>>>>> gogradle leaves behind? This staleness issue has bitten me a few times also
>>>>>>>> and it would be good to have a reliable way to deal with it, even if it
>>>>>>>> involves an extra clean.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 26, 2019 at 11:14 AM Michael Luckey <
>>>>>>>> adude3141@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @Udi
>>>>>>>>> Did you try to just delete the
>>>>>>>>> '/usr/local/google/home/ehudm/.gradle/go/repo/cloud.google.com'
>>>>>>>>> folder?
>>>>>>>>>
>>>>>>>>> @Robert
>>>>>>>>> As said before, I am a bit scared about the implications. Shelling
>>>>>>>>> out is done by python, and from build perspective, this does not work very
>>>>>>>>> well, unfortunately. I.e. no caching, up-to-date checks etc...
>>>>>>>>>
>>>>>>>>> But of course, we need to play with this a bit more.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 26, 2019 at 6:24 PM Robert Burke <ro...@frantil.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Reading the error from the gradle scan, it largely looks like
>>>>>>>>>> some part of the GCP dependencies for the build depends on a package, where
>>>>>>>>>> the commit version is no longer around. The main issue with gogradle is
>>>>>>>>>> that it's entirely distinct from the usual Go workflow, which means deps
>>>>>>>>>> users use are likely to be different to what's in the lock file.
>>>>>>>>>>
>>>>>>>>>> This work will be tracked in
>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-5379
>>>>>>>>>> GoGradle hasn't moved to support the new-go way of handling deps,
>>>>>>>>>> so my inclination is to simplify to simple scripts for Gradle that shell
>>>>>>>>>> out to the Go tool for handling Go dep management, over trying to fix
>>>>>>>>>> GoGradle.
>>>>>>>>>>
>>>>>>>>>> On Tue, 26 Mar 2019 at 09:43, Udi Meiri <eh...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Robert, from what I recall it's not flaky for me - it
>>>>>>>>>>> consistently fails. Let me know if there's a way to get more logging about
>>>>>>>>>>> this error.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 25, 2019, 19:50 Robert Burke <ro...@frantil.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It's concerning to me that 1) the Go dependency resolution via
>>>>>>>>>>>> gogradle is flaky, and 2) that it can block other languages.
>>>>>>>>>>>>
>>>>>>>>>>>> I suppose 2) makes sense since it's part of the container
>>>>>>>>>>>> bootstrapping code, but that makes 1) a serious problem, of which I wasn't
>>>>>>>>>>>> aware.
>>>>>>>>>>>> I should have time to investigate this in the next two weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 25 Mar 2019 at 18:08, Michael Luckey <
>>>>>>>>>>>> adude3141@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Just for the record,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using a VM here, because I did not yet get all tasks running on
>>>>>>>>>>>>> my Mac and did not want to mess with my setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So installed vanilla ubuntu-18.04 LTS on virtual box, 26GB
>>>>>>>>>>>>> ram, 6 cores and further
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt update
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install gcc
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install make
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install perl
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install curl
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install openjdk-8-jdk
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install python
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install -y software-properties-common
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo add-apt-repository ppa:deadsnakes/ppa
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt update
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt install python3.5
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
>>>>>>>>>>>>>
>>>>>>>>>>>>> curl -fsSL https://download.docker.com/linux/ubuntu/gpg |
>>>>>>>>>>>>> sudo apt-key add -
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-key fingerprint 0EBFCD88
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-get update
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-get install docker-ce docker-ce-cli containerd.io
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo groupadd docker
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo usermod -aG docker $USER
>>>>>>>>>>>>>
>>>>>>>>>>>>> git config --global user.email "dont@spam.me"
>>>>>>>>>>>>>
>>>>>>>>>>>>> git config --global user.name "Some Guy"
>>>>>>>>>>>>>
>>>>>>>>>>>>> curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo python get-pip.py
>>>>>>>>>>>>>
>>>>>>>>>>>>> rm get-pip.py
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo pip install --upgrade virtualenv
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo pip install cython
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-get install python-dev
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-get install python3-distutils
>>>>>>>>>>>>>
>>>>>>>>>>>>> sudo apt-get install python3-dev # for python3.x installs
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> git clone https://github.com/apache/beam.git
>>>>>>>>>>>>> cd beam/
>>>>>>>>>>>>> ./gradlew build
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nothing else changed/added. (hopefully, need to reassure
>>>>>>>>>>>>> myself here)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, this is failing. I need to exclude those python
>>>>>>>>>>>>> tests (and of course website, which usually fails on jira links)
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I might be missing some env settings for gcp, dunno.
>>>>>>>>>>>>> Probably missed some docs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 26, 2019 at 1:46 AM Michael Luckey <
>>>>>>>>>>>>> adude3141@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Udi for trying that!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In fact, the go dependency resolution is flaky. Did not look
>>>>>>>>>>>>>> into that, but just rerunning usually works. Of course, less than optimal,
>>>>>>>>>>>>>> but, well...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Running the build target is of course just an aggregation of tasks
>>>>>>>>>>>>>> to run. And unfortunately just running that
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ./gradlew  :beam-sdks-python:testPy2Gcp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> stalls on my (virtual) machine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 26, 2019 at 1:35 AM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Okay, `./gradlew build` failed pretty quickly for me:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Task :beam-sdks-go:resolveBuildDependencies FAILED
>>>>>>>>>>>>>>> cloud.google.com/go:
>>>>>>>>>>>>>>> commit='4f6c921ec566a33844f4e7879b31cd8575a6982d', urls=[
>>>>>>>>>>>>>>> https://code.googlesource.com/gocloud] does not exist in
>>>>>>>>>>>>>>> /usr/local/google/home/ehudm/.gradle/go/repo/
>>>>>>>>>>>>>>> cloud.google.com/go/625660c387d9403fde4d73cacaf2d2ac,
>>>>>>>>>>>>>>> updating will be performed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://gradle.com/s/x5zqbc5zwd3bg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (Now I remember why I stopped using `build` :/)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Mar 25, 2019 at 5:30 PM Udi Meiri <eh...@google.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It shouldn't stall. That's a bug.
>>>>>>>>>>>>>>>> OTOH, I never use the `build` target.
>>>>>>>>>>>>>>>> I'll try running that myself.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Mar 25, 2019, 07:24 Michael Luckey <
>>>>>>>>>>>>>>>> adude3141@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> trying to run './gradlew build' on vanilla setup, my build
>>>>>>>>>>>>>>>>> consistently stalls during execution of python gcp tests, e.g. on both of
>>>>>>>>>>>>>>>>> - > :beam-sdks-python:testPy2Gcp
>>>>>>>>>>>>>>>>> - > :beam-sdks-python-test-suites-tox-py35:testPy35Gcp
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Console output:
>>>>>>>>>>>>>>>>> #### snip ####
>>>>>>>>>>>>>>>>> test_big_query_standard_sql
>>>>>>>>>>>>>>>>> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
>>>>>>>>>>>>>>>>> ... SKIP: IT is skipped because --test-pipeline-options is not specified
>>>>>>>>>>>>>>>>> test_big_query_standard_sql_kms_key
>>>>>>>>>>>>>>>>> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
>>>>>>>>>>>>>>>>> ... SKIP: This test requires BQ Dataflow native source support for KMS,
>>>>>>>>>>>>>>>>> which is not available yet.
>>>>>>>>>>>>>>>>> test_multiple_destinations_transform
>>>>>>>>>>>>>>>>> (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT) ... SKIP:
>>>>>>>>>>>>>>>>> IT is skipped because --test-pipeline-options is not specified
>>>>>>>>>>>>>>>>> test_one_job_fails_all_jobs_fail
>>>>>>>>>>>>>>>>> (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT) ... SKIP:
>>>>>>>>>>>>>>>>> IT is skipped because --test-pipeline-options is not specified
>>>>>>>>>>>>>>>>> test_records_traverse_transform_with_mocks
>>>>>>>>>>>>>>>>> (apache_beam.io.gcp.bigquery_file_loads_test.TestBigQueryFileLoads) ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> output ends here, would expect a failed or ok here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Afterwards no progress - even waiting for hours. Any idea,
>>>>>>>>>>>>>>>>> what might be causing this? Do I need to add some GCP properties for this
>>>>>>>>>>>>>>>>> task ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any ideas, what I am doing wrong?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> best,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> michel
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>