Posted to user@spark.apache.org by Tianhua huang <hu...@gmail.com> on 2019/06/19 08:24:05 UTC

Re: Ask for ARM CI for spark

Thanks for your reply.

As I said before, I ran into some build and test problems with Spark on an
aarch64 server, so it would be better to have ARM CI to make sure Spark
stays compatible with AArch64 platforms.
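
As a side note, for anyone reproducing this: a quick way to tell whether a
test environment is AArch64 (for example, to skip suites that depend on
x86-only native artifacts such as the LEVELDBJNI package mentioned below)
is to check the machine architecture at runtime. This is only a minimal
sketch; the suite and test names are illustrative, not Spark's:

    import platform
    import unittest

    def on_arm64():
        # "aarch64" on Linux ARM servers, "arm64" on macOS.
        return platform.machine().lower() in ("aarch64", "arm64")

    class NativeLibSuite(unittest.TestCase):
        # Skip where the native leveldbjni binary is unavailable.
        @unittest.skipIf(on_arm64(), "no aarch64 leveldbjni binary")
        def test_native_leveldb(self):
            pass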

I'm from the OpenLab team (https://openlabtesting.org/), a community that
does open source project testing. We can contribute some ARM virtual
machines to AMPLab Jenkins, and we have a developer team willing to work on
this: we are willing to maintain the CI build jobs and address the CI
issues. What do you think?


Thanks for your attention.

On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu> wrote:

> yeah, we don't have any aarch64 systems for testing...  this has been
> asked before but is currently pretty low on our priority list as we don't
> have the hardware.
>
> sorry,
>
> shane
>
> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> Hi, sorry to disturb you.
>> The CI testing for apache spark is supported by AMPLab Jenkins, and I
>> see there are some machines (most of them Linux amd64 arch) for
>> CI development, but it seems there is no AArch64 machine for spark CI
>> testing. Recently I built and ran the tests for spark (master and
>> branch-2.4) on my arm server, and unfortunately there are some problems;
>> for example, a unit test fails due to a LEVELDBJNI native package. For the
>> java test details see http://paste.openstack.org/show/752063/ and for the
>> python tests see http://paste.openstack.org/show/752709/
>> So I have a question about ARM CI testing for spark: is there any plan to
>> support it? Thank you very much, and I will wait for your reply!
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi Guys,

Recently we have been testing pyspark on ARM, and we found some issues but
have no idea about them. Could you please have a look when you are free?
Thanks.

There are two issues:
1. The first one looks like an ARM performance issue: the job in a pyspark
test has not fully finished when the assert check executes. After we changed
the source code in our local env to account for this, the tests pass. For
this issue we opened a JIRA [1]. If you are free, please help with it. Thanks.
2. The second one looks like a spark internal issue: when we test
"pyspark.mllib.tests.test_streaming_algorithms:StreamingLinearRegressionWithTests.test_train_prediction",
it fails in the "condition" function. We tried to dig into it and found that
the predicted value is still [0. 0. .....0.] even though we wait a long time
in the ARM testing env; I think that is the main cause, but we failed to
work out which step goes wrong. Could you please help to figure it out? I
uploaded the test logs produced after inserting some 'print' statements into
the 'func' function of the testcase (a sketch of the polling pattern the
test relies on follows below). I tried on ARM and x86: the ARM log is [2],
the x86 log is [3]. The testing environments are identical except for the
ARCH.
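
For context, the test polls a condition function with a timeout, roughly
like the sketch below (a simplified stand-in for pyspark's test helper,
not the actual code); on a slow ARM VM the model may simply not be updated
before the timeout expires:

    import time

    def eventually(condition, timeout=30.0, interval=0.5):
        # Poll `condition` until it returns True, or fail on timeout.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if condition():
                return
            time.sleep(interval)
        raise AssertionError("condition not met within %.1fs" % timeout)

    # In the spirit of test_train_prediction: keep checking whether the
    # model's predictions have moved away from the initial zeros.
    # `predicted_values` is a stand-in name, not the real test variable:
    # eventually(lambda: any(p != 0.0 for p in predicted_values), timeout=120)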

Thanks; if you are free, please help us.

Best Regards

[1] https://issues.apache.org/jira/browse/SPARK-29205
[2] https://etherpad.net/p/pyspark-arm
[3] https://etherpad.net/p/pyspark-x86


Tianhua huang <hu...@gmail.com> wrote on Thu, Sep 19, 2019 at 10:59 AM:

> @Dongjoon Hyun <do...@gmail.com> ,
>
> Sure, and I have updated the JIRA already :)
> https://issues.apache.org/jira/browse/SPARK-29106
> If anything is missing, please let me know, thank you.
>
> On Thu, Sep 19, 2019 at 12:44 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> Hi, Tianhua.
>>
>> Could you summarize the details on the JIRA once more?
>> It will be very helpful for the community. Also, I've been waiting on
>> that JIRA. :)
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Mon, Sep 16, 2019 at 11:48 PM Tianhua huang <hu...@gmail.com>
>> wrote:
>>
>>> @shane knapp <sk...@berkeley.edu> thank you very much, I opened an
>>> issue for this, https://issues.apache.org/jira/browse/SPARK-29106; we
>>> can talk through the details in it :)
>>> And we will prepare an ARM instance today and will send the info to your
>>> email later.
>>>
>>> On Tue, Sep 17, 2019 at 4:40 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>>
>>>> @Tianhua huang <hu...@gmail.com> sure, i think we can get
>>>> something sorted for the short-term.
>>>>
>>>> all we need is ssh access (i can provide an ssh key), and i can then
>>>> have our jenkins master launch a remote worker on that instance.
>>>>
>>>> instance setup, etc, will be up to you.  my support for the time being
>>>> will be to create the job and 'best effort' for everything else.
>>>>
>>>> this should get us up and running asap.
>>>>
>>>> is there an open JIRA for jenkins/arm test support?  we can move the
>>>> technical details about this idea there.
>>>>
>>>> On Sun, Sep 15, 2019 at 9:03 PM Tianhua huang <
>>>> huangtianhua223@gmail.com> wrote:
>>>>
>>>>> @Sean Owen <sr...@gmail.com> , so sorry for the late reply; we had a
>>>>> Mid-Autumn holiday :)
>>>>>
>>>>> If you hope to integrate ARM CI into amplab jenkins, we can offer the
>>>>> arm instance, and then the ARM job will run together with the other x86
>>>>> jobs. Maybe there is a guideline for doing this? @shane knapp
>>>>> <sk...@berkeley.edu>  would you help us?
>>>>>
>>>>> On Thu, Sep 12, 2019 at 9:36 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>
>>>>>> I don't know what's involved in actually accepting or operating those
>>>>>> machines, so can't comment there, but in the meantime it's good that you
>>>>>> are running these tests and can help report changes needed to keep it
>>>>>> working with ARM. I would continue with that for now.
>>>>>>
>>>>>> On Wed, Sep 11, 2019 at 10:06 PM Tianhua huang <
>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> For the whole work process of spark ARM CI, we want to make 2 things
>>>>>>> clear.
>>>>>>>
>>>>>>> The first thing is:
>>>>>>> About spark ARM CI: we now have two periodic jobs, one job[1] based
>>>>>>> on commit[2] (which already fixes the replay test failures[3]; we made
>>>>>>> a new test branch based on the date 09-09-2019), and the other job[4]
>>>>>>> based on spark master.
>>>>>>>
>>>>>>> With the first job we test a specified branch to prove that our ARM
>>>>>>> CI is good and stable.
>>>>>>> The second job checks spark master every day, so we can see
>>>>>>> whether the latest commits affect the ARM CI. The build
>>>>>>> history and results show that some problems are easier to find on ARM,
>>>>>>> like SPARK-28770 <https://issues.apache.org/jira/browse/SPARK-28770>,
>>>>>>> and also that we make the effort to trace them and figure them
>>>>>>> out; so far we have found and fixed several problems[5][6][7], thanks
>>>>>>> to everyone in the community :). And we believe that ARM CI is very
>>>>>>> necessary, right?
>>>>>>>
>>>>>>> The second thing is:
>>>>>>> We plan to run the jobs for a period of time, and you can see the
>>>>>>> results and logs in the 'build history' of the job consoles. If
>>>>>>> everything goes well for one or two weeks, could the community accept
>>>>>>> the ARM CI? Or how long should the periodic jobs run before the
>>>>>>> community has enough confidence to accept the ARM CI? As you suggested
>>>>>>> before, it's good to integrate ARM CI into amplab jenkins; we agree,
>>>>>>> and we can donate the ARM instances and then maintain the ARM-related
>>>>>>> test jobs together with the community. Any thoughts?
>>>>>>>
>>>>>>> Thank you all!
>>>>>>>
>>>>>>> [1]
>>>>>>> http://status.openlabtesting.org/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64
>>>>>>> [2]
>>>>>>> https://github.com/apache/spark/commit/0ed9fae45769d4b06b8cf8128f462f09ff3d9a72
>>>>>>> [3] https://issues.apache.org/jira/browse/SPARK-28770
>>>>>>> [4]
>>>>>>> http://status.openlabtesting.org/builds?job_name=spark-master-unit-test-hadoop-2.7-arm64
>>>>>>> [5] https://github.com/apache/spark/pull/25186
>>>>>>> [6] https://github.com/apache/spark/pull/25279
>>>>>>> [7] https://github.com/apache/spark/pull/25673
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 16, 2019 at 11:24 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, I think it's just local caching. After you run the build you
>>>>>>>> should find lots of stuff cached at ~/.m2/repository and it won't download
>>>>>>>> every time.
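>>>>>>>>
>>>>>>>> A quick way to confirm that on a worker (just a rough sketch, not
>>>>>>>> the actual job code) is to count what is already cached:
>>>>>>>>
>>>>>>>>     import os
>>>>>>>>
>>>>>>>>     # A populated local Maven repository is why a rebuild prints
>>>>>>>>     # no "Downloading ..." lines.
>>>>>>>>     repo = os.path.expanduser("~/.m2/repository")
>>>>>>>>     jars = [f for _, _, files in os.walk(repo)
>>>>>>>>             for f in files if f.endswith(".jar")]
>>>>>>>>     print("cached jars:", len(jars))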
>>>>>>>>
>>>>>>>> On Fri, Aug 16, 2019 at 3:01 AM bo zhaobo <
>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Sean,
>>>>>>>>> Thanks for the reply, and apologies for the confusion.
>>>>>>>>> I know the dependencies will be downloaded by SBT or Maven. But
>>>>>>>>> the Spark QA job also executes "mvn clean package", so why doesn't the
>>>>>>>>> log print "downloading some jar from Maven central" [1], and why does it
>>>>>>>>> build so fast? Is the reason that Spark Jenkins builds the Spark jars on
>>>>>>>>> physical machines and doesn't destroy the test env after a job is
>>>>>>>>> finished? Then the next job that builds Spark gets the dependency jars
>>>>>>>>> from the local cache, since previous jobs executed "mvn package" and
>>>>>>>>> those dependencies had already been downloaded on the local worker
>>>>>>>>> machine. Am I right? Is that why the job log[1] doesn't print any
>>>>>>>>> downloading information from Maven Central?
>>>>>>>>>
>>>>>>>>> Thank you very much.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> ZhaoBo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sean Owen <sr...@gmail.com> wrote on Fri, Aug 16, 2019 at 10:38 AM:
>>>>>>>>>
>>>>>>>>>> I'm not sure what you mean. The dependencies are downloaded by
>>>>>>>>>> SBT and Maven like in any other project, and nothing about it is specific
>>>>>>>>>> to Spark.
>>>>>>>>>> The worker machines cache artifacts that are downloaded from
>>>>>>>>>> these, but this is a function of Maven and SBT, not Spark. You may find
>>>>>>>>>> that the initial download takes a long time.
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 15, 2019 at 9:02 PM bo zhaobo <
>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Sean,
>>>>>>>>>>>
>>>>>>>>>>> Thanks very much for pointing out the roadmap. ;-). Then I think
>>>>>>>>>>> we will continue to focus on our test environment.
>>>>>>>>>>>
>>>>>>>>>>> For the networking problems, I mean that we can access Maven
>>>>>>>>>>> Central, and jobs can download the required jar packages at a high
>>>>>>>>>>> network speed. What we want to know is why the Spark QA test jobs[1]
>>>>>>>>>>> log shows that the job script/maven build doesn't seem to download
>>>>>>>>>>> the jar packages. Could you tell us the reason for that? Thank you.
>>>>>>>>>>> The reason we raised the "networking problems" is a phenomenon we
>>>>>>>>>>> found during testing: if we execute "mvn clean package" in a new test
>>>>>>>>>>> environment (in our test environment we destroy the test VMs after a
>>>>>>>>>>> job finishes), maven downloads the dependency jar packages from Maven
>>>>>>>>>>> Central, but in the job "spark-master-test-maven-hadoop" [2] we didn't
>>>>>>>>>>> find it downloading any jar packages in the log; what is the reason
>>>>>>>>>>> for that?
>>>>>>>>>>> Also, when we build the Spark jar and download dependencies from
>>>>>>>>>>> Maven Central, it costs almost 1 hour, while [2] costs just 10min. But
>>>>>>>>>>> if we run "mvn package" in a VM that has already executed "mvn
>>>>>>>>>>> package" before, it costs just 14min, very close to [2]. So we suspect
>>>>>>>>>>> that downloading the jar packages costs most of the time. For the goal
>>>>>>>>>>> of ARM CI, we expect the performance of the NEW ARM CI to be close to
>>>>>>>>>>> the existing x86 CI, so users can accept it more easily.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>>>>>>>>>>> [2]
>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>> ZhaoBo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sean Owen <sr...@gmail.com> wrote on Thu, Aug 15, 2019 at 9:58 PM:
>>>>>>>>>>>
>>>>>>>>>>>> I think the right goal is to fix the remaining issues first. If
>>>>>>>>>>>> we set up CI/CD it will only tell us there are still some test failures. If
>>>>>>>>>>>> it's stable, and not hard to add to the existing CI/CD, yes it could be
>>>>>>>>>>>> done automatically later. You can continue to test on ARM independently for
>>>>>>>>>>>> now.
>>>>>>>>>>>>
>>>>>>>>>>>> It sounds indeed like there are some networking problems in the
>>>>>>>>>>>> test system if you're not able to download from Maven Central. That rarely
>>>>>>>>>>>> takes significant time, and there aren't project-specific mirrors here. You
>>>>>>>>>>>> might be able to point at a closer public mirror, depending on where you
>>>>>>>>>>>> are.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <
>>>>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to discuss spark ARM CI again. We ran some tests on an
>>>>>>>>>>>>> arm instance based on master; the jobs include
>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/13  and the k8s
>>>>>>>>>>>>> integration https://github.com/theopenlab/spark/pull/17/ .
>>>>>>>>>>>>> There are several things I want to talk about:
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, about the failed tests:
>>>>>>>>>>>>>     1. We have fixed some problems like
>>>>>>>>>>>>> https://github.com/apache/spark/pull/25186 and
>>>>>>>>>>>>> https://github.com/apache/spark/pull/25279; thanks to sean owen
>>>>>>>>>>>>> and others for helping us.
>>>>>>>>>>>>>     2. We tried the k8s integration test on arm and hit an
>>>>>>>>>>>>> error: apk fetch hangs. The tests passed after adding the
>>>>>>>>>>>>> '--network host' option to the `docker build` command (see the
>>>>>>>>>>>>> sketch after this list):
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>>>>>>>>>>>>> The solution refers to
>>>>>>>>>>>>> https://github.com/gliderlabs/docker-alpine/issues/307  and I
>>>>>>>>>>>>> don't know whether it has ever happened in the community CI;
>>>>>>>>>>>>> maybe we should submit a pr to pass '--network host' to
>>>>>>>>>>>>> `docker build`?
>>>>>>>>>>>>>     3. We found two tests that fail after the commit
>>>>>>>>>>>>> https://github.com/apache/spark/pull/23767 :
>>>>>>>>>>>>>        ReplayListenerSuite:
>>>>>>>>>>>>>        - ...
>>>>>>>>>>>>>        - End-to-end replay *** FAILED ***
>>>>>>>>>>>>>          "[driver]" did not equal "[1]"
>>>>>>>>>>>>> (JsonProtocolSuite.scala:622)
>>>>>>>>>>>>>        - End-to-end replay with compression *** FAILED ***
>>>>>>>>>>>>>          "[driver]" did not equal "[1]"
>>>>>>>>>>>>> (JsonProtocolSuite.scala:622)
>>>>>>>>>>>>>
>>>>>>>>>>>>>         We tried to revert the commit and then the tests passed.
>>>>>>>>>>>>> The patch is big and, sorry, we haven't been able to find the
>>>>>>>>>>>>> root cause yet; if you are interested please try it, and it
>>>>>>>>>>>>> would be much appreciated if someone could help us figure it out.
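>>>>>>>>>>>>>
>>>>>>>>>>>>> For reference, the '--network host' workaround from item 2
>>>>>>>>>>>>> amounts to something like this rough sketch (the image tag and
>>>>>>>>>>>>> build context are illustrative, not our real job script):
>>>>>>>>>>>>>
>>>>>>>>>>>>>     import subprocess
>>>>>>>>>>>>>
>>>>>>>>>>>>>     # Build with the host network so `apk fetch` inside the
>>>>>>>>>>>>>     # Dockerfile does not hang (gliderlabs/docker-alpine#307).
>>>>>>>>>>>>>     subprocess.run(
>>>>>>>>>>>>>         ["docker", "build", "--network", "host",
>>>>>>>>>>>>>          "-t", "spark-test:arm64", "."],
>>>>>>>>>>>>>         check=True,
>>>>>>>>>>>>>     )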
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second, about the test time: we increased the flavor of the
>>>>>>>>>>>>> arm instance to 16U16G, but there seems to be no significant
>>>>>>>>>>>>> improvement. The k8s integration test took about one and a half
>>>>>>>>>>>>> hours, and the QA test (like the
>>>>>>>>>>>>> spark-master-test-maven-hadoop-2.7 community jenkins job) took
>>>>>>>>>>>>> about seventeen hours (it is too long :(); we suspect the
>>>>>>>>>>>>> reasons are performance and network.
>>>>>>>>>>>>> We split the jobs by project, such as sql, core and so on, and
>>>>>>>>>>>>> the time can be decreased to about seven hours, see
>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/19 (a sketch of the
>>>>>>>>>>>>> split is below). We found that the Spark QA tests like
>>>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>>>>>>>>>>>>> seem to never download the jar packages from the maven central
>>>>>>>>>>>>> repo (such as
>>>>>>>>>>>>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>>>>>>>>>>>>> So we want to know how the jenkins jobs can do that: is there an
>>>>>>>>>>>>> internal maven repo? Maybe we can do the same thing to avoid the
>>>>>>>>>>>>> network cost of downloading the dependency jar packages.
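>>>>>>>>>>>>>
>>>>>>>>>>>>> Roughly, the per-project split amounts to something like the
>>>>>>>>>>>>> following sketch (the module list is illustrative, not our
>>>>>>>>>>>>> exact job matrix):
>>>>>>>>>>>>>
>>>>>>>>>>>>>     import subprocess
>>>>>>>>>>>>>
>>>>>>>>>>>>>     # Run the suite per Maven module so each piece can become
>>>>>>>>>>>>>     # its own CI job instead of one seventeen-hour run.
>>>>>>>>>>>>>     for module in ["core", "sql/core", "mllib", "streaming"]:
>>>>>>>>>>>>>         subprocess.run(
>>>>>>>>>>>>>             ["./build/mvn", "-pl", module, "-am", "test"],
>>>>>>>>>>>>>             check=True)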
>>>>>>>>>>>>>
>>>>>>>>>>>>> Third, the most important thing: it's about ARM CI for spark,
>>>>>>>>>>>>> which we believe is necessary, right? And you can see we really
>>>>>>>>>>>>> made a lot of effort; the basic arm build/test jobs are now ok,
>>>>>>>>>>>>> so we suggest adding arm jobs to the community. We can set them
>>>>>>>>>>>>> to non-voting at first and improve/enrich the jobs step by step.
>>>>>>>>>>>>> Generally, we have two ways in mind to integrate ARM CI for spark:
>>>>>>>>>>>>>      1) We introduce openlab ARM CI into spark as a custom CI
>>>>>>>>>>>>> system. We provide people and test ARM VMs, we will focus on the
>>>>>>>>>>>>> ARM-related issues in Spark, and we will push the PRs into the
>>>>>>>>>>>>> community.
>>>>>>>>>>>>>      2) We donate ARM VM resources to the existing amplab
>>>>>>>>>>>>> Jenkins. We still provide people, focus on the ARM-related
>>>>>>>>>>>>> issues in Spark, and push the PRs into the community.
>>>>>>>>>>>>> With both options we will provide people for maintenance, and
>>>>>>>>>>>>> of course it would be great if we can work together. So please
>>>>>>>>>>>>> tell us which option you would like, and let's move forward.
>>>>>>>>>>>>> Waiting for your reply, thank you very much.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>
>>>> --
>>>> Shane Knapp
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu
>>>>
>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Dongjoon Hyun <do...@gmail.com> ,

Sure, and I have updated the JIRA already :)
https://issues.apache.org/jira/browse/SPARK-29106
If anything is missing, please let me know, thank you.


Re: Ask for ARM CI for spark

Posted by Dongjoon Hyun <do...@gmail.com>.
Hi, Tianhua.

Could you summarize the details on the JIRA once more?
It will be very helpful for the community. Also, I've been waiting on that
JIRA. :)

Bests,
Dongjoon.



Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
We can talk about this later, but I have to update a few things :)

- It (largely) worked previously
  --- But no one was sure of this before the ARM testing, and it isn't
stated anywhere; specifying it officially will make it much clearer
- I think you're also saying you don't have 100% tests passing anyway,
though probably just small issues
  --- The maven and python tests are 100% passing, see
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ and
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
- It does not seem to merit a special announcement from the PMC among
the 2000+ changes in Spark 3
  --- It's important to users; I believe it deserves one

On Mon, Nov 18, 2019 at 10:06 AM Sean Owen <sr...@apache.org> wrote:

> Same response as before:
>
> - It is in the list of resolved JIRAs, of course
> - It (largely) worked previously
> - I think you're also saying you don't have 100% tests passing anyway,
> though probably just small issues
> - It does not seem to merit a special announcement from the PMC among
> the 2000+ changes in Spark 3
> - You are welcome to announce (on the project's user@ list if you
> like) whatever you want. Obviously, this is already well advertised on
> dev@
>
> I think you are asking for what borders on endorsement, and no that
> doesn't sound appropriate. Please just announce whatever you like as
> suggested.
>
> Sean

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi @Sean Owen <sr...@gmail.com> ,

Thanks for your reply and patience.
First, we apologize for the poor wording in the previous emails. We
just want users to be able to see the current support status somewhere in
the spark community. We really appreciate that you and the spark
community are making spark better on ARM, and are open to us.
Second, that's correct: adding this kind of CI/CD information to the
release notes is improper, so we won't do that.
Third, I think your suggestion is good for the current situation. We will
follow it and send an email to user@ and dev@ describing our testing
results, including the test coverage and the known issues we found. We
also hope that doing this will help attract more users/developers of
spark to ARM.
Fourth, we still have concerns that users cannot know clearly which
ARCHes Spark runs on with generic testing, so users will ask the same
question again and again, as huangtianhua mentioned in the spark
community, even when they plan to build/test on a specific ARCH. We are
not sure whether the community has a good way to resolve this; just a
suggestion from us: how about describing the current testing status (all
test statuses) of Amplab somewhere in spark? Then users can know which
spark configurations are already pretested with generic test cases in the
upstream community, and feel confident to use spark anywhere they want
according to that information.

In the end, we always trust the community and follow community suggestions.
Please feel free to comment on our work, and you are welcome to work
together with us on Spark on ARM. If any issue is hit on ARM, please also
@ us to discuss. ;-)

Thank you @Sean Owen <sr...@gmail.com> and team.

BR

ZhaoBo



Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@apache.org>.
Same response as before:

- It is in the list of resolved JIRAs, of course
- It (largely) worked previously
- I think you're also saying you don't have 100% tests passing anyway,
though probably just small issues
- It does not seem to merit a special announcement from the PMC among
the 2000+ changes in Spark 3
- You are welcome to announce (on the project's user@ list if you
like) whatever you want. Obviously, this is already well advertised on
dev@

I think you are asking for what borders on endorsement, and no that
doesn't sound appropriate. Please just announce whatever you like as
suggested.

Sean




Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Sean Owen <sr...@gmail.com>,
I'm afraid I don't agree with you this time. I still remember that no one
could tell me whether Spark supports ARM, or how well Spark supports ARM,
when I first asked on Dev@. You were very kind and told me to build and
test on ARM locally, and, sorry, I think you were not very sure about this
at that moment, right? Then my team and I worked with the community: we
found/fixed several issues, integrated arm jobs into AMPLAB Jenkins, and
the daily jobs have been running stably for a few weeks... After these
efforts, why not announce this officially in the Spark release notes? I
believe that after this everyone will know Spark is fully tested on ARM on
the community CI and that Spark basically supports ARM; it's amazing and
will be very helpful. So what do you think? Or what are you worried about?

On Mon, Nov 18, 2019 at 2:28 AM Steve Loughran <st...@cloudera.com> wrote:

> The ASF PR team would like something like that "Spark now supports ARM" in
> press releases. And don't forget: they do like to be involved in the
> launch of the final release.
>
> On Fri, Nov 15, 2019 at 9:46 AM bo zhaobo <bz...@gmail.com>
> wrote:
>
>> Hi @Sean Owen <sr...@gmail.com> ,
>>
>> Thanks for your idea.
>>
>> We may have used the wrong words to describe our request. It's true that
>> we cannot simply say "Spark supports ARM from release 3.0.0", and we also
>> cannot say the past releases cannot run on ARM. But the reality is that
>> the past releases were never fully tested on ARM the way the current
>> testing is done, and the current CI system has no resources that can
>> serve this kind of request (testing on ARM).
>>
>> And please consider: if a user wants to run the latest Spark release on
>> ARM (or even an old release), but the community doesn't say that the
>> specific Spark release was tested on ARM, the user might see running on
>> ARM as a risk. If they have no choice and must run Spark on ARM, they
>> will build a CI system by themselves, which is very expensive, right? But
>> now the community will do the same testing on ARM upstream, which will
>> save the users' resources. That's why an announcement by the community,
>> in some form, is official and best, such as "In release XXX, Spark is
>> fully tested on ARM" or "In release XXX, the Spark community integrated
>> an ARM CI system." Once users see that, they will be very comfortable
>> using Spark on ARM. ;-)
>>
>> Thanks for your patience; we are just discussing here, so if I get
>> something wrong, please feel free to correct me. ;-)
>>
>> Thanks,
>>
>> BR
>>
>> ZhaoBo
>>
>> On Fri, Nov 15, 2019 at 5:04 PM Sean Owen <sr...@gmail.com> wrote:
>>
>>> I don't think that's true either, not yet. Being JVM-based with no
>>> native code, I just don't even think it would be common to assume it
>>> doesn't work and it apparently has. If you want to announce it, that's
>>> up to you.
>>>
>>> On Fri, Nov 15, 2019 at 3:01 AM Tianhua huang <hu...@gmail.com>
>>> wrote:
>>> >
>>> > @Sean Owen,
>>> > Thanks for your attention to this.
>>> > I agree with you, it's probably not very appropriate to say "supports
>>> > ARM from the 3.0 release". How about changing the wording to "the Spark
>>> > community fully tests on ARM from the 3.0 release"?
>>> > Let's try to think about it from the user's point of view rather than
>>> > the developer's: users have to know exactly whether Spark supports ARM
>>> > well and whether Spark is fully tested on ARM. If we state that Spark
>>> > is fully tested on ARM, I believe users will have much more confidence
>>> > to run Spark on ARM.
>>> >
>>>
>>

Re: Ask for ARM CI for spark

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
The ASF PR team would like something like that "Spark now supports ARM" in
press releases. And don't forget: they do like to be involved in the
launch of the final release.

On Fri, Nov 15, 2019 at 9:46 AM bo zhaobo <bz...@gmail.com>
wrote:

> Hi @Sean Owen <sr...@gmail.com> ,
>
> Thanks for your idea.
>
> We may have used the wrong words to describe our request. It's true that
> we cannot simply say "Spark supports ARM from release 3.0.0", and we also
> cannot say the past releases cannot run on ARM. But the reality is that
> the past releases were never fully tested on ARM the way the current
> testing is done, and the current CI system has no resources that can
> serve this kind of request (testing on ARM).
>
> And please consider: if a user wants to run the latest Spark release on
> ARM (or even an old release), but the community doesn't say that the
> specific Spark release was tested on ARM, the user might see running on
> ARM as a risk. If they have no choice and must run Spark on ARM, they
> will build a CI system by themselves, which is very expensive, right? But
> now the community will do the same testing on ARM upstream, which will
> save the users' resources. That's why an announcement by the community,
> in some form, is official and best, such as "In release XXX, Spark is
> fully tested on ARM" or "In release XXX, the Spark community integrated
> an ARM CI system." Once users see that, they will be very comfortable
> using Spark on ARM. ;-)
>
> Thanks for your patience; we are just discussing here, so if I get
> something wrong, please feel free to correct me. ;-)
>
> Thanks,
>
> BR
>
> ZhaoBo
>
> On Fri, Nov 15, 2019 at 5:04 PM Sean Owen <sr...@gmail.com> wrote:
>
>> I don't think that's true either, not yet. Being JVM-based with no
>> native code, I just don't even think it would be common to assume it
>> doesn't work and it apparently has. If you want to announce it, that's
>> up to you.
>>
>> On Fri, Nov 15, 2019 at 3:01 AM Tianhua huang <hu...@gmail.com>
>> wrote:
>> >
>> > @Sean Owen,
>> > Thanks for your attention to this.
>> > I agree with you, it's probably not very appropriate to say "supports
>> > ARM from the 3.0 release". How about changing the wording to "the Spark
>> > community fully tests on ARM from the 3.0 release"?
>> > Let's try to think about it from the user's point of view rather than
>> > the developer's: users have to know exactly whether Spark supports ARM
>> > well and whether Spark is fully tested on ARM. If we state that Spark
>> > is fully tested on ARM, I believe users will have much more confidence
>> > to run Spark on ARM.
>> >
>>
>

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi @Sean Owen <sr...@gmail.com> ,

Thanks for your idea.

We may have used the wrong words to describe our request. It's true that
we cannot simply say "Spark supports ARM from release 3.0.0", and we also
cannot say the past releases cannot run on ARM. But the reality is that
the past releases were never fully tested on ARM the way the current
testing is done, and the current CI system has no resources that can
serve this kind of request (testing on ARM).

And please consider: if a user wants to run the latest Spark release on
ARM (or even an old release), but the community doesn't say that the
specific Spark release was tested on ARM, the user might see running on
ARM as a risk. If they have no choice and must run Spark on ARM, they
will build a CI system by themselves, which is very expensive, right? But
now the community will do the same testing on ARM upstream, which will
save the users' resources. That's why an announcement by the community,
in some form, is official and best, such as "In release XXX, Spark is
fully tested on ARM" or "In release XXX, the Spark community integrated
an ARM CI system." Once users see that, they will be very comfortable
using Spark on ARM. ;-)

Thanks for your patience; we are just discussing here, so if I get
something wrong, please feel free to correct me. ;-)

Thanks,

BR

ZhaoBo

On Fri, Nov 15, 2019 at 5:04 PM Sean Owen <sr...@gmail.com> wrote:

> I don't think that's true either, not yet. Being JVM-based with no
> native code, I just don't even think it would be common to assume it
> doesn't work and it apparently has. If you want to announce it, that's
> up to you.
>
> On Fri, Nov 15, 2019 at 3:01 AM Tianhua huang <hu...@gmail.com>
> wrote:
> >
> > @Sean Owen,
> > Thanks for your attention to this.
> > I agree with you, it's probably not very appropriate to say "supports
> > ARM from the 3.0 release". How about changing the wording to "the Spark
> > community fully tests on ARM from the 3.0 release"?
> > Let's try to think about it from the user's point of view rather than
> > the developer's: users have to know exactly whether Spark supports ARM
> > well and whether Spark is fully tested on ARM. If we state that Spark
> > is fully tested on ARM, I believe users will have much more confidence
> > to run Spark on ARM.
> >
>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I don't think that's true either, not yet. Being JVM-based with no
native code, I just don't even think it would be common to assume it
doesn't work and it apparently has. If you want to announce it, that's
up to you.

On Fri, Nov 15, 2019 at 3:01 AM Tianhua huang <hu...@gmail.com> wrote:
>
> @Sean Owen,
> Thanks for your attention to this.
> I agree with you, it's probably not very appropriate to say "supports ARM from the 3.0 release". How about changing the wording to "the Spark community fully tests on ARM from the 3.0 release"?
> Let's try to think about it from the user's point of view rather than the developer's: users have to know exactly whether Spark supports ARM well and whether Spark is fully tested on ARM. If we state that Spark is fully tested on ARM, I believe users will have much more confidence to run Spark on ARM.
>



Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi,

I found that Spark-3.0.0-preview has been released, but there are no
release notes in [1]. So how about adding a note on ARM support to the
next release notes (maybe the release notes of Spark-3.0.0-preview)?
Sorry to raise this; I'm not familiar with the process, so if anything
here is off, please feel free to correct me.
Moreover, since the daily job has been running stably for a few weeks, we
can probably say in the next release notes that the release has some
basic ARM support.
From the perspective of an open-source project and the open-source
ecosystem, this is a good chance to tell people that Spark supports ARM
and works across platforms. Supporting more architectures is good for
attracting more users to Spark, even those who cannot use x86 for reasons
beyond their control.

[1]  https://github.com/apache/spark-website/tree/asf-site/releases/_posts

On Fri, Nov 15, 2019 at 11:00 AM bo zhaobo <bz...@gmail.com> wrote:

> Hi @Sean Owen <sr...@gmail.com> ,
>
> Thanks for the reply. We know that the Spark community has its own
> release schedule and plan, and we are happy to follow it. But we think it
> would be great if the community could add a sentence to the next release
> notes stating "Spark supports ARM from this release." once we finish the
> test work on ARM. That's all. We just want an official statement from the
> community that Spark supports ARM, to attract more users to Spark.
>
> Thank you very much for your patience.
>
> BR
>
> ZhaoBo
>
> On Fri, Nov 15, 2019 at 10:25 AM Tianhua huang <hu...@gmail.com> wrote:
>
>> @Sean,
>> Yes, you are right, we don't have to create a separate release of Spark
>> for ARM; it's enough to add a release note saying that Spark supports
>> the ARM architecture.
>> About the test failures: one or two tests sometimes time out on our
>> poor-performance ARM instance. We have now donated a high-performance
>> ARM instance to AMPLab and are waiting for Shane to set up the jobs on it.
>>
>> On Fri, Nov 15, 2019 at 10:13 AM Sean Owen <sr...@gmail.com> wrote:
>>
>>> I don't quite understand. You are saying tests don't pass yet, so why
>>> would anyone yet run these tests regularly?
>>> If it's because the instances aren't fast enough, use bigger instances?
>>> I don't think anyone would create a separate release of Spark for ARM,
>>> no. But why would that be necessary?
>>>
>>> On Thu, Nov 14, 2019 at 7:28 PM bo zhaobo <bz...@gmail.com>
>>> wrote:
>>>
>>>> Hi Spark team,
>>>>
>>>> Any ideas about the above email? Thank you.
>>>>
>>>> BR
>>>>
>>>> ZhaoBo
>>>>
>>>>
>>>> On Tue, Nov 12, 2019 at 2:47 PM Tianhua huang <hu...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> The Spark ARM jobs have been building for some time, and now there are
>>>>> two jobs[1], spark-master-test-maven-arm
>>>>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/>
>>>>> and spark-master-test-python-arm
>>>>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/>.
>>>>> We can see there are some build failures, but they are due to the poor
>>>>> performance of the ARM instance; we have now begun to build the Spark ARM
>>>>> jobs on other, higher-performance instances, and the builds/tests all
>>>>> succeed. We plan to donate the instance to AMPLab later. Given the build
>>>>> history, we are very happy to say Spark is supported on the aarch64
>>>>> platform, and I suggest adding this good news to the Spark 3.0.0 release
>>>>> notes. Maybe the community could provide an ARM-supported release of
>>>>> Spark at the same time?
>>>>>
>>>>> [1]
>>>>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
>>>>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
>>>>>
>>>>> ps: the jira https://issues.apache.org/jira/browse/SPARK-29106 tracks
>>>>> the whole work, thank you very much Shane :)
>>>>>
>>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Sean Owen,
Thanks for your attention to this.
I agree with you, it's probably not very appropriate to say "supports ARM
from the 3.0 release". How about changing the wording to "the Spark
community fully tests on ARM from the 3.0 release"?
Let's try to think about it from the user's point of view rather than the
developer's: users have to know exactly whether Spark supports ARM well
and whether Spark is fully tested on ARM. If we state that Spark is fully
tested on ARM, I believe users will have much more confidence to run
Spark on ARM.

On Fri, Nov 15, 2019 at 4:05 PM Sean Owen <sr...@gmail.com> wrote:

> I'm not against it, but the JIRAs will already show that the small
> ARM-related difference like floating-point in log() were resolved.
> Those aren't major enough to highlight as key changes in the 2000+
> resolved. it didn't really not-work before either, as I understand;
> Spark isn't specific to an architecture, so I don't know if that
> situation changed materially in 3.0; it still otherwise ran in 2.x on
> ARM right? It would imply people couldn't use it on ARM previously.
> You can certainly announce you endorse 3.0 as a good release for ARM
> and/or call attention to it on user@.
>
> On Thu, Nov 14, 2019 at 9:01 PM bo zhaobo <bz...@gmail.com>
> wrote:
> >
> > Hi @Sean Owen ,
> >
> > Thanks for the reply. We know that the Spark community has its own
> > release schedule and plan, and we are happy to follow it. But we think
> > it would be great if the community could add a sentence to the next
> > release notes stating "Spark supports ARM from this release." once we
> > finish the test work on ARM. That's all. We just want an official
> > statement from the community that Spark supports ARM, to attract more
> > users to Spark.
> >
>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I'm not against it, but the JIRAs will already show that the small
ARM-related difference like floating-point in log() were resolved.
Those aren't major enough to highlight as key changes in the 2000+
resolved. it didn't really not-work before either, as I understand;
Spark isn't specific to an architecture, so I don't know if that
situation changed materially in 3.0; it still otherwise ran in 2.x on
ARM right? It would imply people couldn't use it on ARM previously.
You can certainly announce you endorse 3.0 as a good release for ARM
and/or call attention to it on user@.

On Thu, Nov 14, 2019 at 9:01 PM bo zhaobo <bz...@gmail.com> wrote:
>
> Hi @Sean Owen ,
>
> Thanks for the reply. We know that the Spark community has its own release schedule and plan, and we are happy to follow it. But we think it would be great if the community could add a sentence to the next release notes stating "Spark supports ARM from this release." once we finish the test work on ARM. That's all. We just want an official statement from the community that Spark supports ARM, to attract more users to Spark.
>



Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi @Sean Owen <sr...@gmail.com> ,

Thanks for the reply. We know that the Spark community has its own release
schedule and plan, and we are happy to follow it. But we think it would be
great if the community could add a sentence to the next release notes
stating "Spark supports ARM from this release." once we finish the test
work on ARM. That's all. We just want an official statement from the
community that Spark supports ARM, to attract more users to Spark.

Thank you very much for your patience.

BR

ZhaoBo

On Fri, Nov 15, 2019 at 10:25 AM Tianhua huang <hu...@gmail.com> wrote:

> @Sean,
> Yes, you are right, we don't have to create a separate release of Spark
> for ARM; it's enough to add a release note saying that Spark supports
> the ARM architecture.
> About the test failures: one or two tests sometimes time out on our
> poor-performance ARM instance. We have now donated a high-performance
> ARM instance to AMPLab and are waiting for Shane to set up the jobs on it.
>
> On Fri, Nov 15, 2019 at 10:13 AM Sean Owen <sr...@gmail.com> wrote:
>
>> I don't quite understand. You are saying tests don't pass yet, so why
>> would anyone yet run these tests regularly?
>> If it's because the instances aren't fast enough, use bigger instances?
>> I don't think anyone would create a separate release of Spark for ARM,
>> no. But why would that be necessary?
>>
>> On Thu, Nov 14, 2019 at 7:28 PM bo zhaobo <bz...@gmail.com>
>> wrote:
>>
>>> Hi Spark team,
>>>
>>> Any ideas about the above email? Thank you.
>>>
>>> BR
>>>
>>> ZhaoBo
>>>
>>>
>>> On Tue, Nov 12, 2019 at 2:47 PM Tianhua huang <hu...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> The Spark ARM jobs have been building for some time, and now there are
>>>> two jobs[1], spark-master-test-maven-arm
>>>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/>
>>>> and spark-master-test-python-arm
>>>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/>.
>>>> We can see there are some build failures, but they are due to the poor
>>>> performance of the ARM instance; we have now begun to build the Spark ARM
>>>> jobs on other, higher-performance instances, and the builds/tests all
>>>> succeed. We plan to donate the instance to AMPLab later. Given the build
>>>> history, we are very happy to say Spark is supported on the aarch64
>>>> platform, and I suggest adding this good news to the Spark 3.0.0 release
>>>> notes. Maybe the community could provide an ARM-supported release of
>>>> Spark at the same time?
>>>>
>>>> [1]
>>>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
>>>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
>>>>
>>>> ps: the jira https://issues.apache.org/jira/browse/SPARK-29106 tracks
>>>> the whole work, thank you very much Shane :)
>>>>
>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Sean,
Yes, you are right, we don't have to create a separate release of Spark
for ARM; it's enough to add a release note saying that Spark supports the
ARM architecture.
About the test failures: one or two tests sometimes time out on our
poor-performance ARM instance. We have now donated a high-performance ARM
instance to AMPLab and are waiting for Shane to set up the jobs on it.

On Fri, Nov 15, 2019 at 10:13 AM Sean Owen <sr...@gmail.com> wrote:

> I don't quite understand. You are saying tests don't pass yet, so why
> would anyone yet run these tests regularly?
> If it's because the instances aren't fast enough, use bigger instances?
> I don't think anyone would create a separate release of Spark for ARM, no.
> But why would that be necessary?
>
> On Thu, Nov 14, 2019 at 7:28 PM bo zhaobo <bz...@gmail.com>
> wrote:
>
>> Hi Spark team,
>>
>> Any ideas about the above email? Thank you.
>>
>> BR
>>
>> ZhaoBo
>>
>>
>> On Tue, Nov 12, 2019 at 2:47 PM Tianhua huang <hu...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> The Spark ARM jobs have been building for some time, and now there are
>>> two jobs[1], spark-master-test-maven-arm
>>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/>
>>> and spark-master-test-python-arm
>>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/>.
>>> We can see there are some build failures, but they are due to the poor
>>> performance of the ARM instance; we have now begun to build the Spark ARM
>>> jobs on other, higher-performance instances, and the builds/tests all
>>> succeed. We plan to donate the instance to AMPLab later. Given the build
>>> history, we are very happy to say Spark is supported on the aarch64
>>> platform, and I suggest adding this good news to the Spark 3.0.0 release
>>> notes. Maybe the community could provide an ARM-supported release of
>>> Spark at the same time?
>>>
>>> [1]
>>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
>>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
>>>
>>> ps: the jira https://issues.apache.org/jira/browse/SPARK-29106 tracks
>>> the whole work, thank you very much Shane :)
>>>
>>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I don't quite understand. You are saying tests don't pass yet, so why would
anyone yet run these tests regularly?
If it's because the instances aren't fast enough, use bigger instances?
I don't think anyone would create a separate release of Spark for ARM, no.
But why would that be necessary?

On Thu, Nov 14, 2019 at 7:28 PM bo zhaobo <bz...@gmail.com>
wrote:

> Hi Spark team,
>
> Any ideas about the above email? Thank you.
>
> BR
>
> ZhaoBo
>
>
> On Tue, Nov 12, 2019 at 2:47 PM Tianhua huang <hu...@gmail.com> wrote:
>
>> Hi all,
>>
>> The Spark ARM jobs have been building for some time, and now there are
>> two jobs[1], spark-master-test-maven-arm
>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/>
>> and spark-master-test-python-arm
>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/>.
>> We can see there are some build failures, but they are due to the poor
>> performance of the ARM instance; we have now begun to build the Spark ARM
>> jobs on other, higher-performance instances, and the builds/tests all
>> succeed. We plan to donate the instance to AMPLab later. Given the build
>> history, we are very happy to say Spark is supported on the aarch64
>> platform, and I suggest adding this good news to the Spark 3.0.0 release
>> notes. Maybe the community could provide an ARM-supported release of
>> Spark at the same time?
>>
>> [1]
>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
>>
>> ps: the jira https://issues.apache.org/jira/browse/SPARK-29106 tracks the
>> whole work, thank you very much Shane :)
>>
>

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi Spark team,

Any ideas about the above email? Thank you.

BR

ZhaoBo

On Tue, Nov 12, 2019 at 2:47 PM Tianhua huang <hu...@gmail.com> wrote:

> Hi all,
>
> The Spark ARM jobs have been building for some time, and now there are
> two jobs[1], spark-master-test-maven-arm
> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/>
> and spark-master-test-python-arm
> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/>.
> We can see there are some build failures, but they are due to the poor
> performance of the ARM instance; we have now begun to build the Spark ARM
> jobs on other, higher-performance instances, and the builds/tests all
> succeed. We plan to donate the instance to AMPLab later. Given the build
> history, we are very happy to say Spark is supported on the aarch64
> platform, and I suggest adding this good news to the Spark 3.0.0 release
> notes. Maybe the community could provide an ARM-supported release of
> Spark at the same time?
>
> [1]
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
>
> ps: the jira https://issues.apache.org/jira/browse/SPARK-29106 tracks the
> whole work, thank you very much Shane :)
>
> On Thu, Oct 17, 2019 at 2:52 PM bo zhaobo <bz...@gmail.com>
> wrote:
>
>> Just a note: the JIRA issue link is
>> https://issues.apache.org/jira/browse/SPARK-29106
>>
>> On Thu, Oct 17, 2019 at 10:47 AM Tianhua huang <hu...@gmail.com> wrote:
>>
>>> OK, let's update the info there. Thanks.
>>>
>>> On Thu, Oct 17, 2019 at 1:52 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>>
>>>> i totally missed the spark jira from earlier...  let's move the
>>>> conversation there!
>>>>
>>>> On Tue, Oct 15, 2019 at 6:21 PM bo zhaobo <bz...@gmail.com>
>>>> wrote:
>>>>
>>>>> Shane, awesome! We will try our best to finish the tests and the
>>>>> requests on the VM soon. Once we finish those things, we will send you
>>>>> an email, and then we can continue with the next steps. Thank you very much.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> ZhaoBo
>>>>>
>>>>> On Wed, Oct 16, 2019 at 3:47 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>>>>
>>>>>> ok!  i'm able to successfully log in to the VM!
>>>>>>
>>>>>> i also have created a jenkins worker entry:
>>>>>> https://amplab.cs.berkeley.edu/jenkins/computer/spark-arm-vm/
>>>>>>
>>>>>> it's a pretty bare-bones VM, so i have some suggestions/requests
>>>>>> before we can actually proceed w/testing.  i will not be able to perform
>>>>>> any system configuration, as i don't have the cycles to reverse-engineer
>>>>>> the ansible setup and test it all out.
>>>>>>
>>>>>> * java is not installed, please install the following (a sketch
>>>>>>   follows this message):
>>>>>>   - java8 min version 1.8.0_191
>>>>>>   - java11 min version 11.0.1
>>>>>>
>>>>>> * it appears from the ansible playbook that there are other deps that
>>>>>> need to be installed.
>>>>>>   - please install all deps
>>>>>>   - manually run the tests until they pass
>>>>>>
>>>>>> * the jenkins user should NEVER have sudo or any root-level access!
>>>>>>
>>>>>> * once the arm tests pass when manually run, take a snapshot of this
>>>>>> image so we can recreate it w/o needing to reinstall everything
>>>>>>
>>>>>> after that's done i can finish configuring the jenkins worker and set
>>>>>> up a build...
>>>>>>
>>>>>> thanks!
>>>>>>
>>>>>> shane
>>>>>>
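For reference, a minimal sketch of the JDK setup requested above, assuming
an Ubuntu arm64 VM (package names and JVM paths are assumptions for that
distribution; verify the minimum versions afterwards):

    # Install OpenJDK 8 and OpenJDK 11 side by side.
    sudo apt-get update
    sudo apt-get install -y openjdk-8-jdk openjdk-11-jdk
    # Confirm both meet the minimums (1.8.0_191 / 11.0.1).
    /usr/lib/jvm/java-8-openjdk-arm64/bin/java -version
    /usr/lib/jvm/java-11-openjdk-arm64/bin/java -version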
>>>>>>
>>>>>> On Mon, Oct 14, 2019 at 8:34 PM Shane Knapp <sk...@berkeley.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> yes, i will get to that tomorrow.  today was spent cleaning up the
>>>>>>> mess from last week.
>>>>>>>
>>>>>>> On Mon, Oct 14, 2019 at 6:18 PM bo zhaobo <
>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi shane,
>>>>>>>>
>>>>>>>> That's great news that AMPLab is back. ;-) If possible, could you
>>>>>>>> please take a few minutes to check that the ARM VM is accessible from
>>>>>>>> your side? And do you have a plan for the whole ARM test integration
>>>>>>>> (how about we finish it this month)? Thanks.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> ZhaoBo
>>>>>>>>
>>>>>>>> On Thu, Oct 10, 2019 at 8:29 AM bo zhaobo <bz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Oh, sorry that we missed that email. If possible, could you please
>>>>>>>>> take a few minutes to test that the ARM VM is accessible with your ssh
>>>>>>>>> private key as the jenkins user? We plan to have the whole integration
>>>>>>>>> process and testing done before the end of this month. We are very happy
>>>>>>>>> to work together with you to move it forward, if you are free and agree.
>>>>>>>>> :)  Thank you very much.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Zhao Bo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Shane Knapp <sk...@berkeley.edu> 于 2019年10月9日周三 下午11:10写道:
>>>>>>>>>
>>>>>>>>>> i spent yesterday dealing w/a power outage on campus.  please see
>>>>>>>>>> my email to the spark dev list.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 9, 2019 at 3:29 AM Tianhua huang <
>>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Shane,
>>>>>>>>>>> Sorry to disturb you again; I wonder if there is any progress on
>>>>>>>>>>> the Jenkins ARM job?
>>>>>>>>>>> And please don't hesitate to contact us if you have any
>>>>>>>>>>> questions or need help, thank you very much :)
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 8, 2019 at 9:21 AM bo zhaobo <
>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Shane,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the late reply. We are just back from a 7-day holiday.
>>>>>>>>>>>>
>>>>>>>>>>>> I have already put the public key onto the VM. If you are free,
>>>>>>>>>>>> please go ahead and test. Thank you
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards
>>>>>>>>>>>>
>>>>>>>>>>>> ZhaoBo
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 1, 2019 at 2:33 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> here's the public key for the jenkins user:
>>>>>>>>>>>>> ssh-rsa
>>>>>>>>>>>>> AAAAB3NzaC1yc2EAAAABIwAAAgEApe+DQF0SusgpdSDLAeZ5ymbEbUODTMUT67yRCaVD7S4oAWgtHXWSLtgZAlD0D2N2qRm74DVXcCrN+LGIxExXP+h/xAPI+0tMHAFt0+u5zTy+6Fq3ADtG5q4dmNMohk4gZhlueeBN7JT6b3uRLwnrSr2F9DCd5F3gMd2fXAHVGWlOPY01IFwJcHcu4VVPV3pHq35N7TyZGup0Np/D1FtB4Hpw7tyrtiidYfQXE1MFVWLpHXFIoRjMEPGfZw5gfIuejImd22W3Qx9BHPC0e97wOxbQfygZHh8S0J5v6X5dvR/jZZs2queMiNwSDsVnjqDX3vgOIymfgy6xJNjiTTXPNuwEbmk54DCMkqibSY3NmmWPAzzWI1SwU4bSmVExY97TgoLr7hEBQQeMZuKScWVY2tD3yRJz18a3rJGnSboESPpItr5pLlCKcZlvJKM24goo4Uiqi9lLPvJbeXV3FbSiGt9pWDu18XzuZGkamxJzkCKSmhCoxB+fqNXWL7jEcvJw8smF9oTmnGG+in4awCBW11U2wkvPvCwUBWB3tRwHERaG6vTp0CNIKaBH5R968qsWuhbNjlARul5JG3XVUbljjxI4s8W+jRU++Ua2wlDC/6scqgEHlLGbQUO5uHCUTxn+wx7XpEZ6FSucOof0S0TR5mCrsFZ7+eV0nLX6l1cZhs8=
>>>>>>>>>>>>> jenkins@amp-jenkins-master
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> once you get that set up, please let me know and i will test
>>>>>>>>>>>>> ssh-ability.
>>>>>>>>>>>>>
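For reference, a minimal sketch of installing the key above for a fresh
jenkins user (usernames come from the thread; run from the existing 'arm'
account; the key is abbreviated here, use the full line from the message):

    # Create the jenkins user; deliberately no sudo/root access (see above).
    sudo useradd -m -s /bin/bash jenkins
    # Install the Jenkins master's public key for SSH logins.
    sudo mkdir -p /home/jenkins/.ssh
    echo 'ssh-rsa AAAAB3NzaC1yc2E... jenkins@amp-jenkins-master' | \
        sudo tee -a /home/jenkins/.ssh/authorized_keys
    sudo chown -R jenkins:jenkins /home/jenkins/.ssh
    sudo chmod 700 /home/jenkins/.ssh
    sudo chmod 600 /home/jenkins/.ssh/authorized_keys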
>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 10:15 PM Shane Knapp <
>>>>>>>>>>>>> sknapp@berkeley.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks for understanding.  i'm a one-man show.  :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> shane
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 6:52 PM Tianhua huang <
>>>>>>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am very happy to hear that :)
>>>>>>>>>>>>>>> And please don't hesitate to contact us if you have any
>>>>>>>>>>>>>>> questions or need help, thanks again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 26, 2019 at 12:34 AM Shane Knapp <
>>>>>>>>>>>>>>> sknapp@berkeley.edu> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> i'm hoping to get started by EOW or beginning of next.  i
>>>>>>>>>>>>>>>> have an incredibly small team here @ berkeley and support a lot of research
>>>>>>>>>>>>>>>> labs, and i've been down 25-50% staffing for the past 6 weeks.  thankfully
>>>>>>>>>>>>>>>> my whole team is back and i'm finally getting my head above water and will
>>>>>>>>>>>>>>>> have time to dedicate to this really soon.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sorry for the delay!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> shane
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Sep 23, 2019 at 7:21 PM bo zhaobo <
>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Shane,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How about starting the ARM work this week? Can we? ;-)
>>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Sep 20, 2019 at 11:57 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> i'll have the cycles over the next week.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Sep 19, 2019 at 7:51 PM bo zhaobo <
>>>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Shane,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is there any update about the last email? ;-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 10:02 AM bo zhaobo <bz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for reply.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I will answer your questions one by one.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 0) Sure, I can create a user named 'jenkins' manually.
>>>>>>>>>>>>>>>>>>>> And I will leave the 'arm' user in case you need higher privileges to do
>>>>>>>>>>>>>>>>>>>> something on the VM.
>>>>>>>>>>>>>>>>>>>> 1) Sure, that would be great if you could provide a
>>>>>>>>>>>>>>>>>>>> public key to us. ;-)
>>>>>>>>>>>>>>>>>>>> 2) Sure
>>>>>>>>>>>>>>>>>>>> 3) Yeah, it's a persistent VM for now. But we plan
>>>>>>>>>>>>>>>>>>>> to donate more resources before the end of October. It's better to move
>>>>>>>>>>>>>>>>>>>> all the ARM Jenkins workers to the same place, so this VM might be recycled
>>>>>>>>>>>>>>>>>>>> in the future. For now, that doesn't stop us from moving the CI job forward.
>>>>>>>>>>>>>>>>>>>> 4) Correct. ;-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you need more information or help with the ARM VM,
>>>>>>>>>>>>>>>>>>>> please feel free to contact us.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Questions from us:
>>>>>>>>>>>>>>>>>>>> 1> Is there an online chat platform where we can chat
>>>>>>>>>>>>>>>>>>>> and discuss? Email seems too slow for us. ;-) Or is there an online
>>>>>>>>>>>>>>>>>>>> discussion platform in the Spark community?
>>>>>>>>>>>>>>>>>>>> 2> Could you please share the test scripts once you
>>>>>>>>>>>>>>>>>>>> finish the test integration? Also the existing Jenkins test scripts in the
>>>>>>>>>>>>>>>>>>>> Spark CI: we plan to do the same thing as on x86, and we think running the
>>>>>>>>>>>>>>>>>>>> same jobs on ARM as on x86 would be a good start.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best  Regards
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 5:05 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> thanks for the info...  a couple of things:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 0) could you rename/create a 'jenkins' user with the
>>>>>>>>>>>>>>>>>>>>> same rights?
>>>>>>>>>>>>>>>>>>>>> 1) i will provide an ssh key so that you can add it to
>>>>>>>>>>>>>>>>>>>>> the jenkins user's authorized_keys file
>>>>>>>>>>>>>>>>>>>>> 2) the jenkins user should NOT have root access.  this
>>>>>>>>>>>>>>>>>>>>> is a major security hole
>>>>>>>>>>>>>>>>>>>>> 3) will this be a persistent VM?  if so, i'd much
>>>>>>>>>>>>>>>>>>>>> prefer to have it set up initially so we can just log in, build what we
>>>>>>>>>>>>>>>>>>>>> need and launch the job
>>>>>>>>>>>>>>>>>>>>> 4) yay ansible!  i can create the job locally from the
>>>>>>>>>>>>>>>>>>>>> template.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 1:16 AM bo zhaobo <
>>>>>>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi @shane knapp <sk...@berkeley.edu> , thanks.
>>>>>>>>>>>>>>>>>>>>>> Tianhua huang and I have already created an ARM VM for this.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The information about the ARM VM:
>>>>>>>>>>>>>>>>>>>>>> IP: 213.146.141.16
>>>>>>>>>>>>>>>>>>>>>> ssh_key: please see the attached file
>>>>>>>>>>>>>>>>>>>>>> "lab_ssh_key.txt"
>>>>>>>>>>>>>>>>>>>>>> Then you can log in to it with "ssh -i
>>>>>>>>>>>>>>>>>>>>>> lab_ssh_key.txt arm@213.146.141.16"; the user "arm"
>>>>>>>>>>>>>>>>>>>>>> has sudo privileges.
>>>>>>>>>>>>>>>>>>>>>> Note: this VM is running in a cloud; for now only
>>>>>>>>>>>>>>>>>>>>>> SSH (TCP port 22) and ping (ICMP) are enabled for ingress connections.
>>>>>>>>>>>>>>>>>>>>>> If you need more, please let us know.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> And first we can add a periodic job for the Spark ARM
>>>>>>>>>>>>>>>>>>>>>> tests; maybe the Zuul job details from OpenLab that we tested can be
>>>>>>>>>>>>>>>>>>>>>> referenced for the Jenkins job:
>>>>>>>>>>>>>>>>>>>>>> https://github.com/theopenlab/openlab-zuul-jobs/blob/master/playbooks/spark-unit-test-arm64/run.yaml
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Let us give you some explanation about that file. In
>>>>>>>>>>>>>>>>>>>>>> Step 3 of the file above: leveldbjni doesn't have an ARM release, so we
>>>>>>>>>>>>>>>>>>>>>> build leveldbjni on ARM and use our own leveldbjni jar, which is located
>>>>>>>>>>>>>>>>>>>>>> at [1]. So you still need to use that jar for testing. The Hadoop side
>>>>>>>>>>>>>>>>>>>>>> also depends on leveldbjni, I remember through "hadoop-client". For this
>>>>>>>>>>>>>>>>>>>>>> reason we cannot change the pom file directly, so we use "mvn install" to
>>>>>>>>>>>>>>>>>>>>>> make leveldbjni available while testing Spark on ARM (a sketch follows
>>>>>>>>>>>>>>>>>>>>>> the [1] link below).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please note: all tests run without the root user, so
>>>>>>>>>>>>>>>>>>>>>> it's good to test with the "arm" user.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>> https://repo1.maven.org/maven2/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
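For reference, a minimal sketch of that "mvn install" step (the Maven
coordinates are an assumption based on Spark's leveldbjni-all 1.8
dependency, not the exact commands from the OpenLab playbook):

    # Fetch the ARM-enabled leveldbjni build from [1] and install it into
    # the local Maven repository under the coordinates the Spark/Hadoop
    # poms resolve, so the build picks it up instead of the x86-only jar.
    wget https://repo1.maven.org/maven2/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
    mvn install:install-file -Dfile=leveldbjni-all-1.8.jar \
        -DgroupId=org.fusesource.leveldbjni -DartifactId=leveldbjni-all \
        -Dversion=1.8 -Dpackaging=jar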
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 2:48 PM Tianhua huang <hu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> @shane knapp <sk...@berkeley.edu> thank you very
>>>>>>>>>>>>>>>>>>>>>>> much, I opened an issue for this
>>>>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-29106,
>>>>>>>>>>>>>>>>>>>>>>> we can talk about the details in it :)
>>>>>>>>>>>>>>>>>>>>>>> And we will prepare an arm instance today and will
>>>>>>>>>>>>>>>>>>>>>>> send the info to your email later.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 4:40 AM Shane Knapp <
>>>>>>>>>>>>>>>>>>>>>>> sknapp@berkeley.edu> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> @Tianhua huang <hu...@gmail.com> sure, i
>>>>>>>>>>>>>>>>>>>>>>>> think we can get something sorted for the short-term.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> all we need is ssh access (i can provide an ssh
>>>>>>>>>>>>>>>>>>>>>>>> key), and i can then have our jenkins master launch a remote worker on that
>>>>>>>>>>>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> instance setup, etc, will be up to you.  my support
>>>>>>>>>>>>>>>>>>>>>>>> for the time being will be to create the job and 'best effort' for
>>>>>>>>>>>>>>>>>>>>>>>> everything else.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> this should get us up and running asap.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> is there an open JIRA for jenkins/arm test
>>>>>>>>>>>>>>>>>>>>>>>> support?  we can move the technical details about this idea there.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2019 at 9:03 PM Tianhua huang <
>>>>>>>>>>>>>>>>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> @Sean Owen <sr...@gmail.com> , so sorry for the
>>>>>>>>>>>>>>>>>>>>>>>>> late reply, we had a Mid-Autumn holiday :)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> If you hope to integrate the ARM CI into the AMPLab
>>>>>>>>>>>>>>>>>>>>>>>>> Jenkins, we can offer the ARM instance, and then the ARM job will run
>>>>>>>>>>>>>>>>>>>>>>>>> together with the other x86 jobs. Is there perhaps a guideline for doing
>>>>>>>>>>>>>>>>>>>>>>>>> this? @shane knapp <sk...@berkeley.edu>, would you help us?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 12, 2019 at 9:36 PM Sean Owen <
>>>>>>>>>>>>>>>>>>>>>>>>> srowen@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I don't know what's involved in actually
>>>>>>>>>>>>>>>>>>>>>>>>>> accepting or operating those machines, so can't comment there, but in the
>>>>>>>>>>>>>>>>>>>>>>>>>> meantime it's good that you are running these tests and can help report
>>>>>>>>>>>>>>>>>>>>>>>>>> changes needed to keep it working with ARM. I would continue with that for
>>>>>>>>>>>>>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 11, 2019 at 10:06 PM Tianhua huang <
>>>>>>>>>>>>>>>>>>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> For the whole work process of the Spark ARM CI, we
>>>>>>>>>>>>>>>>>>>>>>>>>>> want to make two things clear.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The first thing is:
>>>>>>>>>>>>>>>>>>>>>>>>>>> About the Spark ARM CI: we now have two periodic
>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs. One job[1] is based on commit[2] (which already fixes the failed
>>>>>>>>>>>>>>>>>>>>>>>>>>> replay tests issue[3]; we made a new test branch based on the date
>>>>>>>>>>>>>>>>>>>>>>>>>>> 09-09-2019), and the other job[4] is based on Spark master.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The first job tests the specified branch to
>>>>>>>>>>>>>>>>>>>>>>>>>>> prove that our ARM CI is good and stable.
>>>>>>>>>>>>>>>>>>>>>>>>>>> The second job checks Spark master every day,
>>>>>>>>>>>>>>>>>>>>>>>>>>> so we can see whether the latest commits affect the ARM CI. The build
>>>>>>>>>>>>>>>>>>>>>>>>>>> history and results show that some problems are easier to
>>>>>>>>>>>>>>>>>>>>>>>>>>> find on ARM, like SPARK-28770
>>>>>>>>>>>>>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-28770>,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and also that we make the effort to trace and figure them
>>>>>>>>>>>>>>>>>>>>>>>>>>> out; so far we have found and fixed several problems[5][6][7], thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>> to everyone in the community :). And we believe that an ARM CI is very
>>>>>>>>>>>>>>>>>>>>>>>>>>> necessary, right?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The second thing is:
>>>>>>>>>>>>>>>>>>>>>>>>>>> We plan to run the jobs for a period of time,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and you can see the results and logs in the 'build history' of the job
>>>>>>>>>>>>>>>>>>>>>>>>>>> consoles. If everything goes well for one or two weeks, could the
>>>>>>>>>>>>>>>>>>>>>>>>>>> community accept the ARM CI? Or how long should the periodic jobs run
>>>>>>>>>>>>>>>>>>>>>>>>>>> before the community has enough confidence to accept it? As you suggested
>>>>>>>>>>>>>>>>>>>>>>>>>>> before, it would be good to integrate the ARM CI into the AMPLab Jenkins;
>>>>>>>>>>>>>>>>>>>>>>>>>>> we agree, and we can donate the ARM instances and then maintain the
>>>>>>>>>>>>>>>>>>>>>>>>>>> ARM-related test jobs together with the community. Any thoughts?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you all!
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://status.openlabtesting.org/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64
>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/commit/0ed9fae45769d4b06b8cf8128f462f09ff3d9a72
>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28770
>>>>>>>>>>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://status.openlabtesting.org/builds?job_name=spark-master-unit-test-hadoop-2.7-arm64
>>>>>>>>>>>>>>>>>>>>>>>>>>> [5] https://github.com/apache/spark/pull/25186
>>>>>>>>>>>>>>>>>>>>>>>>>>> [6] https://github.com/apache/spark/pull/25279
>>>>>>>>>>>>>>>>>>>>>>>>>>> [7] https://github.com/apache/spark/pull/25673
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 11:24 PM Sean Owen <
>>>>>>>>>>>>>>>>>>>>>>>>>>> srowen@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I think it's just local caching. After you
>>>>>>>>>>>>>>>>>>>>>>>>>>>> run the build you should find lots of stuff cached at ~/.m2/repository and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> it won't download every time.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
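For reference, a quick way to see that local caching at work (a sketch, not
from the thread; assumes Spark's bundled Maven wrapper under build/mvn):

    # After one full build the local repository is populated...
    du -sh ~/.m2/repository
    # ...and a later build can run offline (-o), showing that nothing
    # more needs to be downloaded:
    ./build/mvn -o -DskipTests package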
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 3:01 AM bo zhaobo <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Sean,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the reply, and apologies for the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confusion.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I know the dependencies will be downloaded
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by SBT or Maven. But the Spark QA job also executes "mvn clean package",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> so why doesn't the log [1] show any jars being downloaded from Maven
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Central, and why does it build so fast? Is the reason that the Spark
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jenkins builds the Spark jars on physical machines and doesn't destroy
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the test environment after a job finishes? Then another job building
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Spark would get the dependency jars from the local cache, since previous
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that executed "mvn package" already downloaded those dependencies
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> onto the local worker machine. Am I right? Is that why the job log[1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doesn't show any download information from Maven Central?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ZhaoBo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 10:38 AM Sean Owen <sr...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure what you mean. The dependencies
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are downloaded by SBT and Maven like in any other project, and nothing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about it is specific to Spark.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The worker machines cache artifacts that are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> downloaded from these, but this is a function of Maven and SBT, not Spark.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You may find that the initial download takes a long time.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 15, 2019 at 9:02 PM bo zhaobo <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Sean,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks very much for pointing out the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> roadmap. ;-) Then I think we will continue to focus on our test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> environment.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> About the networking problems: I mean that we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can access Maven Central, and jobs can download the required jar packages
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at a high network speed. What we want to know is why the Spark QA
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> test job[1] logs show that the job script/Maven build doesn't seem to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> download the jar packages. Could you tell us the reason for that? Thank
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you. The reason we raised the "networking problems" is a phenomenon we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> found during our tests: if we execute "mvn clean package" in a new test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> environment (in our test environment we destroy the test VMs after a job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> finishes), Maven downloads the dependency jar packages from Maven
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Central, but in the job "spark-master-test-maven-hadoop" [2] we didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> find it downloading any jar packages in the log; what is the reason for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, when we build the Spark jar while downloading
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependencies from Maven Central, it costs almost 1 hour, and we found [2]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> costs just 10 minutes. But if we run "mvn package" in a VM that already
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> executed "mvn package" before, it costs just 14 minutes, very close to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]. So we suspect that downloading the jar packages costs most of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time. For the goal of the ARM CI, we expect the performance of the new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ARM CI to be close to the existing x86 CI, so users can accept it more easily.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ZhaoBo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 15, 2019 at 9:58 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think the right goal is to fix the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> remaining issues first. If we set up CI/CD it will only tell us there are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> still some test failures. If it's stable, and not hard to add to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> existing CI/CD, yes it could be done automatically later. You can continue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to test on ARM independently for now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It sounds indeed like there are some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> networking problems in the test system if you're not able to download from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maven Central. That rarely takes significant time, and there aren't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> project-specific mirrors here. You might be able to point at a closer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> public mirror, depending on where you are.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 15, 2019 at 5:43 AM Tianhua
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> huang <hu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I want to discuss the Spark ARM CI again. We
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ran some tests on an ARM instance based on master; the job includes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/13  and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the k8s integration
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/17/ .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are several things I want to talk about:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> First, about the failed tests:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     1.we have fixed some problems like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/25186
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/25279,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks sean owen and others to help us.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     2.we tried k8s integration test on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arm, and met an error: apk fetch hangs,  the tests passed  after adding
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> '--network host' option for command `docker build`, see:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> , the solution refers to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/gliderlabs/docker-alpine/issues/307
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and I don't know whether it happened once in community CI, or maybe we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should submit a pr to pass  '--network host' when `docker build`?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     3.we found there are two tests failed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after the commit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/23767
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>        ReplayListenerSuite:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>        - ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>        - End-to-end replay *** FAILED ***
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>          "[driver]" did not equal "[1]"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (JsonProtocolSuite.scala:622)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>        - End-to-end replay with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compression *** FAILED ***
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>          "[driver]" did not equal "[1]"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (JsonProtocolSuite.scala:622)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         we tried to revert the commit and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then the tests passed, the patch is too big and so sorry we can't find the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason till now, if you are interesting please try it, and it will be very
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> appreciate          if someone can help us to figure it out.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Second, about the test time, we increased
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the flavor of arm instance to 16U16G, but seems there was no significant
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> improvement, the k8s integration test took about one and a half hours, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the QA test(like spark-master-test-maven-hadoop-2.7 community jenkins job)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> took about seventeen hours(it is too long :(), we suspect that the reason
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is the performance and network,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we split the jobs based on projects such
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as sql, core and so on, the time can be decrease to about seven hours, see
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/theopenlab/spark/pull/19 We
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> found the Spark QA tests like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/   ,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it looks all tests seem never download the jar packages from maven centry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repo(such as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So we want to know how the jenkins jobs can do that, is there a internal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> maven repo launched? maybe we can do the same thing to avoid the network
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection cost during downloading the dependent jar packages.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Third, the most important thing, it's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about ARM CI of spark, we believe that it is necessary, right? And you can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see we really made a lot of efforts, now the basic arm build/test jobs is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ok, so we suggest to add arm jobs to community, we can set them to novoting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> firstly, and improve/rich the jobs step by step. Generally, there are two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ways in our mind to integrate the ARM CI for spark:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      1) We introduce openlab ARM CI into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> spark as a custom CI system. We provide human resources and test ARM VMs,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also we will focus on the ARM related issues about Spark. We will push the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PR into community.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>      2) We donate ARM VM resources into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> existing amplab Jenkins. We still provide human resources, focus on the ARM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> related issues about Spark and push the PR into community.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Both options, we will provide human
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resources to maintain, of course it will be great if we can work together.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So please tell us which option you would like? And let's move forward.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Waiting for your reply, thank you very much.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> Shane Knapp
>>>>>>>>>>>>>>>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical
>>>>>>>>>>>>>>>>>>>>>>>> Lead
>>>>>>>>>>>>>>>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Shane Knapp
>>>>>>>>>>>>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical
>>>>>>>>>>>>>>>>>>>>> Lead
>>>>>>>>>>>>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Shane Knapp
>>>>>>>>>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>>>>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Shane Knapp
>>>>>>>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Shane Knapp
>>>>>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Shane Knapp
>>>>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Shane Knapp
>>>>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Shane Knapp
>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Shane Knapp
>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>> https://rise.cs.berkeley.edu
>>>>>>
>>>>>
>>>>> [image: Mailtrack]
>>>>> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> Sender
>>>>> notified by
>>>>> Mailtrack
>>>>> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> 19/10/16
>>>>> 上午09:13:46
>>>>>
>>>>
>>>>
>>>> --
>>>> Shane Knapp
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu
>>>>
>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi all,

The Spark ARM jobs have been building for some time, and now there are two
jobs[1]: spark-master-test-maven-arm
<https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/>
and spark-master-test-python-arm
<https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/>.
We can see there are some build failures, but they are due to the poor
performance of the arm instance; we have now begun to build the Spark ARM
jobs on other, higher-performance instances, and the builds/tests all
succeed. We plan to donate the instance to amplab later. According to the
build history, we are very happy to say that Spark is supported on the
aarch64 platform, and I suggest adding this good news to the spark-3.0.0
release notes. Maybe the community could also provide an ARM-supported
release of Spark in the meantime?
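
For anyone who wants to try this on an aarch64 machine, the build our jobs
run is roughly the following (a sketch only; the exact maven profiles and
flags of the Jenkins jobs may differ):

    # illustrative reproduction of the maven ARM job, not the exact job config
    git clone https://github.com/apache/spark.git && cd spark
    ./build/mvn -B -DskipTests clean package    # build first
    ./build/mvn -B test                         # then run the test suites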

[1]
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/

ps: the jira https://issues.apache.org/jira/browse/SPARK-29106 tracks the
whole work, thank you very much Shane:)

On Thu, Oct 17, 2019 at 2:52 PM bo zhaobo <bz...@gmail.com>
wrote:

> Just a note: the jira issue link is
> https://issues.apache.org/jira/browse/SPARK-29106
>
>
> Tianhua huang <hu...@gmail.com> 于2019年10月17日周四 上午10:47写道:
>
>> OK, let's update the info there. Thanks.
>>
>> On Thu, Oct 17, 2019 at 1:52 AM Shane Knapp <sk...@berkeley.edu> wrote:
>>
>>> i totally missed the spark jira from earlier...  let's move the
>>> conversation there!
>>>
>>> On Tue, Oct 15, 2019 at 6:21 PM bo zhaobo <bz...@gmail.com>
>>> wrote:
>>>
>>>> Shane, awesome! We will try our best to finish the tests and your
>>>> requests on the VM soon. Once we finish those things, we will send you
>>>> an email, and then we can continue with the following steps. Thank you very much.
>>>>
>>>> Best Regards,
>>>>
>>>> ZhaoBo
>>>>
>>>> Shane Knapp <sk...@berkeley.edu> 于 2019年10月16日周三 上午3:47写道:
>>>>
>>>>> ok!  i'm able to successfully log in to the VM!
>>>>>
>>>>> i also have created a jenkins worker entry:
>>>>> https://amplab.cs.berkeley.edu/jenkins/computer/spark-arm-vm/
>>>>>
>>>>> it's a pretty bare-bones VM, so i have some suggestions/requests
>>>>> before we can actually proceed w/testing.  i will not be able to perform
>>>>> any system configuration, as i don't have the cycles to reverse-engineer
>>>>> the ansible setup and test it all out.
>>>>>
>>>>> * java is not installed, please install the following (see the sketch
>>>>>   below):
>>>>>   - java8 min version 1.8.0_191
>>>>>   - java11 min version 11.0.1
>>>>>
>>>>> * it appears from the ansible playbook that there are other deps that
>>>>> need to be installed.
>>>>>   - please install all deps
>>>>>   - manually run the tests until they pass
>>>>>
>>>>> * the jenkins user should NEVER have sudo or any root-level access!
>>>>>
>>>>> * once the arm tests pass when manually run, take a snapshot of this
>>>>> image so we can recreate it w/o needing to reinstall everything
>>>>>
>>>>> after that's done i can finish configuring the jenkins worker and set
>>>>> up a build...
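>>>>>
>>>>> for the java installs, something like this should work (a sketch from
>>>>> my side -- package names are a guess, adjust for your distro):
>>>>>
>>>>>     sudo apt-get update
>>>>>     sudo apt-get install -y openjdk-8-jdk openjdk-11-jdk
>>>>>     update-java-alternatives --list   # confirm both jdks are present
>>>>>     java -version                     # needs to be >= 1.8.0_191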
>>>>>
>>>>> thanks!
>>>>>
>>>>> shane
>>>>>
>>>>>
>>>>> On Mon, Oct 14, 2019 at 8:34 PM Shane Knapp <sk...@berkeley.edu>
>>>>> wrote:
>>>>>
>>>>>> yes, i will get to that tomorrow.  today was spent cleaning up the
>>>>>> mess from last week.
>>>>>>
>>>>>> On Mon, Oct 14, 2019 at 6:18 PM bo zhaobo <
>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>
>>>>>>> Hi shane,
>>>>>>>
>>>>>>> That's great news that Amplab is back. ;-) If possible, could you
>>>>>>> please take several minutes to check that the ARM VM is accessible from
>>>>>>> your side? And what is your plan for the whole ARM test integration?
>>>>>>> (How about we finish it this month?) Thanks.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> ZhaoBo
>>>>>>>
>>>>>>>
>>>>>>> bo zhaobo <bz...@gmail.com> 于2019年10月10日周四 上午8:29写道:
>>>>>>>
>>>>>>>> Oh, sorry that we missed that email. If possible, could you please
>>>>>>>> take a few minutes to test that the ARM VM is accessible through your ssh
>>>>>>>> private key as the jenkins user? We plan to have the whole integration
>>>>>>>> process and testing done before the end of this month. We are very happy
>>>>>>>> to work together with you to move it forward, if you are free and agree.
>>>>>>>> :)  Thank you very much.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Zhao Bo
>>>>>>>>
>>>>>>>>
>>>>>>>> Shane Knapp <sk...@berkeley.edu> 于 2019年10月9日周三 下午11:10写道:
>>>>>>>>
>>>>>>>>> i spent yesterday dealing w/a power outage on campus.  please see
>>>>>>>>> my email to the spark dev list.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 9, 2019 at 3:29 AM Tianhua huang <
>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Shane,
>>>>>>>>>> Sorry to disturb you again, I wonder if there is a progress of
>>>>>>>>>> the jenkins arm job?
>>>>>>>>>> And please don't hesitate to contact us if you have any questions
>>>>>>>>>> and need helps, thank you very much :)
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 8, 2019 at 9:21 AM bo zhaobo <
>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Shane,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the late reply. We are just back from a 7-day holiday.
>>>>>>>>>>>
>>>>>>>>>>> I have already added the public key to the VM. If you are free,
>>>>>>>>>>> please go ahead and test. Thank you
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>> ZhaoBo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Shane Knapp <sk...@berkeley.edu> 于2019年10月1日周二 上午2:33写道:
>>>>>>>>>>>
>>>>>>>>>>>> here's the public key for the jenkins user:
>>>>>>>>>>>> ssh-rsa
>>>>>>>>>>>> AAAAB3NzaC1yc2EAAAABIwAAAgEApe+DQF0SusgpdSDLAeZ5ymbEbUODTMUT67yRCaVD7S4oAWgtHXWSLtgZAlD0D2N2qRm74DVXcCrN+LGIxExXP+h/xAPI+0tMHAFt0+u5zTy+6Fq3ADtG5q4dmNMohk4gZhlueeBN7JT6b3uRLwnrSr2F9DCd5F3gMd2fXAHVGWlOPY01IFwJcHcu4VVPV3pHq35N7TyZGup0Np/D1FtB4Hpw7tyrtiidYfQXE1MFVWLpHXFIoRjMEPGfZw5gfIuejImd22W3Qx9BHPC0e97wOxbQfygZHh8S0J5v6X5dvR/jZZs2queMiNwSDsVnjqDX3vgOIymfgy6xJNjiTTXPNuwEbmk54DCMkqibSY3NmmWPAzzWI1SwU4bSmVExY97TgoLr7hEBQQeMZuKScWVY2tD3yRJz18a3rJGnSboESPpItr5pLlCKcZlvJKM24goo4Uiqi9lLPvJbeXV3FbSiGt9pWDu18XzuZGkamxJzkCKSmhCoxB+fqNXWL7jEcvJw8smF9oTmnGG+in4awCBW11U2wkvPvCwUBWB3tRwHERaG6vTp0CNIKaBH5R968qsWuhbNjlARul5JG3XVUbljjxI4s8W+jRU++Ua2wlDC/6scqgEHlLGbQUO5uHCUTxn+wx7XpEZ6FSucOof0S0TR5mCrsFZ7+eV0nLX6l1cZhs8=
>>>>>>>>>>>> jenkins@amp-jenkins-master
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> once you get that set up, please let me know and i will test
>>>>>>>>>>>> ssh-ability.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 25, 2019 at 10:15 PM Shane Knapp <
>>>>>>>>>>>> sknapp@berkeley.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for understanding.  i'm a one-man show.  :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> shane
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 6:52 PM Tianhua huang <
>>>>>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am very happy to hear that :)
>>>>>>>>>>>>>> And please don't hesitate to contact us if you have any
>>>>>>>>>>>>>> questions and need help, thanks again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 26, 2019 at 12:34 AM Shane Knapp <
>>>>>>>>>>>>>> sknapp@berkeley.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> i'm hoping to get started by EOW or beginning of next.  i
>>>>>>>>>>>>>>> have an incredibly small team here @ berkeley and support a lot of research
>>>>>>>>>>>>>>> labs, and i've been down 25-50% staffing for the past 6 weeks.  thankfully
>>>>>>>>>>>>>>> my whole team is back and i'm finally getting my head above water and will
>>>>>>>>>>>>>>> have time to dedicate to this really soon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sorry for the delay!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> shane
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Sep 23, 2019 at 7:21 PM bo zhaobo <
>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Shane,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How about starting the ARM work this week? Can we? ;-)
>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Shane Knapp <sk...@berkeley.edu> 于2019年9月20日周五 上午11:57写道:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> i'll have the cycles over the next week.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Sep 19, 2019 at 7:51 PM bo zhaobo <
>>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Shane,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there any update about the last email? ;-)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> bo zhaobo <bz...@gmail.com> 于2019年9月18日周三
>>>>>>>>>>>>>>>>>> 上午10:02写道:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I will answer your questions one by one.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 0) Sure, I can create a user named 'jenkins' manually.
>>>>>>>>>>>>>>>>>>> And I will leave the 'arm' user in place in case you need higher
>>>>>>>>>>>>>>>>>>> privileges to do something on the VM.
>>>>>>>>>>>>>>>>>>> 1) Sure, that would be great if you could provide a
>>>>>>>>>>>>>>>>>>> public key to us. ;-)
>>>>>>>>>>>>>>>>>>> 2) Sure
>>>>>>>>>>>>>>>>>>> 3) Yeah, it's a persistent VM for now. On our side, we plan
>>>>>>>>>>>>>>>>>>> to donate more resources around the end of October. It would be better to
>>>>>>>>>>>>>>>>>>> move all the ARM jenkins workers to the same place, so this VM might be
>>>>>>>>>>>>>>>>>>> recycled in the future. For now, that doesn't stop us from moving the CI
>>>>>>>>>>>>>>>>>>> job forward.
>>>>>>>>>>>>>>>>>>> 4) Correct. ;-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you need more information or help about the ARM VM,
>>>>>>>>>>>>>>>>>>> please feel free to contact us.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Questions from us:
>>>>>>>>>>>>>>>>>>> 1> Is there an online chat platform where we can discuss
>>>>>>>>>>>>>>>>>>> this? Emailing seems too slow for us. ;-) Or is there an online discussion
>>>>>>>>>>>>>>>>>>> platform in the Spark community?
>>>>>>>>>>>>>>>>>>> 2> Could you please share the test scripts once you
>>>>>>>>>>>>>>>>>>> finish the test integration, as well as the existing Jenkins test scripts
>>>>>>>>>>>>>>>>>>> in the Spark CI? We plan to do the same thing as on x86, and we think
>>>>>>>>>>>>>>>>>>> running the same jobs on ARM as on x86 would be a good start.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best  Regards
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Shane Knapp <sk...@berkeley.edu> 于2019年9月18日周三
>>>>>>>>>>>>>>>>>>> 上午5:05写道:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> thanks for the info...  a couple of things:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 0) could you rename/create a 'jenkins' user with the
>>>>>>>>>>>>>>>>>>>> same rights? (see the sketch below)
>>>>>>>>>>>>>>>>>>>> 1) i will provide an ssh key so that you can add it to
>>>>>>>>>>>>>>>>>>>> the jenkins user's authorized_keys file
>>>>>>>>>>>>>>>>>>>> 2) the jenkins user should NOT have root access.  this
>>>>>>>>>>>>>>>>>>>> is a major security hole
>>>>>>>>>>>>>>>>>>>> 3) will this be a persistent VM?  if so, i'd much
>>>>>>>>>>>>>>>>>>>> prefer to have it set up initially so we can just log in, build what we
>>>>>>>>>>>>>>>>>>>> need and launch the job
>>>>>>>>>>>>>>>>>>>> 4) yay ansible!  i can create the job locally from the
>>>>>>>>>>>>>>>>>>>> template.
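>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> for #0/#1, something along these lines on the VM should do
>>>>>>>>>>>>>>>>>>>> it (a sketch from my side -- adjust for your distro):
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     sudo useradd -m -s /bin/bash jenkins   # NO sudo rights, on purpose
>>>>>>>>>>>>>>>>>>>>     sudo mkdir -p ~jenkins/.ssh
>>>>>>>>>>>>>>>>>>>>     echo '<jenkins public key>' | sudo tee -a ~jenkins/.ssh/authorized_keys
>>>>>>>>>>>>>>>>>>>>     sudo chown -R jenkins:jenkins ~jenkins/.ssh
>>>>>>>>>>>>>>>>>>>>     sudo chmod 700 ~jenkins/.ssh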
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 1:16 AM bo zhaobo <
>>>>>>>>>>>>>>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi @shane knapp <sk...@berkeley.edu> , Thanks.
>>>>>>>>>>>>>>>>>>>>> Tianhua huang and I have already created an ARM VM for this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The information about the ARM VM:
>>>>>>>>>>>>>>>>>>>>> IP: 213.146.141.16
>>>>>>>>>>>>>>>>>>>>> ssh_key: please see the attachment file
>>>>>>>>>>>>>>>>>>>>> "lab_ssh_key.txt"
>>>>>>>>>>>>>>>>>>>>> Then you can log in to it with "ssh -i lab_ssh_key.txt
>>>>>>>>>>>>>>>>>>>>> arm@213.146.141.16"; the user "arm" has sudo privileges.
>>>>>>>>>>>>>>>>>>>>> Note: this VM is running in a cloud; for now it only allows
>>>>>>>>>>>>>>>>>>>>> ssh (tcp port 22) and ping (icmp) for ingress network connections. If you
>>>>>>>>>>>>>>>>>>>>> need more, please let us know.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> And first we can add a periodic job for the Spark ARM
>>>>>>>>>>>>>>>>>>>>> tests; maybe the details of the openlab zuul job we tested can be used as
>>>>>>>>>>>>>>>>>>>>> a reference for the jenkins job:
>>>>>>>>>>>>>>>>>>>>> https://github.com/theopenlab/openlab-zuul-jobs/blob/master/playbooks/spark-unit-test-arm64/run.yaml
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Let us give you some explanation about that file. In
>>>>>>>>>>>>>>>>>>>>> Step 3 of the above file: leveldbjni doesn't have an ARM release, so we
>>>>>>>>>>>>>>>>>>>>> built leveldbjni on ARM and use our own leveldbjni jar, which is located
>>>>>>>>>>>>>>>>>>>>> at [1]. So you still need to use that jar for the tests. The hadoop side
>>>>>>>>>>>>>>>>>>>>> also depends on leveldbjni, through "hadoop-client" as I remember. For
>>>>>>>>>>>>>>>>>>>>> this reason, we cannot change the pom file directly, so we use "mvn
>>>>>>>>>>>>>>>>>>>>> install" to provide leveldb while testing spark on ARM.
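>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For clarity, the install step is roughly the following
>>>>>>>>>>>>>>>>>>>>> sketch (the fusesource coordinates below are our assumption of what
>>>>>>>>>>>>>>>>>>>>> hadoop-client expects; the authoritative version is in the run.yaml above):
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     # install our ARM build of leveldbjni-all into the local
>>>>>>>>>>>>>>>>>>>>>     # maven repo under the original coordinates, so that the
>>>>>>>>>>>>>>>>>>>>>     # hadoop dependency resolves to it
>>>>>>>>>>>>>>>>>>>>>     wget https://repo1.maven.org/maven2/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
>>>>>>>>>>>>>>>>>>>>>     mvn install:install-file -Dfile=leveldbjni-all-1.8.jar \
>>>>>>>>>>>>>>>>>>>>>         -DgroupId=org.fusesource.leveldbjni -DartifactId=leveldbjni-all \
>>>>>>>>>>>>>>>>>>>>>         -Dversion=1.8 -Dpackaging=jar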
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please note: all tests run without the root user, so
>>>>>>>>>>>>>>>>>>>>> testing as the "arm" user is good.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>> https://repo1.maven.org/maven2/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar
>>>>>>>>>>>>>>>>>>>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@shane knapp <sk...@berkeley.edu> thank you very much, I opened an issue
for this https://issues.apache.org/jira/browse/SPARK-29106, and we can talk
through the details in it :)
And we will prepare an arm instance today and will send the info to your
email later.

On Tue, Sep 17, 2019 at 4:40 AM Shane Knapp <sk...@berkeley.edu> wrote:

> @Tianhua huang <hu...@gmail.com> sure, i think we can get
> something sorted for the short-term.
>
> all we need is ssh access (i can provide an ssh key), and i can then have
> our jenkins master launch a remote worker on that instance.
>
> instance setup, etc, will be up to you.  my support for the time being
> will be to create the job and 'best effort' for everything else.
>
> this should get us up and running asap.
>
> is there an open JIRA for jenkins/arm test support?  we can move the
> technical details about this idea there.
>
> On Sun, Sep 15, 2019 at 9:03 PM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> @Sean Owen <sr...@gmail.com> , so sorry for the late reply, we had a
>> Mid-Autumn holiday:)
>>
>> If you hope to integrate the ARM CI into amplab jenkins, we can offer the
>> arm instance, and then the ARM job will run together with the other x86
>> jobs. Is there a guideline for doing this? @shane knapp <sk...@berkeley.edu>
>> would you help us?
>>
>> On Thu, Sep 12, 2019 at 9:36 PM Sean Owen <sr...@gmail.com> wrote:
>>
>>> I don't know what's involved in actually accepting or operating those
>>> machines, so can't comment there, but in the meantime it's good that you
>>> are running these tests and can help report changes needed to keep it
>>> working with ARM. I would continue with that for now.
>>>
>>> On Wed, Sep 11, 2019 at 10:06 PM Tianhua huang <
>>> huangtianhua223@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> For the whole work process of spark ARM CI, we want to make 2 things
>>>> clear.
>>>>
>>>> The first thing is:
>>>> About the spark ARM CI: now we have two periodic jobs, one job[1] based
>>>> on commit[2] (which already fixed the failed replay tests issue[3]; we
>>>> made a new test branch based on date 09-09-2019), and the other job[4]
>>>> based on spark master.
>>>>
>>>> The first job tests the specified branch, to prove that our ARM CI
>>>> is good and stable.
>>>> The second job checks spark master every day, so we can find out whether
>>>> the latest commits affect the ARM CI. According to the build history and
>>>> results, some problems are easier to find on ARM, like
>>>> SPARK-28770 <https://issues.apache.org/jira/browse/SPARK-28770>, and
>>>> we make the effort to trace and figure them out; till now we have found
>>>> and fixed several problems[5][6][7], thanks to everyone in the
>>>> community :). And we believe that ARM CI is very necessary, right?
>>>>
>>>> The second thing is:
>>>> We plan to run the jobs for a period of time, and you can see the
>>>> result and logs from 'build history' of the jobs console, if everything
>>>> goes well for one or two weeks could community accept the ARM CI? or how
>>>> long the periodic jobs to run then our community could have enough
>>>> confidence to accept the ARM CI? As you suggested before, it's good to
>>>> integrate ARM CI to amplab jenkins, we agree that and we can donate the ARM
>>>> instances and then maintain the ARM-related test jobs together with
>>>> community, any thoughts?
>>>>
>>>> Thank you all!
>>>>
>>>> [1]
>>>> http://status.openlabtesting.org/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64
>>>> [2]
>>>> https://github.com/apache/spark/commit/0ed9fae45769d4b06b8cf8128f462f09ff3d9a72
>>>> [3] https://issues.apache.org/jira/browse/SPARK-28770
>>>> [4]
>>>> http://status.openlabtesting.org/builds?job_name=spark-master-unit-test-hadoop-2.7-arm64
>>>> [5] https://github.com/apache/spark/pull/25186
>>>> [6] https://github.com/apache/spark/pull/25279
>>>> [7] https://github.com/apache/spark/pull/25673
>>>>
>>>>
>>>>
>>>> On Fri, Aug 16, 2019 at 11:24 PM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>>> Yes, I think it's just local caching. After you run the build you
>>>>> should find lots of stuff cached at ~/.m2/repository and it won't download
>>>>> every time.
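>>>>>
>>>>> A quick way to confirm the cache is being used (illustrative commands,
>>>>> not part of any job config):
>>>>>
>>>>>     du -sh ~/.m2/repository      # size of the cached artifacts
>>>>>     mvn -o -DskipTests package   # '-o' = offline; this fails if anything
>>>>>                                  # is missing from the local cache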
>>>>>
>>>>> On Fri, Aug 16, 2019 at 3:01 AM bo zhaobo <bz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>> Thanks for the reply. And apologies for making you confused.
>>>>>> I know the dependencies will be downloaded from SBT or Maven. But the
>>>>>> Spark QA job also execs "mvn clean package", so why doesn't the log show
>>>>>> it downloading any jars from Maven Central [1], and why does it build so
>>>>>> fast? Is the reason that the Spark Jenkins builds the Spark jars on
>>>>>> physical machines and doesn't destroy the test env after a job is
>>>>>> finished? Then another job building Spark will get the dependency jars
>>>>>> from the local cache, since previous jobs that ran "mvn package" had
>>>>>> already downloaded those dependencies onto the local worker machine. Am
>>>>>> I right? Is that the reason the job log[1] doesn't print any downloading
>>>>>> information from Maven Central?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> [1]
>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>>>>>
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> ZhaoBo
>>>>>>
>>>>>>
>>>>>> Sean Owen <sr...@gmail.com> 于2019年8月16日周五 上午10:38写道:
>>>>>>
>>>>>>> I'm not sure what you mean. The dependencies are downloaded by SBT
>>>>>>> and Maven like in any other project, and nothing about it is specific to
>>>>>>> Spark.
>>>>>>> The worker machines cache artifacts that are downloaded from these,
>>>>>>> but this is a function of Maven and SBT, not Spark. You may find that the
>>>>>>> initial download takes a long time.
>>>>>>>
>>>>>>> On Thu, Aug 15, 2019 at 9:02 PM bo zhaobo <
>>>>>>> bzhaojyathousandy@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Sean,
>>>>>>>>
>>>>>>>> Thanks very much for pointing out the roadmap. ;-) Then I think we
>>>>>>>> will continue to focus on our test environment.
>>>>>>>>
>>>>>>>> For the networking problems: I mean that we can access Maven
>>>>>>>> Central, and jobs could download the required jar packages at a high
>>>>>>>> network speed. What we want to know is why the Spark QA test job[1]
>>>>>>>> logs show that the job script/maven build doesn't seem to download the
>>>>>>>> jar packages. Could you tell us the reason for that? Thank you.  The
>>>>>>>> reason we raised the "networking problems" is a phenomenon we found
>>>>>>>> during our tests: if we execute "mvn clean package" in a new test
>>>>>>>> environment (in our test environment, we destroy the test VMs after
>>>>>>>> the job finishes), maven will download the dependency jar packages
>>>>>>>> from Maven Central, but in the job "spark-master-test-maven-hadoop"
>>>>>>>> [2], from the log, we didn't find it downloading any jar packages;
>>>>>>>> what is the reason for that?
>>>>>>>> Also, when we build the Spark jars while downloading dependencies
>>>>>>>> from Maven Central, it costs almost 1 hour, and we found [2] costs
>>>>>>>> just 10min. But if we run "mvn package" in a VM which has already
>>>>>>>> executed "mvn package" before, it costs just 14min, which looks very
>>>>>>>> close to [2]. So we suspect that downloading the jar packages costs
>>>>>>>> most of the time. For the goal of the ARM CI, we expect the
>>>>>>>> performance of the new ARM CI to be close to the existing x86 CI, so
>>>>>>>> users can accept it more easily.
>>>>>>>>
>>>>>>>> [1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>>>>>>>> [2]
>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>> ZhaoBo
>>>>>>>>
>>>>>>>>
>>>>>>>> Sean Owen <sr...@gmail.com> 于2019年8月15日周四 下午9:58写道:
>>>>>>>>
>>>>>>>>> I think the right goal is to fix the remaining issues first. If we
>>>>>>>>> set up CI/CD it will only tell us there are still some test failures. If
>>>>>>>>> it's stable, and not hard to add to the existing CI/CD, yes it could be
>>>>>>>>> done automatically later. You can continue to test on ARM independently for
>>>>>>>>> now.
>>>>>>>>>
>>>>>>>>> It sounds indeed like there are some networking problems in the
>>>>>>>>> test system if you're not able to download from Maven Central. That rarely
>>>>>>>>> takes significant time, and there aren't project-specific mirrors here. You
>>>>>>>>> might be able to point at a closer public mirror, depending on where you
>>>>>>>>> are.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <
>>>>>>>>> huangtianhua223@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I want to discuss spark ARM CI again, we took some tests on arm
>>>>>>>>>> instance based on master and the job includes
>>>>>>>>>> https://github.com/theopenlab/spark/pull/13  and k8s integration
>>>>>>>>>> https://github.com/theopenlab/spark/pull/17/ , there are several
>>>>>>>>>> things I want to talk about:
>>>>>>>>>>
>>>>>>>>>> First, about the failed tests:
>>>>>>>>>>     1.we have fixed some problems like
>>>>>>>>>> https://github.com/apache/spark/pull/25186 and
>>>>>>>>>> https://github.com/apache/spark/pull/25279, thanks sean owen and
>>>>>>>>>> others to help us.
>>>>>>>>>>     2.we tried k8s integration test on arm, and met an error: apk
>>>>>>>>>> fetch hangs,  the tests passed  after adding '--network host' option for
>>>>>>>>>> command `docker build`, see:
>>>>>>>>>>
>>>>>>>>>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>>>>>>>>>> , the solution refers to
>>>>>>>>>> https://github.com/gliderlabs/docker-alpine/issues/307  and I
>>>>>>>>>> don't know whether it happened once in community CI, or maybe we should
>>>>>>>>>> submit a pr to pass  '--network host' when `docker build`?
>>>>>>>>>>     3.we found there are two tests failed after the commit
>>>>>>>>>> https://github.com/apache/spark/pull/23767  :
>>>>>>>>>>        ReplayListenerSuite:
>>>>>>>>>>        - ...
>>>>>>>>>>        - End-to-end replay *** FAILED ***
>>>>>>>>>>          "[driver]" did not equal "[1]"
>>>>>>>>>> (JsonProtocolSuite.scala:622)
>>>>>>>>>>        - End-to-end replay with compression *** FAILED ***
>>>>>>>>>>          "[driver]" did not equal "[1]"
>>>>>>>>>> (JsonProtocolSuite.scala:622)
>>>>>>>>>>
>>>>>>>>>>         we tried to revert the commit and then the tests passed,
>>>>>>>>>> the patch is too big and so sorry we can't find the reason till now, if you
>>>>>>>>>> are interesting please try it, and it will be very appreciate          if
>>>>>>>>>> someone can help us to figure it out.
>>>>>>>>>>
>>>>>>>>>> Second, about the test time, we increased the flavor of arm
>>>>>>>>>> instance to 16U16G, but seems there was no significant improvement, the k8s
>>>>>>>>>> integration test took about one and a half hours, and the QA test(like
>>>>>>>>>> spark-master-test-maven-hadoop-2.7 community jenkins job) took about
>>>>>>>>>> seventeen hours(it is too long :(), we suspect that the reason is the
>>>>>>>>>> performance and network,
>>>>>>>>>> we split the jobs based on projects such as sql, core and so on,
>>>>>>>>>> the time can be decrease to about seven hours, see
>>>>>>>>>> https://github.com/theopenlab/spark/pull/19 We found the Spark
>>>>>>>>>> QA tests like
>>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/   ,
>>>>>>>>>> it looks all tests seem never download the jar packages from maven centry
>>>>>>>>>> repo(such as
>>>>>>>>>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>>>>>>>>>> So we want to know how the jenkins jobs can do that, is there a internal
>>>>>>>>>> maven repo launched? maybe we can do the same thing to avoid the network
>>>>>>>>>> connection cost during downloading the dependent jar packages.
>>>>>>>>>>
>>>>>>>>>> Third, the most important thing, it's about ARM CI of spark, we
>>>>>>>>>> believe that it is necessary, right? And you can see we really made a lot
>>>>>>>>>> of efforts, now the basic arm build/test jobs is ok, so we suggest to add
>>>>>>>>>> arm jobs to community, we can set them to novoting firstly, and
>>>>>>>>>> improve/rich the jobs step by step. Generally, there are two ways in our
>>>>>>>>>> mind to integrate the ARM CI for spark:
>>>>>>>>>>      1) We introduce openlab ARM CI into spark as a custom CI
>>>>>>>>>> system. We provide human resources and test ARM VMs, also we will focus on
>>>>>>>>>> the ARM related issues about Spark. We will push the PR into community.
>>>>>>>>>>      2) We donate ARM VM resources into existing amplab Jenkins.
>>>>>>>>>> We still provide human resources, focus on the ARM related issues about
>>>>>>>>>> Spark and push the PR into community.
>>>>>>>>>> Both options, we will provide human resources to maintain, of
>>>>>>>>>> course it will be great if we can work together. So please tell us which
>>>>>>>>>> option you would like? And let's move forward. Waiting for your reply,
>>>>>>>>>> thank you very much.
>>>>>>>>>>
>>>>>>>>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>

Re: Ask for ARM CI for spark

Posted by Shane Knapp <sk...@berkeley.edu>.
@Tianhua huang <hu...@gmail.com> sure, i think we can get
something sorted for the short-term.

all we need is ssh access (i can provide an ssh key), and i can then have
our jenkins master launch a remote worker on that instance.

instance setup, etc, will be up to you.  my support for the time being will
be to create the job and 'best effort' for everything else.

this should get us up and running asap.

is there an open JIRA for jenkins/arm test support?  we can move the
technical details about this idea there.
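
for reference, the node-side prep is tiny.  a sketch of what i'd expect on
the arm instance (assuming ubuntu; the user name, key and packages below
are placeholders, not a spec):

    # create the user the jenkins master will ssh in as
    sudo useradd -m -s /bin/bash jenkins
    sudo install -d -m 700 -o jenkins -g jenkins /home/jenkins/.ssh
    # authorize the master's public key (placeholder key below)
    echo 'ssh-rsa AAAA...placeholder... jenkins-master' | \
        sudo tee /home/jenkins/.ssh/authorized_keys
    sudo chown jenkins:jenkins /home/jenkins/.ssh/authorized_keys
    sudo chmod 600 /home/jenkins/.ssh/authorized_keys
    # the ssh-launched agent just needs a jre on the path, plus git
    sudo apt-get update && sudo apt-get install -y openjdk-8-jre-headless git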


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Sean Owen <sr...@gmail.com>, so sorry for the late reply, we had a
Mid-Autumn holiday :)

If you would like to integrate the ARM CI into amplab Jenkins, we can offer
the ARM instance, and then the ARM job will run together with the other x86
jobs. Maybe there is a guideline for doing this? @shane knapp
<sk...@berkeley.edu>, would you help us?


Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I don't know what's involved in actually accepting or operating those
machines, so can't comment there, but in the meantime it's good that you
are running these tests and can help report changes needed to keep it
working with ARM. I would continue with that for now.


Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi all,

For the overall work on Spark ARM CI, we want to make two things clear.

The first thing is:
About Spark ARM CI, we now have two periodic jobs: one job[1] runs against a
fixed commit[2] (which already contains the fix for the failed replay
tests[3]; we made a new test branch based on the date 09-09-2019), and the
other job[4] runs against Spark master.

The first job tests the pinned branch, to prove that our ARM CI itself is
good and stable.
The second job checks Spark master every day, so we can see whether the
latest commits affect the ARM CI (a sketch of such a daily trigger follows
the reference list below). The build history and results show that some
problems, like SPARK-28770
<https://issues.apache.org/jira/browse/SPARK-28770>, are easier to find on
ARM, and also that we make the effort to trace them and figure them out:
till now we have found and fixed several problems[5][6][7], thanks to
everyone in the community :). And we believe that ARM CI is very necessary,
right?

The second thing is:
We plan to run the jobs for a period of time, and you can see the results
and logs in the 'build history' of each job's console. If everything goes
well for one or two weeks, could the community accept the ARM CI? Or how
long should the periodic jobs run before the community has enough confidence
to accept it? As suggested before, it would be good to integrate the ARM CI
into amplab Jenkins; we agree, and we can donate the ARM instances and then
maintain the ARM-related test jobs together with the community. Any
thoughts?

Thank you all!

[1]
http://status.openlabtesting.org/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64
[2]
https://github.com/apache/spark/commit/0ed9fae45769d4b06b8cf8128f462f09ff3d9a72
[3] https://issues.apache.org/jira/browse/SPARK-28770
[4]
http://status.openlabtesting.org/builds?job_name=spark-master-unit-test-hadoop-2.7-arm64
[5] https://github.com/apache/spark/pull/25186
[6] https://github.com/apache/spark/pull/25279
[7] https://github.com/apache/spark/pull/25673
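
As mentioned above, the shape of the daily master job [4] is just a
scheduled checkout-and-build. A minimal sketch (the paths, schedule and
script name are illustrative assumptions, not our actual CI config):

    # ci-nightly.sh -- what the daily job does, boiled down
    cd /srv/ci/spark
    git fetch origin && git checkout -f origin/master
    ./build/mvn -B -DskipTests clean package
    ./build/mvn -B test

    # crontab entry: run it every day at 02:00 (% must be escaped in crontab)
    0 2 * * * /srv/ci/ci-nightly.sh > /srv/ci/logs/spark-arm-$(date +\%F).log 2>&1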



On Fri, Aug 16, 2019 at 11:24 PM Sean Owen <sr...@gmail.com> wrote:

> Yes, I think it's just local caching. After you run the build you should
> find lots of stuff cached at ~/.m2/repository and it won't download every
> time.

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi Sean,
Thanks for the reply, and apologies for the confusion.
I know the dependencies are downloaded by SBT and Maven. But the Spark QA
job also executes "mvn clean package", so why doesn't its log[1] print any
"downloading some jar from Maven central" lines, and why does it build so
fast? Is the reason that Spark Jenkins builds the Spark jars on physical
machines and doesn't destroy the test environment after a job finishes?
Then later jobs that build Spark take the dependency jars from the local
cache, because earlier jobs already ran "mvn package" and downloaded those
dependencies onto the worker machine. Am I right? Is that the reason the
job log[1] doesn't print any downloading information from Maven Central?

Thank you very much.

[1]
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
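
By the way, a simple way to check the local-cache theory on any machine is
(a sketch, assuming mvn is on the PATH):

    du -sh ~/.m2/repository           # size of the local Maven cache
    mvn clean package -DskipTests     # first run prints "Downloading ..." lines
    mvn -o clean package -DskipTests  # -o (offline) succeeds only if all deps are cached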


Best regards

ZhaoBo


On Fri, Aug 16, 2019 at 10:38 AM, Sean Owen <sr...@gmail.com> wrote:

> I'm not sure what you mean. The dependencies are downloaded by SBT and
> Maven like in any other project, and nothing about it is specific to Spark.
> The worker machines cache artifacts that are downloaded from these, but
> this is a function of Maven and SBT, not Spark. You may find that the
> initial download takes a long time.
>
> On Thu, Aug 15, 2019 at 9:02 PM bo zhaobo <bz...@gmail.com>
> wrote:
>
>> Hi Sean,
>>
>> Thanks very much for pointing out the roadmap. ;-). Then I think we will
>> continue to focus on our test environment.
>>
>> For the networking problems, I mean that we can access Maven Central, and
>> jobs could download the required jar packages at a high network speed.
>> What we want to know is: why does the log of the Spark QA test jobs[1]
>> show that the job script/maven build doesn't seem to download the jar
>> packages? Could you tell us the reason for that? Thank you.  The reason we
>> raised the "networking problems" is a phenomenon we found during our
>> tests: if we execute "mvn clean package" in a new test environment (in our
>> test environment we destroy the test VMs after the job finishes), maven
>> downloads the dependency jar packages from Maven Central, but in this job
>> "spark-master-test-maven-hadoop" [2] the log shows no jar packages being
>> downloaded at all -- what is the reason for that?
>> Also, when we build the Spark jar and download the dependencies from Maven
>> Central, it costs almost 1 hour, while we found [2] costs just 10min. But
>> if we run "mvn package" in a VM that already ran "mvn package" before, it
>> costs just 14min, very close to [2]. So we suspect that downloading the
>> jar packages costs most of the time. For the goal of ARM CI, we expect the
>> performance of the new ARM CI to be close to the existing x86 CI, so that
>> users can accept it more easily.
>>
>> [1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>> [2]
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>>
>> Best regards
>>
>> ZhaoBo
>>
>> On Thu, Aug 15, 2019 at 9:58 PM, Sean Owen <sr...@gmail.com> wrote:
>>
>>> I think the right goal is to fix the remaining issues first. If we set
>>> up CI/CD it will only tell us there are still some test failures. If it's
>>> stable, and not hard to add to the existing CI/CD, yes it could be done
>>> automatically later. You can continue to test on ARM independently for now.
>>>
>>> It sounds indeed like there are some networking problems in the test
>>> system if you're not able to download from Maven Central. That rarely takes
>>> significant time, and there aren't project-specific mirrors here. You might
>>> be able to point at a closer public mirror, depending on where you are.
>>>
>>> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <hu...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I want to discuss spark ARM CI again. We ran some tests on an ARM
>>>> instance based on master, and the jobs include
>>>> https://github.com/theopenlab/spark/pull/13  and k8s integration
>>>> https://github.com/theopenlab/spark/pull/17/ , there are several
>>>> things I want to talk about:
>>>>
>>>> First, about the failed tests:
>>>>     1.we have fixed some problems like
>>>> https://github.com/apache/spark/pull/25186 and
>>>> https://github.com/apache/spark/pull/25279; thanks to Sean Owen and
>>>> others for helping us.
>>>>     2.we tried k8s integration test on arm, and met an error: apk fetch
>>>> hangs,  the tests passed  after adding '--network host' option for command
>>>> `docker build`, see:
>>>>
>>>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>>>> , the solution refers to
>>>> https://github.com/gliderlabs/docker-alpine/issues/307  and I don't
>>>> know whether it has ever happened in the community CI; maybe we should
>>>> submit a PR to pass '--network host' to `docker build`?
>>>>     3.we found there are two tests failed after the commit
>>>> https://github.com/apache/spark/pull/23767  :
>>>>        ReplayListenerSuite:
>>>>        - ...
>>>>        - End-to-end replay *** FAILED ***
>>>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>>        - End-to-end replay with compression *** FAILED ***
>>>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>>
>>>>         we tried to revert the commit and then the tests passed; the
>>>> patch is too big and, sorry, we haven't found the reason so far. If you
>>>> are interested please try it, and it would be much appreciated if
>>>> someone can help us to figure it out.
>>>>
>>>> Second, about the test time: we increased the flavor of the arm
>>>> instance to 16U16G, but there seemed to be no significant improvement.
>>>> The k8s integration test took about one and a half hours, and the QA
>>>> test (like the spark-master-test-maven-hadoop-2.7 community jenkins
>>>> job) took about seventeen hours (it is too long :(); we suspect the
>>>> reason is the performance and the network.
>>>> We split the jobs based on projects such as sql, core and so on, and
>>>> the time can be decreased to about seven hours, see
>>>> https://github.com/theopenlab/spark/pull/19 We found that the Spark QA
>>>> tests like
>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/   never
>>>> seem to download the jar packages from the Maven Central repo (such as
>>>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>>>> So we want to know how the jenkins jobs do that: is there an internal
>>>> maven repo launched? Maybe we can do the same thing to avoid the
>>>> network connection cost of downloading the dependent jar packages.
>>>>
>>>> Third, the most important thing, it's about ARM CI of spark, we believe
>>>> that it is necessary, right? And you can see we really made a lot of
>>>> efforts; now the basic arm build/test jobs are ok, so we suggest adding
>>>> arm jobs to the community CI. We can set them to non-voting first, and
>>>> improve/enrich the jobs step by step. Generally, there are two ways in
>>>> our mind to
>>>> integrate the ARM CI for spark:
>>>>      1) We introduce openlab ARM CI into spark as a custom CI system.
>>>> We provide human resources and test ARM VMs, also we will focus on the ARM
>>>> related issues about Spark. We will push the PR into community.
>>>>      2) We donate ARM VM resources into existing amplab Jenkins. We
>>>> still provide human resources, focus on the ARM related issues about Spark
>>>> and push the PR into community.
>>>> With both options we will provide human resources for maintenance; of
>>>> course it will be great if we can work together. So please tell us
>>>> which option you would like, and let's move forward. Waiting for your
>>>> reply, thank you very
>>>> much.
>>>>
>>>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I'm not sure what you mean. The dependencies are downloaded by SBT and
Maven like in any other project, and nothing about it is specific to Spark.
The worker machines cache artifacts that are downloaded from these, but
this is a function of Maven and SBT, not Spark. You may find that the
initial download takes a long time.

On Thu, Aug 15, 2019 at 9:02 PM bo zhaobo <bz...@gmail.com>
wrote:

> Hi Sean,
>
> Thanks very much for pointing out the roadmap. ;-). Then I think we will
> continue to focus on our test environment.
>
> About the networking problems: I mean that we can access Maven Central,
> and jobs can download the required jar packages at a high network speed.
> What we want to know is why the Spark QA test job[1] logs show that the
> job script/Maven build does not seem to download any jar packages. Could
> you tell us the reason for that? Thank you.  The reason we raised the
> "networking problems" is a phenomenon we found during our tests: if we
> execute "mvn clean package" in a new test environment (in our test
> environment we destroy the test VMs after a job finishes), Maven downloads
> the dependency jars from Maven Central, but in the
> "spark-master-test-maven-hadoop" job [2] the log shows no jar downloads at
> all. What is the reason for that?
> Also, building the Spark jar while downloading dependencies from Maven
> Central costs almost 1 hour, and we found [2] costs just 10 min. But if we
> run "mvn package" in a VM that has already run "mvn package" before, it
> costs just 14 min, which is very close to [2]. So we suspect that
> downloading the jar packages takes most of the time. For the goal of ARM
> CI, we expect the performance of the new ARM CI to be close to the
> existing x86 CI, so users can accept it more easily.
>
> [1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
> [2]
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
>
> Best regards
>
> ZhaoBo
>
>
> Sean Owen <sr...@gmail.com> wrote on Thu, Aug 15, 2019 at 9:58 PM:
>
>> I think the right goal is to fix the remaining issues first. If we set up
>> CI/CD it will only tell us there are still some test failures. If it's
>> stable, and not hard to add to the existing CI/CD, yes it could be done
>> automatically later. You can continue to test on ARM independently for now.
>>
>> It sounds indeed like there are some networking problems in the test
>> system if you're not able to download from Maven Central. That rarely takes
>> significant time, and there aren't project-specific mirrors here. You might
>> be able to point at a closer public mirror, depending on where you are.
>>
>> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <hu...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I want to discuss spark ARM CI again. We ran some tests on an ARM instance
>>> based on master, and the jobs include
>>> https://github.com/theopenlab/spark/pull/13  and k8s integration
>>> https://github.com/theopenlab/spark/pull/17/ , there are several things
>>> I want to talk about:
>>>
>>> First, about the failed tests:
>>>     1.we have fixed some problems like
>>> https://github.com/apache/spark/pull/25186 and
>>> https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
>>> for helping us.
>>>     2.we tried k8s integration test on arm, and met an error: apk fetch
>>> hangs,  the tests passed  after adding '--network host' option for command
>>> `docker build`, see:
>>>
>>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>>> , the solution refers to
>>> https://github.com/gliderlabs/docker-alpine/issues/307  and I don't
>>> know whether it has ever happened in the community CI; maybe we should
>>> submit a PR to pass '--network host' to `docker build`?
>>>     3.we found there are two tests failed after the commit
>>> https://github.com/apache/spark/pull/23767  :
>>>        ReplayListenerSuite:
>>>        - ...
>>>        - End-to-end replay *** FAILED ***
>>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>        - End-to-end replay with compression *** FAILED ***
>>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>
>>>         we tried to revert the commit and then the tests passed; the
>>> patch is too big and, sorry, we haven't found the reason so far. If you
>>> are interested please try it, and it would be much appreciated if
>>> someone can help us to figure it out.
>>>
>>> Second, about the test time: we increased the flavor of the arm instance
>>> to 16U16G, but there seemed to be no significant improvement. The k8s
>>> integration test took about one and a half hours, and the QA test (like
>>> the spark-master-test-maven-hadoop-2.7 community jenkins job) took about
>>> seventeen hours (it is too long :(); we suspect the reason is the
>>> performance and the network.
>>> We split the jobs based on projects such as sql, core and so on, and the
>>> time can be decreased to about seven hours, see
>>> https://github.com/theopenlab/spark/pull/19 We found that the Spark QA
>>> tests like  https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>>> never seem to download the jar packages from the Maven Central repo (such
>>> as
>>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>>> So we want to know how the jenkins jobs do that: is there an internal
>>> maven repo launched? Maybe we can do the same thing to avoid the network
>>> connection cost of downloading the dependent jar packages.
>>>
>>> Third, the most important thing, it's about ARM CI of spark, we believe
>>> that it is necessary, right? And you can see we really made a lot of
>>> efforts; now the basic arm build/test jobs are ok, so we suggest adding
>>> arm jobs to the community CI. We can set them to non-voting first, and
>>> improve/enrich the jobs step by step. Generally, there are two ways in
>>> our mind to
>>> integrate the ARM CI for spark:
>>>      1) We introduce openlab ARM CI into spark as a custom CI system. We
>>> provide human resources and test ARM VMs, also we will focus on the ARM
>>> related issues about Spark. We will push the PR into community.
>>>      2) We donate ARM VM resources into existing amplab Jenkins. We
>>> still provide human resources, focus on the ARM related issues about Spark
>>> and push the PR into community.
>>> With both options we will provide human resources for maintenance; of
>>> course it will be great if we can work together. So please tell us which
>>> option you would like, and let's move forward. Waiting for your reply,
>>> thank you very
>>> much.
>>>
>>

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi Sean,

Thanks very much for pointing out the roadmap. ;-). Then I think we will
continue to focus on our test environment.

About the networking problems: I mean that we can access Maven Central,
and jobs can download the required jar packages at a high network speed.
What we want to know is why the Spark QA test job[1] logs show that the
job script/Maven build does not seem to download any jar packages. Could
you tell us the reason for that? Thank you.  The reason we raised the
"networking problems" is a phenomenon we found during our tests: if we
execute "mvn clean package" in a new test environment (in our test
environment we destroy the test VMs after a job finishes), Maven downloads
the dependency jars from Maven Central, but in the
"spark-master-test-maven-hadoop" job [2] the log shows no jar downloads at
all. What is the reason for that?
Also, building the Spark jar while downloading dependencies from Maven
Central costs almost 1 hour, and we found [2] costs just 10 min. But if we
run "mvn package" in a VM that has already run "mvn package" before, it
costs just 14 min, which is very close to [2]. So we suspect that
downloading the jar packages takes most of the time. For the goal of ARM
CI, we expect the performance of the new ARM CI to be close to the
existing x86 CI, so users can accept it more easily.

[1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
[2]
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
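
By the way, to make a fresh ARM worker behave more like a warmed-up one, we
are thinking about pre-populating the local Maven cache once per machine,
roughly like this (a sketch; the exact flags may need adjusting):

    # Resolve and cache all project dependencies without running tests
    ./build/mvn -DskipTests dependency:go-offline
    # Later builds on the same machine can then resolve most jars locally
    ./build/mvn -DskipTests clean package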

Best regards

ZhaoBo





Sean Owen <sr...@gmail.com> wrote on Thu, Aug 15, 2019 at 9:58 PM:

> I think the right goal is to fix the remaining issues first. If we set up
> CI/CD it will only tell us there are still some test failures. If it's
> stable, and not hard to add to the existing CI/CD, yes it could be done
> automatically later. You can continue to test on ARM independently for now.
>
> It sounds indeed like there are some networking problems in the test
> system if you're not able to download from Maven Central. That rarely takes
> significant time, and there aren't project-specific mirrors here. You might
> be able to point at a closer public mirror, depending on where you are.
>
> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I want to discuss spark ARM CI again. We ran some tests on an ARM instance
>> based on master, and the jobs include
>> https://github.com/theopenlab/spark/pull/13  and k8s integration
>> https://github.com/theopenlab/spark/pull/17/ , there are several things
>> I want to talk about:
>>
>> First, about the failed tests:
>>     1.we have fixed some problems like
>> https://github.com/apache/spark/pull/25186 and
>> https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
>> for helping us.
>>     2.we tried k8s integration test on arm, and met an error: apk fetch
>> hangs,  the tests passed  after adding '--network host' option for command
>> `docker build`, see:
>>
>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>> , the solution refers to
>> https://github.com/gliderlabs/docker-alpine/issues/307  and I don't know
>> whether it has ever happened in the community CI; maybe we should submit a
>> PR to pass '--network host' to `docker build`?
>>     3.we found there are two tests failed after the commit
>> https://github.com/apache/spark/pull/23767  :
>>        ReplayListenerSuite:
>>        - ...
>>        - End-to-end replay *** FAILED ***
>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>        - End-to-end replay with compression *** FAILED ***
>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>
>>         we tried to revert the commit and then the tests passed; the
>> patch is too big and, sorry, we haven't found the reason so far. If you
>> are interested please try it, and it would be much appreciated if
>> someone can help us to figure it out.
>>
>> Second, about the test time: we increased the flavor of the arm instance
>> to 16U16G, but there seemed to be no significant improvement. The k8s
>> integration test took about one and a half hours, and the QA test (like
>> the spark-master-test-maven-hadoop-2.7 community jenkins job) took about
>> seventeen hours (it is too long :(); we suspect the reason is the
>> performance and the network.
>> We split the jobs based on projects such as sql, core and so on, and the
>> time can be decreased to about seven hours, see
>> https://github.com/theopenlab/spark/pull/19 We found that the Spark QA
>> tests like  https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>> never seem to download the jar packages from the Maven Central repo (such
>> as
>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>> So we want to know how the jenkins jobs do that: is there an internal
>> maven repo launched? Maybe we can do the same thing to avoid the network
>> connection cost of downloading the dependent jar packages.
>>
>> Third, the most important thing, it's about ARM CI of spark, we believe
>> that it is necessary, right? And you can see we really made a lot of
>> efforts; now the basic arm build/test jobs are ok, so we suggest adding
>> arm jobs to the community CI. We can set them to non-voting first, and
>> improve/enrich the jobs step by step. Generally, there are two ways in
>> our mind to
>> integrate the ARM CI for spark:
>>      1) We introduce openlab ARM CI into spark as a custom CI system. We
>> provide human resources and test ARM VMs, also we will focus on the ARM
>> related issues about Spark. We will push the PR into community.
>>      2) We donate ARM VM resources into existing amplab Jenkins. We still
>> provide human resources, focus on the ARM related issues about Spark and
>> push the PR into community.
>> With both options we will provide human resources for maintenance; of
>> course it will be great if we can work together. So please tell us which
>> option you would like, and let's move forward. Waiting for your reply,
>> thank you very
>> much.
>>
>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Sean Owen <sr...@gmail.com> , thanks for your reply.
I basically agree with you; two points I have to make :)
First, maybe I didn't express it clearly enough: we download from Maven
Central in our test system, yet the community jenkins CI tests never seem
to download the jar packages from the Maven Central repo. Our question is
whether there is an internal maven repo in the community jenkins.
Second, about the failed tests, of course we will continue to figure them
out, and we hope someone can help/join us :) but I am afraid of having to
wait for it to be "stable" (maybe you mean no failed tests?). The failed
ReplayListenerSuite tests mentioned in the last mail passed before; we
suspect they were introduced by https://github.com/apache/spark/pull/23767.
We reverted the code and the tests passed, so we hope someone can help us
look deep into it. Our tests run on master, so if some modification
introduces errors, the tests will fail; I think this is one reason we need
arm ci.

Thank you all :)

On Thu, Aug 15, 2019 at 9:58 PM Sean Owen <sr...@gmail.com> wrote:

> I think the right goal is to fix the remaining issues first. If we set up
> CI/CD it will only tell us there are still some test failures. If it's
> stable, and not hard to add to the existing CI/CD, yes it could be done
> automatically later. You can continue to test on ARM independently for now.
>
> It sounds indeed like there are some networking problems in the test
> system if you're not able to download from Maven Central. That rarely takes
> significant time, and there aren't project-specific mirrors here. You might
> be able to point at a closer public mirror, depending on where you are.
>
> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I want to discuss spark ARM CI again. We ran some tests on an ARM instance
>> based on master, and the jobs include
>> https://github.com/theopenlab/spark/pull/13  and k8s integration
>> https://github.com/theopenlab/spark/pull/17/ , there are several things
>> I want to talk about:
>>
>> First, about the failed tests:
>>     1.we have fixed some problems like
>> https://github.com/apache/spark/pull/25186 and
>> https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
>> for helping us.
>>     2.we tried k8s integration test on arm, and met an error: apk fetch
>> hangs,  the tests passed  after adding '--network host' option for command
>> `docker build`, see:
>>
>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>> , the solution refers to
>> https://github.com/gliderlabs/docker-alpine/issues/307  and I don't know
>> whether it has ever happened in the community CI; maybe we should submit a
>> PR to pass '--network host' to `docker build`?
>>     3.we found there are two tests failed after the commit
>> https://github.com/apache/spark/pull/23767  :
>>        ReplayListenerSuite:
>>        - ...
>>        - End-to-end replay *** FAILED ***
>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>        - End-to-end replay with compression *** FAILED ***
>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>
>>         we tried to revert the commit and then the tests passed; the
>> patch is too big and, sorry, we haven't found the reason so far. If you
>> are interested please try it, and it would be much appreciated if
>> someone can help us to figure it out.
>>
>> Second, about the test time: we increased the flavor of the arm instance
>> to 16U16G, but there seemed to be no significant improvement. The k8s
>> integration test took about one and a half hours, and the QA test (like
>> the spark-master-test-maven-hadoop-2.7 community jenkins job) took about
>> seventeen hours (it is too long :(); we suspect the reason is the
>> performance and the network.
>> We split the jobs based on projects such as sql, core and so on, and the
>> time can be decreased to about seven hours, see
>> https://github.com/theopenlab/spark/pull/19 We found that the Spark QA
>> tests like  https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>> never seem to download the jar packages from the Maven Central repo (such
>> as
>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>> So we want to know how the jenkins jobs do that: is there an internal
>> maven repo launched? Maybe we can do the same thing to avoid the network
>> connection cost of downloading the dependent jar packages.
>>
>> Third, the most important thing, it's about ARM CI of spark, we believe
>> that it is necessary, right? And you can see we really made a lot of
>> efforts; now the basic arm build/test jobs are ok, so we suggest adding
>> arm jobs to the community CI. We can set them to non-voting first, and
>> improve/enrich the jobs step by step. Generally, there are two ways in
>> our mind to
>> integrate the ARM CI for spark:
>>      1) We introduce openlab ARM CI into spark as a custom CI system. We
>> provide human resources and test ARM VMs, also we will focus on the ARM
>> related issues about Spark. We will push the PR into community.
>>      2) We donate ARM VM resources into existing amplab Jenkins. We still
>> provide human resources, focus on the ARM related issues about Spark and
>> push the PR into community.
>> With both options we will provide human resources for maintenance; of
>> course it will be great if we can work together. So please tell us which
>> option you would like, and let's move forward. Waiting for your reply,
>> thank you very
>> much.
>>
>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I think the right goal is to fix the remaining issues first. If we set up
CI/CD it will only tell us there are still some test failures. If it's
stable, and not hard to add to the existing CI/CD, yes it could be done
automatically later. You can continue to test on ARM independently for now.

It sounds indeed like there are some networking problems in the test system
if you're not able to download from Maven Central. That rarely takes
significant time, and there aren't project-specific mirrors here. You might
be able to point at a closer public mirror, depending on where you are.
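
For anyone trying that, a minimal sketch of a settings.xml that points
"central" at a closer mirror (the URL below is a placeholder, not a
recommendation):

    <!-- arm-ci-settings.xml: redirect "central" to a nearby mirror -->
    <settings>
      <mirrors>
        <mirror>
          <id>nearby-central</id>
          <mirrorOf>central</mirrorOf>
          <url>https://mirror.example.org/maven2</url>
        </mirror>
      </mirrors>
    </settings>

Then build with something like `mvn -s arm-ci-settings.xml clean package`.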

On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <hu...@gmail.com>
wrote:

> Hi all,
>
> I want to discuss spark ARM CI again. We ran some tests on an ARM instance
> based on master, and the jobs include
> https://github.com/theopenlab/spark/pull/13  and k8s integration
> https://github.com/theopenlab/spark/pull/17/ , there are several things I
> want to talk about:
>
> First, about the failed tests:
>     1.we have fixed some problems like
> https://github.com/apache/spark/pull/25186 and
> https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
> for helping us.
>     2.we tried k8s integration test on arm, and met an error: apk fetch
> hangs,  the tests passed  after adding '--network host' option for command
> `docker build`, see:
>
> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
> , the solution refers to
> https://github.com/gliderlabs/docker-alpine/issues/307  and I don't know
> whether it has ever happened in the community CI; maybe we should submit a
> PR to pass '--network host' to `docker build`?
>     3.we found there are two tests failed after the commit
> https://github.com/apache/spark/pull/23767  :
>        ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>
>         we tried to revert the commit and then the tests passed; the patch
> is too big and, sorry, we haven't found the reason so far. If you are
> interested please try it, and it would be much appreciated if someone can
> help us to figure it out.
>
> Second, about the test time: we increased the flavor of the arm instance
> to 16U16G, but there seemed to be no significant improvement. The k8s
> integration test took about one and a half hours, and the QA test (like
> the spark-master-test-maven-hadoop-2.7 community jenkins job) took about
> seventeen hours (it is too long :(); we suspect the reason is the
> performance and the network.
> We split the jobs based on projects such as sql, core and so on, and the
> time can be decreased to about seven hours, see
> https://github.com/theopenlab/spark/pull/19 We found that the Spark QA
> tests like  https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
> never seem to download the jar packages from the Maven Central repo (such
> as
> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
> So we want to know how the jenkins jobs do that: is there an internal
> maven repo launched? Maybe we can do the same thing to avoid the network
> connection cost of downloading the dependent jar packages.
>
> Third, the most important thing, it's about ARM CI of spark, we believe
> that it is necessary, right? And you can see we really made a lot of
> efforts; now the basic arm build/test jobs are ok, so we suggest adding
> arm jobs to the community CI. We can set them to non-voting first, and
> improve/enrich the jobs step by step. Generally, there are two ways in
> our mind to
> integrate the ARM CI for spark:
>      1) We introduce openlab ARM CI into spark as a custom CI system. We
> provide human resources and test ARM VMs, also we will focus on the ARM
> related issues about Spark. We will push the PR into community.
>      2) We donate ARM VM resources into existing amplab Jenkins. We still
> provide human resources, focus on the ARM related issues about Spark and
> push the PR into community.
> With both options we will provide human resources for maintenance; of
> course it will be great if we can work together. So please tell us which
> option you would like, and let's move forward. Waiting for your reply,
> thank you very
> much.
>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi all,

I want to discuss spark ARM CI again. We ran some tests on an ARM instance
based on master, and the jobs include
https://github.com/theopenlab/spark/pull/13  and k8s integration
https://github.com/theopenlab/spark/pull/17/ , there are several things I
want to talk about:

First, about the failed tests:
    1.we have fixed some problems like
https://github.com/apache/spark/pull/25186 and
https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
for helping us.
    2.we tried k8s integration test on arm, and met an error: apk fetch
hangs,  the tests passed  after adding '--network host' option for command
`docker build`, see:

https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
, the solution refers to
https://github.com/gliderlabs/docker-alpine/issues/307 , and I don't know
whether it has ever happened in the community CI; maybe we should submit a
PR to pass '--network host' to `docker build`? (A sketch of the workaround
follows this list.)
    3.we found there are two tests failed after the commit
https://github.com/apache/spark/pull/23767  :
       ReplayListenerSuite:
       - ...
       - End-to-end replay *** FAILED ***
         "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
       - End-to-end replay with compression *** FAILED ***
         "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)

        we tried to revert the commit and then the tests passed; the patch
is too big and, sorry, we haven't found the reason so far. If you are
interested please try it, and it would be much appreciated if someone can
help us to figure it out.
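
For reference, the workaround from point 2 looks roughly like this (the
image tag and Dockerfile path here are illustrative, not our exact CI
command):

    # Build the Spark image with host networking so that apk fetch inside
    # the Alpine-based build does not hang on our ARM hosts
    docker build --network host -t spark-test:arm64 \
        -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile .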

Second, about the test time: we increased the flavor of the arm instance to
16U16G, but there seemed to be no significant improvement. The k8s
integration test took about one and a half hours, and the QA test (like the
spark-master-test-maven-hadoop-2.7 community jenkins job) took about
seventeen hours (it is too long :(); we suspect the reason is the
performance and the network.
We split the jobs based on projects such as sql, core and so on (roughly as
sketched below), and the time can be decreased to about seven hours, see
https://github.com/theopenlab/spark/pull/19 We found that the Spark QA tests
like  https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/   never
seem to download the jar packages from the Maven Central repo (such as
https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
So we want to know how the jenkins jobs do that: is there an internal maven
repo launched? Maybe we can do the same thing to avoid the network
connection cost of downloading the dependent jar packages.
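
For example, a per-module split can be driven with Maven's project-list
options (module names here are illustrative):

    # Install all modules once, skipping tests
    ./build/mvn -DskipTests install
    # Then run each module's tests in a separate job, e.g. core...
    ./build/mvn -pl core test
    # ...and the SQL modules in another
    ./build/mvn -pl sql/catalyst,sql/core test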

Third, the most important thing, it's about ARM CI of spark, we believe
that it is necessary, right? And you can see we really made a lot of
efforts; now the basic arm build/test jobs are ok, so we suggest adding arm
jobs to the community CI. We can set them to non-voting first, and
improve/enrich the jobs step by step. Generally, there are two ways in our
mind to
integrate the ARM CI for spark:
     1) We introduce openlab ARM CI into spark as a custom CI system. We
provide human resources and test ARM VMs, also we will focus on the ARM
related issues about Spark. We will push the PR into community.
     2) We donate ARM VM resources into existing amplab Jenkins. We still
provide human resources, focus on the ARM related issues about Spark and
push the PR into community.
With both options we will provide human resources for maintenance; of course
it will be great if we can work together. So please tell us which option you
would like, and let's move forward. Waiting for your reply, thank you very
much.

On Wed, Aug 14, 2019 at 10:30 AM Tianhua huang <hu...@gmail.com>
wrote:

> OK, thanks.
>
> On Tue, Aug 13, 2019 at 8:37 PM Sean Owen <sr...@gmail.com> wrote:
>
>> -dev@ -- it's better not to send to the whole list to discuss specific
>> changes or issues from here. You can reply on the pull request.
>> I don't know what the issue is either at a glance.
>>
>> On Tue, Aug 13, 2019 at 2:54 AM Tianhua huang <hu...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> About the arm test of spark, recently we found two tests failed after
>>> the commit https://github.com/apache/spark/pull/23767:
>>>        ReplayListenerSuite:
>>>        - ...
>>>        - End-to-end replay *** FAILED ***
>>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>        - End-to-end replay with compression *** FAILED ***
>>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>
>>> We tried to revert the commit and then the tests passed; the patch is
>>> too big and, sorry, we haven't found the reason so far. If you are
>>> interested please try it, and it would be much appreciated if someone
>>> can help us to figure it out.
>>>
>>> On Tue, Aug 6, 2019 at 9:08 AM bo zhaobo <bz...@gmail.com>
>>> wrote:
>>>
>>>> Hi shane,
>>>> Thanks for your reply. I will wait until you are back. ;-)
>>>>
>>>> Thanks,
>>>> Best regards
>>>> ZhaoBo
>>>>
>>>>
>>>>> shane knapp <sk...@berkeley.edu> wrote on Fri, Aug 2, 2019 at 10:41 PM:
>>>>
>>>>> i'm out of town, but will answer some of your questions next week.
>>>>>
>>>>> On Fri, Aug 2, 2019 at 2:39 AM bo zhaobo <bz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> Any updates about the CI details? ;-)
>>>>>>
>>>>>> Also, I will need your kind help with the Spark QA tests: could
>>>>>> anyone tell us how to trigger those tests? When? How?  So far, I
>>>>>> haven't noticed how it works.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> ZhaoBo
>>>>>>
>>>>>>
>>>>>> bo zhaobo <bz...@gmail.com> wrote on Wed, Jul 31, 2019 at 11:56 AM:
>>>>>>
>>>>>>> Hi, team.
>>>>>>> I want to run the same tests on ARM that the existing (x86) CI does.
>>>>>>> As building and testing the whole spark project takes too long, I
>>>>>>> plan to split the work into multiple jobs to lower the time cost.
>>>>>>> But I cannot see what the existing CI[1] does (so many private
>>>>>>> scripts are called), so could any CI maintainers help/tell us how to
>>>>>>> split the jobs and what the different CI jobs do? For example, when
>>>>>>> a PR title contains [SQL], [INFRA], [ML], [DOC], [CORE], [PYTHON],
>>>>>>> [k8s], [DSTREAMS], [MLlib], [SCHEDULER], [SS], [YARN], [BUILD] etc.,
>>>>>>> each of them seems to run a different CI job.
>>>>>>>
>>>>>>> @shane knapp,
>>>>>>> Oh, sorry to disturb you. I found your email looks like it is from
>>>>>>> 'berkeley.edu'; are you the right person to ask for help about this? ;-)
>>>>>>> If so, could you give some help or advice? Thank you.
>>>>>>>
>>>>>>> Thank you very much,
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> ZhaoBo
>>>>>>>
>>>>>>> [1] https://amplab.cs.berkeley.edu/jenkins
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Tianhua huang <hu...@gmail.com> wrote on Mon, Jul 29, 2019 at 9:38 AM:
>>>>>>>
>>>>>>>> @Sean Owen <sr...@gmail.com>  Thank you very much. And I saw your
>>>>>>>> reply comment in https://issues.apache.org/jira/browse/SPARK-28519,
>>>>>>>> I will test with the modification to see whether other similar
>>>>>>>> tests fail, and will address them together in one pull request.
>>>>>>>>
>>>>>>>> On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Great thanks - we can take this to JIRAs now.
>>>>>>>>> I think it's worth changing the implementation of atanh if the
>>>>>>>>> test value just reflects what Spark does, and there's evidence it is a
>>>>>>>>> little bit inaccurate.
>>>>>>>>> There's an equivalent formula which seems to have better accuracy.
>>>>>>>>>
>>>>>>>>> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <
>>>>>>>>> linguin.m.s@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, all,
>>>>>>>>>>
>>>>>>>>>> FYI:
>>>>>>>>>> >> @Yuming Wang the results in float8.sql are from PostgreSQL
>>>>>>>>>> directly?
>>>>>>>>>> >> Interesting if it also returns the same less accurate result,
>>>>>>>>>> which
>>>>>>>>>> >> might suggest it's more to do with underlying OS math
>>>>>>>>>> libraries. You
>>>>>>>>>> >> noted that these tests sometimes gave platform-dependent
>>>>>>>>>> differences
>>>>>>>>>> >> in the last digit, so wondering if the test value directly
>>>>>>>>>> reflects
>>>>>>>>>> >> PostgreSQL or just what we happen to return now.
>>>>>>>>>>
>>>>>>>>>> The results in float8.sql.out were recomputed in Spark/JVM.
>>>>>>>>>> The expected output of the PostgreSQL test is here:
>>>>>>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>>>>>>>>>
>>>>>>>>>> As you can see in the file (float8.out), the results other than atanh
>>>>>>>>>> also are different between Spark/JVM and PostgreSQL.
>>>>>>>>>> For example, the answers of acosh are:
>>>>>>>>>> -- PostgreSQL
>>>>>>>>>>
>>>>>>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>>>>>>>>>> 1.31695789692482
>>>>>>>>>>
>>>>>>>>>> -- Spark/JVM
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>>>>>>>>>> 1.3169578969248166
>>>>>>>>>>
>>>>>>>>>> btw, the PostgreSQL implementation for atanh just calls atanh in
>>>>>>>>>> math.h:
>>>>>>>>>>
>>>>>>>>>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>>>>>>>>>
>>>>>>>>>> Bests,
>>>>>>>>>> Takeshi
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> Shane Knapp
>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>> https://rise.cs.berkeley.edu
>>>>>
>>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi all,

About the arm test of spark, recently we found two tests failed after the
commit https://github.com/apache/spark/pull/23767:
       ReplayListenerSuite:
       - ...
       - End-to-end replay *** FAILED ***
         "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
       - End-to-end replay with compression *** FAILED ***
         "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)

We tried to revert the commit and then the tests passed; the patch is too
big and, sorry, we haven't found the reason so far. If you are interested,
please try it (a quick way to re-run just this suite is sketched below),
and it would be much appreciated if someone can help us to figure it out.
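
For anyone reproducing this, something like the following should re-run only
the failing suite (a sketch; see the Spark developer docs for the exact test
options):

    # Run just ReplayListenerSuite in the core module
    ./build/mvn -pl core -Dtest=none \
        -DwildcardSuites=org.apache.spark.scheduler.ReplayListenerSuite test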

On Tue, Aug 6, 2019 at 9:08 AM bo zhaobo <bz...@gmail.com>
wrote:

> Hi shane,
> Thanks for your reply. I will wait until you are back. ;-)
>
> Thanks,
> Best regards
> ZhaoBo
>
>
> shane knapp <sk...@berkeley.edu> wrote on Fri, Aug 2, 2019 at 10:41 PM:
>
>> i'm out of town, but will answer some of your questions next week.
>>
>> On Fri, Aug 2, 2019 at 2:39 AM bo zhaobo <bz...@gmail.com>
>> wrote:
>>
>>>
>>> Hi Team,
>>>
>>> Any updates about the CI details? ;-)
>>>
>>> Also, I will need your kind help with the Spark QA tests: could anyone
>>> tell us how to trigger those tests? When? How?  So far, I haven't
>>> noticed how it works.
>>>
>>> Thanks
>>>
>>> Best Regards,
>>>
>>> ZhaoBo
>>>
>>>
>>> bo zhaobo <bz...@gmail.com> wrote on Wed, Jul 31, 2019 at 11:56 AM:
>>>
>>>> Hi, team.
>>>> I want to run the same tests on ARM that the existing (x86) CI does. As
>>>> building and testing the whole spark project takes too long, I plan to
>>>> split the work into multiple jobs to lower the time cost. But I cannot
>>>> see what the existing CI[1] does (so many private scripts are called),
>>>> so could any CI maintainers help/tell us how to split the jobs and what
>>>> the different CI jobs do? For example, when a PR title contains [SQL],
>>>> [INFRA], [ML], [DOC], [CORE], [PYTHON], [k8s], [DSTREAMS], [MLlib],
>>>> [SCHEDULER], [SS], [YARN], [BUILD] etc., each of them seems to run a
>>>> different CI job.
>>>>
>>>> @shane knapp,
>>>> Oh, sorry to disturb you. I found your email looks like it is from
>>>> 'berkeley.edu'; are you the right person to ask for help about this? ;-)
>>>> If so, could you give some help or advice? Thank you.
>>>>
>>>> Thank you very much,
>>>>
>>>> Best Regards,
>>>>
>>>> ZhaoBo
>>>>
>>>> [1] https://amplab.cs.berkeley.edu/jenkins
>>>>
>>>>
>>>> Tianhua huang <hu...@gmail.com> wrote on Mon, Jul 29, 2019 at 9:38 AM:
>>>>
>>>>> @Sean Owen <sr...@gmail.com>  Thank you very much. And I saw your
>>>>> reply comment in https://issues.apache.org/jira/browse/SPARK-28519. I
>>>>> will test with the modification to see whether other similar tests
>>>>> fail, and will address them together in one pull request.
>>>>>
>>>>> On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>
>>>>>> Great thanks - we can take this to JIRAs now.
>>>>>> I think it's worth changing the implementation of atanh if the test
>>>>>> value just reflects what Spark does, and there's evidence it is a little
>>>>>> bit inaccurate.
>>>>>> There's an equivalent formula which seems to have better accuracy.
>>>>>>
>>>>>> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <
>>>>>> linguin.m.s@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, all,
>>>>>>>
>>>>>>> FYI:
>>>>>>> >> @Yuming Wang the results in float8.sql are from PostgreSQL
>>>>>>> directly?
>>>>>>> >> Interesting if it also returns the same less accurate result,
>>>>>>> which
>>>>>>> >> might suggest it's more to do with underlying OS math libraries.
>>>>>>> You
>>>>>>> >> noted that these tests sometimes gave platform-dependent
>>>>>>> differences
>>>>>>> >> in the last digit, so wondering if the test value directly
>>>>>>> reflects
>>>>>>> >> PostgreSQL or just what we happen to return now.
>>>>>>>
>>>>>>> The results in float8.sql.out were recomputed in Spark/JVM.
>>>>>>> The expected output of the PostgreSQL test is here:
>>>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>>>>>>
>>>>>>> As you can see in the file (float8.out), the results other than atanh
>>>>>>> also are different between Spark/JVM and PostgreSQL.
>>>>>>> For example, the answers of acosh are:
>>>>>>> -- PostgreSQL
>>>>>>>
>>>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>>>>>>> 1.31695789692482
>>>>>>>
>>>>>>> -- Spark/JVM
>>>>>>>
>>>>>>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>>>>>>> 1.3169578969248166
>>>>>>>
>>>>>>> btw, the PostgreSQL implementation for atanh just calls atanh in
>>>>>>> math.h:
>>>>>>>
>>>>>>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>>>>>>
>>>>>>> Bests,
>>>>>>> Takeshi
>>>>>>>
>>>>>>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi shane,
Thanks for your reply. I will wait until you are back. ;-)

Thanks,
Best regards
ZhaoBo




shane knapp <sk...@berkeley.edu> wrote on Fri, Aug 2, 2019 at 10:41 PM:

> i'm out of town, but will answer some of your questions next week.
>
> On Fri, Aug 2, 2019 at 2:39 AM bo zhaobo <bz...@gmail.com>
> wrote:
>
>>
>> Hi Team,
>>
>> Any updates about the CI details? ;-)
>>
>> Also, I will need your kind help with the Spark QA tests: could anyone
>> tell us how to trigger those tests? When? How?  So far, I haven't
>> noticed how it works.
>>
>> Thanks
>>
>> Best Regards,
>>
>> ZhaoBo
>>
>>
>> bo zhaobo <bz...@gmail.com> wrote on Wed, Jul 31, 2019 at 11:56 AM:
>>
>>> Hi, team.
>>> I want to run the same tests on ARM that the existing (x86) CI does. As
>>> building and testing the whole spark project takes too long, I plan to
>>> split the work into multiple jobs to lower the time cost. But I cannot
>>> see what the existing CI[1] does (so many private scripts are called),
>>> so could any CI maintainers help/tell us how to split the jobs and what
>>> the different CI jobs do? For example, when a PR title contains [SQL],
>>> [INFRA], [ML], [DOC], [CORE], [PYTHON], [k8s], [DSTREAMS], [MLlib],
>>> [SCHEDULER], [SS], [YARN], [BUILD] etc., each of them seems to run a
>>> different CI job.
>>>
>>> @shane knapp,
>>> Oh, sorry to disturb you. I found your email looks like it is from
>>> 'berkeley.edu'; are you the right person to ask for help about this? ;-)
>>> If so, could you give some help or advice? Thank you.
>>>
>>> Thank you very much,
>>>
>>> Best Regards,
>>>
>>> ZhaoBo
>>>
>>> [1] https://amplab.cs.berkeley.edu/jenkins
>>>
>>>
>>> Tianhua huang <hu...@gmail.com> wrote on Mon, Jul 29, 2019 at 9:38 AM:
>>>
>>>> @Sean Owen <sr...@gmail.com>  Thank you very much. And I saw your
>>>> reply comment in https://issues.apache.org/jira/browse/SPARK-28519. I
>>>> will test with the modification to see whether other similar tests
>>>> fail, and will address them together in one pull request.
>>>>
>>>> On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>>> Great thanks - we can take this to JIRAs now.
>>>>> I think it's worth changing the implementation of atanh if the test
>>>>> value just reflects what Spark does, and there's evidence it is a little
>>>>> bit inaccurate.
>>>>> There's an equivalent formula which seems to have better accuracy.
>>>>>
>>>>> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <
>>>>> linguin.m.s@gmail.com> wrote:
>>>>>
>>>>>> Hi, all,
>>>>>>
>>>>>> FYI:
>>>>>> >> @Yuming Wang the results in float8.sql are from PostgreSQL
>>>>>> directly?
>>>>>> >> Interesting if it also returns the same less accurate result, which
>>>>>> >> might suggest it's more to do with underlying OS math libraries.
>>>>>> You
>>>>>> >> noted that these tests sometimes gave platform-dependent
>>>>>> differences
>>>>>> >> in the last digit, so wondering if the test value directly reflects
>>>>>> >> PostgreSQL or just what we happen to return now.
>>>>>>
>>>>>> The results in float8.sql.out were recomputed in Spark/JVM.
>>>>>> The expected output of the PostgreSQL test is here:
>>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>>>>>
>>>>>> As you can see in the file (float8.out), the results other than atanh
>>>>>> also are different between Spark/JVM and PostgreSQL.
>>>>>> For example, the answers of acosh are:
>>>>>> -- PostgreSQL
>>>>>>
>>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>>>>>> 1.31695789692482
>>>>>>
>>>>>> -- Spark/JVM
>>>>>>
>>>>>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>>>>>> 1.3169578969248166
>>>>>>
>>>>>> btw, the PostgreSQL implementation for atanh just calls atanh in
>>>>>> math.h:
>>>>>>
>>>>>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>>>>>
>>>>>> Bests,
>>>>>> Takeshi
>>>>>>
>>>>>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>

Re: Ask for ARM CI for spark

Posted by shane knapp <sk...@berkeley.edu>.
i'm out of town, but will answer some of your questions next week.

On Fri, Aug 2, 2019 at 2:39 AM bo zhaobo <bz...@gmail.com>
wrote:

>
> Hi Team,
>
> Any updates about the CI details? ;-)
>
> Also, I will need your kind help with the Spark QA tests: could anyone
> tell us how to trigger those tests? When? How?  So far, I haven't
> noticed how it works.
>
> Thanks
>
> Best Regards,
>
> ZhaoBo
>
>
> bo zhaobo <bz...@gmail.com> wrote on Wed, Jul 31, 2019 at 11:56 AM:
>
>> Hi, team.
>> I want to run the same tests on ARM that the existing (x86) CI does. As
>> building and testing the whole spark project takes too long, I plan to
>> split the work into multiple jobs to lower the time cost. But I cannot
>> see what the existing CI[1] does (so many private scripts are called),
>> so could any CI maintainers help/tell us how to split the jobs and what
>> the different CI jobs do? For example, when a PR title contains [SQL],
>> [INFRA], [ML], [DOC], [CORE], [PYTHON], [k8s], [DSTREAMS], [MLlib],
>> [SCHEDULER], [SS], [YARN], [BUILD] etc., each of them seems to run a
>> different CI job.
>>
>> @shane knapp,
>> Oh, sorry to disturb you. I found your email looks like it is from
>> 'berkeley.edu'; are you the right person to ask for help about this? ;-)
>> If so, could you give some help or advice? Thank you.
>>
>> Thank you very much,
>>
>> Best Regards,
>>
>> ZhaoBo
>>
>> [1] https://amplab.cs.berkeley.edu/jenkins
>>
>>
>> Tianhua huang <hu...@gmail.com> wrote on Mon, Jul 29, 2019 at 9:38 AM:
>>
>>> @Sean Owen <sr...@gmail.com>  Thank you very much. And I saw your
>>> reply comment in https://issues.apache.org/jira/browse/SPARK-28519. I
>>> will test with the modification to see whether other similar tests
>>> fail, and will address them together in one pull request.
>>>
>>> On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:
>>>
>>>> Great thanks - we can take this to JIRAs now.
>>>> I think it's worth changing the implementation of atanh if the test
>>>> value just reflects what Spark does, and there's evidence it is a little
>>>> bit inaccurate.
>>>> There's an equivalent formula which seems to have better accuracy.
>>>>
>>>> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <
>>>> linguin.m.s@gmail.com> wrote:
>>>>
>>>>> Hi, all,
>>>>>
>>>>> FYI:
>>>>> >> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
>>>>> >> Interesting if it also returns the same less accurate result, which
>>>>> >> might suggest it's more to do with underlying OS math libraries. You
>>>>> >> noted that these tests sometimes gave platform-dependent differences
>>>>> >> in the last digit, so wondering if the test value directly reflects
>>>>> >> PostgreSQL or just what we happen to return now.
>>>>>
>>>>> The results in float8.sql.out were recomputed in Spark/JVM.
>>>>> The expected output of the PostgreSQL test is here:
>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>>>>
>>>>> As you can see in the file (float8.out), the results other than atanh
>>>>> also are different between Spark/JVM and PostgreSQL.
>>>>> For example, the answers of acosh are:
>>>>> -- PostgreSQL
>>>>>
>>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>>>>> 1.31695789692482
>>>>>
>>>>> -- Spark/JVM
>>>>>
>>>>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>>>>> 1.3169578969248166
>>>>>
>>>>> btw, the PostgreSQL implementation for atanh just calls atanh in
>>>>> math.h:
>>>>>
>>>>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>>>>
>>>>> Bests,
>>>>> Takeshi
>>>>>
>>>>>

-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi Team,

Any updates about the CI details? ;-)

Also, I will need your kind help with the Spark QA tests: could anyone
tell us how to trigger those tests? When? How?  So far, I haven't noticed
how it works.

Thanks

Best Regards,

ZhaoBo




bo zhaobo <bz...@gmail.com> wrote on Wed, Jul 31, 2019 at 11:56 AM:

> Hi, team.
> I want to run the same tests on ARM that the existing (x86) CI does. As
> building and testing the whole spark project takes too long, I plan to
> split the work into multiple jobs to lower the time cost. But I cannot
> see what the existing CI[1] does (so many private scripts are called),
> so could any CI maintainers help/tell us how to split the jobs and what
> the different CI jobs do? For example, when a PR title contains [SQL],
> [INFRA], [ML], [DOC], [CORE], [PYTHON], [k8s], [DSTREAMS], [MLlib],
> [SCHEDULER], [SS], [YARN], [BUILD] etc., each of them seems to run a
> different CI job.
>
> @shane knapp,
> Oh, sorry to disturb you. I found your email looks like it is from
> 'berkeley.edu'; are you the right person to ask for help about this? ;-)
> If so, could you give some help or advice? Thank you.
>
> Thank you very much,
>
> Best Regards,
>
> ZhaoBo
>
> [1] https://amplab.cs.berkeley.edu/jenkins
>
>
> Tianhua huang <hu...@gmail.com> wrote on Mon, Jul 29, 2019 at 9:38 AM:
>
>> @Sean Owen <sr...@gmail.com>  Thank you very much. And I saw your reply
>> comment in https://issues.apache.org/jira/browse/SPARK-28519. I will
>> test with the modification to see whether other similar tests
>> fail, and will address them together in one pull request.
>>
>> On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:
>>
>>> Great thanks - we can take this to JIRAs now.
>>> I think it's worth changing the implementation of atanh if the test
>>> value just reflects what Spark does, and there's evidence is a little bit
>>> inaccurate.
>>> There's an equivalent formula which seems to have better accuracy.
>>>
>>> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <li...@gmail.com>
>>> wrote:
>>>
>>>> Hi, all,
>>>>
>>>> FYI:
>>>> >> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
>>>> >> Interesting if it also returns the same less accurate result, which
>>>> >> might suggest it's more to do with underlying OS math libraries. You
>>>> >> noted that these tests sometimes gave platform-dependent differences
>>>> >> in the last digit, so wondering if the test value directly reflects
>>>> >> PostgreSQL or just what we happen to return now.
>>>>
>>>> The results in float8.sql.out were recomputed in Spark/JVM.
>>>> The expected output of the PostgreSQL test is here:
>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>>>
>>>> As you can see in the file (float8.out), the results other than atanh
>>>> also are different between Spark/JVM and PostgreSQL.
>>>> For example, the answers of acosh are:
>>>> -- PostgreSQL
>>>>
>>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>>>> 1.31695789692482
>>>>
>>>> -- Spark/JVM
>>>>
>>>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>>>> 1.3169578969248166
>>>>
>>>> btw, the PostgreSQL implementation for atanh just calls atanh in
>>>> math.h:
>>>>
>>>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>>>
>>>> Bests,
>>>> Takeshi
>>>>
>>>>

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi, team.
I want to run the same tests on ARM that the existing (x86) CI does. As
building and testing the whole Spark project takes too long, I plan to
split the work into multiple jobs to reduce the total run time. But I
cannot see what the existing CI [1] actually does (it calls many private
scripts), so could any CI maintainer tell us how to split the work and what
the different CI jobs do? For example, PR titles contain tags such as
[SQL], [INFRA], [ML], [DOC], [CORE], [PYTHON], [k8s], [DSTREAMS], [MLlib],
[SCHEDULER], [SS], [YARN], [BUILD], etc., and each of them seems to run a
different CI job; one possible per-module split is sketched below.
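
A minimal, hedged sketch of what such a per-module split could look like
with Maven (only my guess at a split, not what the AMPLab jobs actually
run):

# build one module plus the modules it depends on, skipping tests
build/mvn -pl sql/core -am -DskipTests package
# then run only that module's tests in a dedicated CI job
build/mvn -pl sql/core test

Other jobs could cover e.g. core, mllib and streaming the same way, so no
single job has to build and test everything.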

@shane knapp,
Sorry to disturb you. I noticed your email address is from 'berkeley.edu';
are you the right person to ask for help with this? ;-)
If so, could you give us some help or advice? Thank you.

Thank you very much,

Best Regards,

ZhaoBo

[1] https://amplab.cs.berkeley.edu/jenkins





Tianhua huang <hu...@gmail.com> wrote on Mon, Jul 29, 2019 at 9:38 AM:

> @Sean Owen <sr...@gmail.com>  Thank you very much. And I saw your reply
> comment in https://issues.apache.org/jira/browse/SPARK-28519, I will test
> with modification and to see whether there are other similar tests fail,
> and will address them together in one pull request.
>
> On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:
>
>> Great thanks - we can take this to JIRAs now.
>> I think it's worth changing the implementation of atanh if the test value
>> just reflects what Spark does, and there's evidence is a little bit
>> inaccurate.
>> There's an equivalent formula which seems to have better accuracy.
>>
>> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <li...@gmail.com>
>> wrote:
>>
>>> Hi, all,
>>>
>>> FYI:
>>> >> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
>>> >> Interesting if it also returns the same less accurate result, which
>>> >> might suggest it's more to do with underlying OS math libraries. You
>>> >> noted that these tests sometimes gave platform-dependent differences
>>> >> in the last digit, so wondering if the test value directly reflects
>>> >> PostgreSQL or just what we happen to return now.
>>>
>>> The results in float8.sql.out were recomputed in Spark/JVM.
>>> The expected output of the PostgreSQL test is here:
>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>>
>>> As you can see in the file (float8.out), the results other than atanh
>>> also are different between Spark/JVM and PostgreSQL.
>>> For example, the answers of acosh are:
>>> -- PostgreSQL
>>>
>>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>>> 1.31695789692482
>>>
>>> -- Spark/JVM
>>>
>>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>>> 1.3169578969248166
>>>
>>> btw, the PostgreSQL implementation for atanh just calls atanh in math.h:
>>>
>>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>>
>>> Bests,
>>> Takeshi
>>>
>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
@Sean Owen <sr...@gmail.com>  Thank you very much. I saw your reply
comment in https://issues.apache.org/jira/browse/SPARK-28519; I will test
with the modification to see whether other similar tests fail, and will
address them together in one pull request.

On Sat, Jul 27, 2019 at 9:04 PM Sean Owen <sr...@gmail.com> wrote:

> Great thanks - we can take this to JIRAs now.
> I think it's worth changing the implementation of atanh if the test value
> just reflects what Spark does, and there's evidence is a little bit
> inaccurate.
> There's an equivalent formula which seems to have better accuracy.
>
> On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <li...@gmail.com>
> wrote:
>
>> Hi, all,
>>
>> FYI:
>> >> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
>> >> Interesting if it also returns the same less accurate result, which
>> >> might suggest it's more to do with underlying OS math libraries. You
>> >> noted that these tests sometimes gave platform-dependent differences
>> >> in the last digit, so wondering if the test value directly reflects
>> >> PostgreSQL or just what we happen to return now.
>>
>> The results in float8.sql.out were recomputed in Spark/JVM.
>> The expected output of the PostgreSQL test is here:
>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>>
>> As you can see in the file (float8.out), the results other than atanh
>> also are different between Spark/JVM and PostgreSQL.
>> For example, the answers of acosh are:
>> -- PostgreSQL
>>
>> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
>> 1.31695789692482
>>
>> -- Spark/JVM
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
>> 1.3169578969248166
>>
>> btw, the PostgreSQL implementation for atanh just calls atanh in math.h:
>>
>> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>>
>> Bests,
>> Takeshi
>>
>>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
Great thanks - we can take this to JIRAs now.
I think it's worth changing the implementation of atanh if the test value
just reflects what Spark does, and there's evidence it's a little bit
inaccurate.
There's an equivalent formula which seems to have better accuracy.

On Fri, Jul 26, 2019 at 10:02 PM Takeshi Yamamuro <li...@gmail.com>
wrote:

> Hi, all,
>
> FYI:
> >> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
> >> Interesting if it also returns the same less accurate result, which
> >> might suggest it's more to do with underlying OS math libraries. You
> >> noted that these tests sometimes gave platform-dependent differences
> >> in the last digit, so wondering if the test value directly reflects
> >> PostgreSQL or just what we happen to return now.
>
> The results in float8.sql.out were recomputed in Spark/JVM.
> The expected output of the PostgreSQL test is here:
> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493
>
> As you can see in the file (float8.out), the results other than atanh
> also are different between Spark/JVM and PostgreSQL.
> For example, the answers of acosh are:
> -- PostgreSQL
>
> https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
> 1.31695789692482
>
> -- Spark/JVM
>
> https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
> 1.3169578969248166
>
> btw, the PostgreSQL implementation for atanh just calls atanh in math.h:
>
> https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606
>
> Bests,
> Takeshi
>
>

Re: Ask for ARM CI for spark

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi, all,

FYI:
>> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
>> Interesting if it also returns the same less accurate result, which
>> might suggest it's more to do with underlying OS math libraries. You
>> noted that these tests sometimes gave platform-dependent differences
>> in the last digit, so wondering if the test value directly reflects
>> PostgreSQL or just what we happen to return now.

The results in float8.sql.out were recomputed in Spark/JVM.
The expected output of the PostgreSQL test is here:
https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L493

As you can see in the file (float8.out), the results other than atanh also
are different between Spark/JVM and PostgreSQL.
For example, the answers of acosh are:
-- PostgreSQL
https://github.com/postgres/postgres/blob/master/src/test/regress/expected/float8.out#L487
1.31695789692482

-- Spark/JVM
https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out#L523
1.3169578969248166
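
Side note: this looks like plain acosh(2); since Java has no built-in
acosh, the JVM side presumably computes it via the identity
acosh(x) = log(x + sqrt(x*x - 1)), and the PostgreSQL output is consistent
with the same value printed to fewer significant digits. A quick Scala
check, which should reproduce the Spark/JVM digits:

math.log(2.0 + math.sqrt(2.0 * 2.0 - 1.0))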

btw, the PostgreSQL implementation for atanh just calls atanh in math.h:
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/float.c#L2606

Bests,
Takeshi

On Sat, Jul 27, 2019 at 10:35 AM bo zhaobo <bz...@gmail.com>
wrote:

> Hi all,
>
> Thanks for your concern. Yeah, that's worth to also test in backend
> database. But need to note here, this issue is hit in Spark SQL, as we only
> test it with spark itself, not integrate other databases.
>
> Best Regards,
>
> ZhaoBo
>
>
>
>
Sean Owen <sr...@gmail.com> wrote on Fri, Jul 26, 2019 at 5:46 PM:
>
>> Interesting. I don't think log(3) is special, it's just that some
>> differences in how it's implemented and floating-point values on
>> aarch64 vs x86, or in the JVM, manifest at some values like this. It's
>> still a little surprising! BTW Wolfram Alpha suggests that the correct
>> value is more like ...810969..., right between the two. java.lang.Math
>> doesn't guarantee strict IEEE floating-point behavior, but
>> java.lang.StrictMath is supposed to, at the potential cost of speed,
>> and it gives ...81096, in agreement with aarch64.
>>
>> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
>> Interesting if it also returns the same less accurate result, which
>> might suggest it's more to do with underlying OS math libraries. You
>> noted that these tests sometimes gave platform-dependent differences
>> in the last digit, so wondering if the test value directly reflects
>> PostgreSQL or just what we happen to return now.
>>
>> One option is to use StrictMath in special cases like computing atanh.
>> That gives a value that agrees with aarch64.
>> I also note that 0.5 * (math.log(1 + x) - math.log(1 - x) gives the
>> more accurate answer too, and makes the result agree with, say,
>> Wolfram Alpha for atanh(0.5).
>> (Actually if we do that, better still is 0.5 * (math.log1p(x) -
>> math.log1p(-x)) for best accuracy near 0)
>> Commons Math also has implementations of sinh, cosh, atanh that we
>> could call. It claims it's possibly more accurate and faster. I
>> haven't tested its result here.
>>
>> FWIW the "log1p" version appears, from some informal testing, to be
>> most accurate (in agreement with Wolfram) and using StrictMath doesn't
>> matter. If we change something, I'd use that version above.
>> The only issue is if this causes the result to disagree with
>> PostgreSQL, but then again it's more correct and maybe the DB is
>> wrong.
>>
>>
>> The rest may be a test vs PostgreSQL issue; see
>> https://issues.apache.org/jira/browse/SPARK-28316
>>
>>
>> On Fri, Jul 26, 2019 at 2:32 AM Tianhua huang <hu...@gmail.com>
>> wrote:
>> >
>> > Hi, all
>> >
>> >
>> > Sorry to disturb again, there are several sql tests failed on arm64
>> instance:
>> >
>> > pgSQL/float8.sql *** FAILED ***
>> > Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result
>> did not match for query #56
>> > SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
>> > pgSQL/numeric.sql *** FAILED ***
>> > Expected "2 2247902679199174[72 224790267919917955.1326161858
>> > 4 7405685069595001 7405685069594999.0773399947
>> > 5 5068226527.321263 5068226527.3212726541
>> > 6 281839893606.99365 281839893606.9937234336
>> > 7 1716699575118595840 1716699575118597095.4233081991
>> > 8 167361463828.0749 167361463828.0749132007
>> > 9 107511333880051856] 107511333880052007....", but got "2
>> 2247902679199174[40224790267919917955.1326161858
>> > 4 7405685069595001 7405685069594999.0773399947
>> > 5 5068226527.321263 5068226527.3212726541
>> > 6 281839893606.99365 281839893606.9937234336
>> > 7 1716699575118595580 1716699575118597095.4233081991
>> > 8 167361463828.0749 167361463828.0749132007
>> > 9 107511333880051872] 107511333880052007...." Result did not match for
>> query #496
>> > SELECT t1.id1, t1.result, t2.expected
>> > FROM num_result t1, num_exp_power_10_ln t2
>> > WHERE t1.id1 = t2.id
>> > AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)
>> >
>> > The first test failed, because the value of math.log(3.0) is different
>> on aarch64:
>> >
>> > # on x86_64:
>> >
>> > scala> val a = 0.5
>> > a: Double = 0.5
>> >
>> > scala> a * math.log((1.0 + a) / (1.0 - a))
>> > res1: Double = 0.5493061443340549
>> >
>> > scala> math.log((1.0 + a) / (1.0 - a))
>> > res2: Double = 1.0986122886681098
>> >
>> > # on aarch64:
>> >
>> > scala> val a = 0.5
>> >
>> > a: Double = 0.5
>> >
>> > scala> a * math.log((1.0 + a) / (1.0 - a))
>> >
>> > res20: Double = 0.5493061443340548
>> >
>> > scala> math.log((1.0 + a) / (1.0 - a))
>> >
>> > res21: Double = 1.0986122886681096
>> >
>> > And I tried other several numbers like math.log(4.0) and math.log(5.0)
>> and they are same, I don't know why math.log(3.0) is so special? But the
>> result is different indeed on aarch64. If you are interesting, please try
>> it.
>> >
>> > The second test failed, because some values of pow(10, x) is different
>> on aarch64, according to sql tests of spark, I took similar tests on
>> aarch64 and x86_64, take '-83028485' as example:
>> >
>> > # on x86_64:
>> > scala> import java.lang.Math._
>> > import java.lang.Math._
>> > scala> var a = -83028485
>> > a: Int = -83028485
>> > scala> abs(a)
>> > res4: Int = 83028485
>> > scala> math.log(abs(a))
>> > res5: Double = 18.234694299654787
>> > scala> pow(10, math.log(abs(a)))
>> > res6: Double = 1.71669957511859584E18
>> >
>> > # on aarch64:
>> >
>> > scala> var a = -83028485
>> > a: Int = -83028485
>> > scala> abs(a)
>> > res38: Int = 83028485
>> >
>> > scala> math.log(abs(a))
>> >
>> > res39: Double = 18.234694299654787
>> > scala> pow(10, math.log(abs(a)))
>> > res40: Double = 1.71669957511859558E18
>> >
>> > I send an email to jdk-dev, hope someone can help, and also I proposed
>> this to JIRA  https://issues.apache.org/jira/browse/SPARK-28519, , if
>> you are interesting, welcome to join and discuss, thank you very much.
>> >
>> >
>> > On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <
>> huangtianhua223@gmail.com> wrote:
>> >>
>> >> Thanks for your reply.
>> >>
>> >> About the first problem we didn't find any other reason in log, just
>> found timeout to wait the executor up, and after increase the timeout from
>> 10000 ms to 30000(even 20000)ms,
>> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
>>
>> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
>> the test passed, and there are more than one executor up, not sure whether
>> it's related with the flavor of our aarch64 instance? Now the flavor of the
>> instance is 8C8G. Maybe we will try the bigger flavor later. Or any one has
>> other suggestion, please contact me, thank you.
>> >>
>> >> About the second problem, I proposed a pull request to apache/spark,
>> https://github.com/apache/spark/pull/25186  if you have time, would you
>> please to help to review it, thank you very much.
>> >>
>> >> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <sr...@gmail.com> wrote:
>> >>>
>> >>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <
>> huangtianhua223@gmail.com> wrote:
>> >>> > Two failed and the reason is 'Can't find 1 executors before 10000
>> milliseconds elapsed', see below, then we try increase timeout the tests
>> passed, so wonder if we can increase the timeout? and here I have another
>> question about
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285,
>> why is not >=? see the comment of the function, it should be >=?
>> >>> >
>> >>>
>> >>> I think it's ">" because the driver is also an executor, but not 100%
>> >>> sure. In any event it passes in general.
>> >>> These errors typically mean "I didn't start successfully" for some
>> >>> other reason that may be in the logs.
>> >>>
>> >>> > The other two failed and the reason is '2143289344 equaled
>> 2143289344', this because the value of floatToRawIntBits(0.0f/0.0f) on
>> aarch64 platform is 2143289344 and equals to floatToRawIntBits(Float.NaN).
>> About this I send email to jdk-dev and proposed a topic on scala community
>> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
>> and https://github.com/scala/bug/issues/11632, I thought it's something
>> about jdk or scala, but after discuss, it should related with platform, so
>> seems the following asserts is not appropriate?
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
>> and
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>> >>>
>> >>> These tests could special-case execution on ARM, like you'll see some
>> >>> tests handle big-endian architectures.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

-- 
---
Takeshi Yamamuro

Re: Ask for ARM CI for spark

Posted by bo zhaobo <bz...@gmail.com>.
Hi all,

Thanks for your concern. Yes, it's worth testing against the backend
database as well. But note that this issue is hit in Spark SQL itself: we
only test it with Spark, without integrating any other database.

Best Regards,

ZhaoBo




Sean Owen <sr...@gmail.com> wrote on Fri, Jul 26, 2019 at 5:46 PM:

> Interesting. I don't think log(3) is special, it's just that some
> differences in how it's implemented and floating-point values on
> aarch64 vs x86, or in the JVM, manifest at some values like this. It's
> still a little surprising! BTW Wolfram Alpha suggests that the correct
> value is more like ...810969..., right between the two. java.lang.Math
> doesn't guarantee strict IEEE floating-point behavior, but
> java.lang.StrictMath is supposed to, at the potential cost of speed,
> and it gives ...81096, in agreement with aarch64.
>
> @Yuming Wang the results in float8.sql are from PostgreSQL directly?
> Interesting if it also returns the same less accurate result, which
> might suggest it's more to do with underlying OS math libraries. You
> noted that these tests sometimes gave platform-dependent differences
> in the last digit, so wondering if the test value directly reflects
> PostgreSQL or just what we happen to return now.
>
> One option is to use StrictMath in special cases like computing atanh.
> That gives a value that agrees with aarch64.
> I also note that 0.5 * (math.log(1 + x) - math.log(1 - x) gives the
> more accurate answer too, and makes the result agree with, say,
> Wolfram Alpha for atanh(0.5).
> (Actually if we do that, better still is 0.5 * (math.log1p(x) -
> math.log1p(-x)) for best accuracy near 0)
> Commons Math also has implementations of sinh, cosh, atanh that we
> could call. It claims it's possibly more accurate and faster. I
> haven't tested its result here.
>
> FWIW the "log1p" version appears, from some informal testing, to be
> most accurate (in agreement with Wolfram) and using StrictMath doesn't
> matter. If we change something, I'd use that version above.
> The only issue is if this causes the result to disagree with
> PostgreSQL, but then again it's more correct and maybe the DB is
> wrong.
>
>
> The rest may be a test vs PostgreSQL issue; see
> https://issues.apache.org/jira/browse/SPARK-28316
>
>
> On Fri, Jul 26, 2019 at 2:32 AM Tianhua huang <hu...@gmail.com>
> wrote:
> >
> > Hi, all
> >
> >
> > Sorry to disturb again, there are several sql tests failed on arm64
> instance:
> >
> > pgSQL/float8.sql *** FAILED ***
> > Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result
> did not match for query #56
> > SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
> > pgSQL/numeric.sql *** FAILED ***
> > Expected "2 2247902679199174[72 224790267919917955.1326161858
> > 4 7405685069595001 7405685069594999.0773399947
> > 5 5068226527.321263 5068226527.3212726541
> > 6 281839893606.99365 281839893606.9937234336
> > 7 1716699575118595840 1716699575118597095.4233081991
> > 8 167361463828.0749 167361463828.0749132007
> > 9 107511333880051856] 107511333880052007....", but got "2
> 2247902679199174[40224790267919917955.1326161858
> > 4 7405685069595001 7405685069594999.0773399947
> > 5 5068226527.321263 5068226527.3212726541
> > 6 281839893606.99365 281839893606.9937234336
> > 7 1716699575118595580 1716699575118597095.4233081991
> > 8 167361463828.0749 167361463828.0749132007
> > 9 107511333880051872] 107511333880052007...." Result did not match for
> query #496
> > SELECT t1.id1, t1.result, t2.expected
> > FROM num_result t1, num_exp_power_10_ln t2
> > WHERE t1.id1 = t2.id
> > AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)
> >
> > The first test failed, because the value of math.log(3.0) is different
> on aarch64:
> >
> > # on x86_64:
> >
> > scala> val a = 0.5
> > a: Double = 0.5
> >
> > scala> a * math.log((1.0 + a) / (1.0 - a))
> > res1: Double = 0.5493061443340549
> >
> > scala> math.log((1.0 + a) / (1.0 - a))
> > res2: Double = 1.0986122886681098
> >
> > # on aarch64:
> >
> > scala> val a = 0.5
> >
> > a: Double = 0.5
> >
> > scala> a * math.log((1.0 + a) / (1.0 - a))
> >
> > res20: Double = 0.5493061443340548
> >
> > scala> math.log((1.0 + a) / (1.0 - a))
> >
> > res21: Double = 1.0986122886681096
> >
> > And I tried other several numbers like math.log(4.0) and math.log(5.0)
> and they are same, I don't know why math.log(3.0) is so special? But the
> result is different indeed on aarch64. If you are interesting, please try
> it.
> >
> > The second test failed, because some values of pow(10, x) is different
> on aarch64, according to sql tests of spark, I took similar tests on
> aarch64 and x86_64, take '-83028485' as example:
> >
> > # on x86_64:
> > scala> import java.lang.Math._
> > import java.lang.Math._
> > scala> var a = -83028485
> > a: Int = -83028485
> > scala> abs(a)
> > res4: Int = 83028485
> > scala> math.log(abs(a))
> > res5: Double = 18.234694299654787
> > scala> pow(10, math.log(abs(a)))
> > res6: Double = 1.71669957511859584E18
> >
> > # on aarch64:
> >
> > scala> var a = -83028485
> > a: Int = -83028485
> > scala> abs(a)
> > res38: Int = 83028485
> >
> > scala> math.log(abs(a))
> >
> > res39: Double = 18.234694299654787
> > scala> pow(10, math.log(abs(a)))
> > res40: Double = 1.71669957511859558E18
> >
> > I send an email to jdk-dev, hope someone can help, and also I proposed
> this to JIRA  https://issues.apache.org/jira/browse/SPARK-28519, , if you
> are interesting, welcome to join and discuss, thank you very much.
> >
> >
> > On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <
> huangtianhua223@gmail.com> wrote:
> >>
> >> Thanks for your reply.
> >>
> >> About the first problem we didn't find any other reason in log, just
> found timeout to wait the executor up, and after increase the timeout from
> 10000 ms to 30000(even 20000)ms,
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
>
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
> the test passed, and there are more than one executor up, not sure whether
> it's related with the flavor of our aarch64 instance? Now the flavor of the
> instance is 8C8G. Maybe we will try the bigger flavor later. Or any one has
> other suggestion, please contact me, thank you.
> >>
> >> About the second problem, I proposed a pull request to apache/spark,
> https://github.com/apache/spark/pull/25186  if you have time, would you
> please to help to review it, thank you very much.
> >>
> >> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <sr...@gmail.com> wrote:
> >>>
> >>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <
> huangtianhua223@gmail.com> wrote:
> >>> > Two failed and the reason is 'Can't find 1 executors before 10000
> milliseconds elapsed', see below, then we try increase timeout the tests
> passed, so wonder if we can increase the timeout? and here I have another
> question about
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285,
> why is not >=? see the comment of the function, it should be >=?
> >>> >
> >>>
> >>> I think it's ">" because the driver is also an executor, but not 100%
> >>> sure. In any event it passes in general.
> >>> These errors typically mean "I didn't start successfully" for some
> >>> other reason that may be in the logs.
> >>>
> >>> > The other two failed and the reason is '2143289344 equaled
> 2143289344', this because the value of floatToRawIntBits(0.0f/0.0f) on
> aarch64 platform is 2143289344 and equals to floatToRawIntBits(Float.NaN).
> About this I send email to jdk-dev and proposed a topic on scala community
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> and https://github.com/scala/bug/issues/11632, I thought it's something
> about jdk or scala, but after discuss, it should related with platform, so
> seems the following asserts is not appropriate?
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> and
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
> >>>
> >>> These tests could special-case execution on ARM, like you'll see some
> >>> tests handle big-endian architectures.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
Interesting. I don't think log(3) is special, it's just that some
differences in how it's implemented and floating-point values on
aarch64 vs x86, or in the JVM, manifest at some values like this. It's
still a little surprising! BTW Wolfram Alpha suggests that the correct
value is more like ...810969..., right between the two. java.lang.Math
doesn't guarantee strict IEEE floating-point behavior, but
java.lang.StrictMath is supposed to, at the potential cost of speed,
and it gives ...81096, in agreement with aarch64.

@Yuming Wang the results in float8.sql are from PostgreSQL directly?
Interesting if it also returns the same less accurate result, which
might suggest it's more to do with underlying OS math libraries. You
noted that these tests sometimes gave platform-dependent differences
in the last digit, so wondering if the test value directly reflects
PostgreSQL or just what we happen to return now.

One option is to use StrictMath in special cases like computing atanh.
That gives a value that agrees with aarch64.
I also note that 0.5 * (math.log(1 + x) - math.log(1 - x) gives the
more accurate answer too, and makes the result agree with, say,
Wolfram Alpha for atanh(0.5).
(Actually if we do that, better still is 0.5 * (math.log1p(x) -
math.log1p(-x)) for best accuracy near 0)
Commons Math also has implementations of sinh, cosh, atanh that we
could call. It claims it's possibly more accurate and faster. I
haven't tested its result here.

FWIW the "log1p" version appears, from some informal testing, to be
most accurate (in agreement with Wolfram) and using StrictMath doesn't
matter. If we change something, I'd use that version above.
The only issue is if this causes the result to disagree with
PostgreSQL, but then again it's more correct and maybe the DB is
wrong.
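
To make the candidates concrete, here's a minimal Scala sketch of the
three formulations above (variable names are mine, and this is a sketch,
not the final patch):

val x = 0.5
// current style: 0.5 * log((1 + x) / (1 - x))
val viaLog = 0.5 * math.log((1.0 + x) / (1.0 - x))
// StrictMath variant: specified to be reproducible across platforms
val viaStrict = 0.5 * StrictMath.log((1.0 + x) / (1.0 - x))
// log1p variant: splits the log for the best accuracy near 0
val viaLog1p = 0.5 * (math.log1p(x) - math.log1p(-x))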


The rest may be a test vs PostgreSQL issue; see
https://issues.apache.org/jira/browse/SPARK-28316


On Fri, Jul 26, 2019 at 2:32 AM Tianhua huang <hu...@gmail.com> wrote:
>
> Hi, all
>
>
> Sorry to disturb again, there are several sql tests failed on arm64 instance:
>
> pgSQL/float8.sql *** FAILED ***
> Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result did not match for query #56
> SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
> pgSQL/numeric.sql *** FAILED ***
> Expected "2 2247902679199174[72 224790267919917955.1326161858
> 4 7405685069595001 7405685069594999.0773399947
> 5 5068226527.321263 5068226527.3212726541
> 6 281839893606.99365 281839893606.9937234336
> 7 1716699575118595840 1716699575118597095.4233081991
> 8 167361463828.0749 167361463828.0749132007
> 9 107511333880051856] 107511333880052007....", but got "2 2247902679199174[40224790267919917955.1326161858
> 4 7405685069595001 7405685069594999.0773399947
> 5 5068226527.321263 5068226527.3212726541
> 6 281839893606.99365 281839893606.9937234336
> 7 1716699575118595580 1716699575118597095.4233081991
> 8 167361463828.0749 167361463828.0749132007
> 9 107511333880051872] 107511333880052007...." Result did not match for query #496
> SELECT t1.id1, t1.result, t2.expected
> FROM num_result t1, num_exp_power_10_ln t2
> WHERE t1.id1 = t2.id
> AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)
>
> The first test failed, because the value of math.log(3.0) is different on aarch64:
>
> # on x86_64:
>
> scala> val a = 0.5
> a: Double = 0.5
>
> scala> a * math.log((1.0 + a) / (1.0 - a))
> res1: Double = 0.5493061443340549
>
> scala> math.log((1.0 + a) / (1.0 - a))
> res2: Double = 1.0986122886681098
>
> # on aarch64:
>
> scala> val a = 0.5
>
> a: Double = 0.5
>
> scala> a * math.log((1.0 + a) / (1.0 - a))
>
> res20: Double = 0.5493061443340548
>
> scala> math.log((1.0 + a) / (1.0 - a))
>
> res21: Double = 1.0986122886681096
>
> And I tried other several numbers like math.log(4.0) and math.log(5.0) and they are same, I don't know why math.log(3.0) is so special? But the result is different indeed on aarch64. If you are interesting, please try it.
>
> The second test failed, because some values of pow(10, x) is different on aarch64, according to sql tests of spark, I took similar tests on aarch64 and x86_64, take '-83028485' as example:
>
> # on x86_64:
> scala> import java.lang.Math._
> import java.lang.Math._
> scala> var a = -83028485
> a: Int = -83028485
> scala> abs(a)
> res4: Int = 83028485
> scala> math.log(abs(a))
> res5: Double = 18.234694299654787
> scala> pow(10, math.log(abs(a)))
> res6: Double = 1.71669957511859584E18
>
> # on aarch64:
>
> scala> var a = -83028485
> a: Int = -83028485
> scala> abs(a)
> res38: Int = 83028485
>
> scala> math.log(abs(a))
>
> res39: Double = 18.234694299654787
> scala> pow(10, math.log(abs(a)))
> res40: Double = 1.71669957511859558E18
>
> I send an email to jdk-dev, hope someone can help, and also I proposed this to JIRA  https://issues.apache.org/jira/browse/SPARK-28519, , if you are interesting, welcome to join and discuss, thank you very much.
>
>
> On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <hu...@gmail.com> wrote:
>>
>> Thanks for your reply.
>>
>> About the first problem we didn't find any other reason in log, just found timeout to wait the executor up, and after increase the timeout from 10000 ms to 30000(even 20000)ms, https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764  https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792  the test passed, and there are more than one executor up, not sure whether it's related with the flavor of our aarch64 instance? Now the flavor of the instance is 8C8G. Maybe we will try the bigger flavor later. Or any one has other suggestion, please contact me, thank you.
>>
>> About the second problem, I proposed a pull request to apache/spark, https://github.com/apache/spark/pull/25186  if you have time, would you please to help to review it, thank you very much.
>>
>> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <sr...@gmail.com> wrote:
>>>
>>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <hu...@gmail.com> wrote:
>>> > Two failed and the reason is 'Can't find 1 executors before 10000 milliseconds elapsed', see below, then we try increase timeout the tests passed, so wonder if we can increase the timeout? and here I have another question about https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285, why is not >=? see the comment of the function, it should be >=?
>>> >
>>>
>>> I think it's ">" because the driver is also an executor, but not 100%
>>> sure. In any event it passes in general.
>>> These errors typically mean "I didn't start successfully" for some
>>> other reason that may be in the logs.
>>>
>>> > The other two failed and the reason is '2143289344 equaled 2143289344', this because the value of floatToRawIntBits(0.0f/0.0f) on aarch64 platform is 2143289344 and equals to floatToRawIntBits(Float.NaN). About this I send email to jdk-dev and proposed a topic on scala community https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845 and https://github.com/scala/bug/issues/11632, I thought it's something about jdk or scala, but after discuss, it should related with platform, so seems the following asserts is not appropriate? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705 and https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>>>
>>> These tests could special-case execution on ARM, like you'll see some
>>> tests handle big-endian architectures.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi, all


Sorry to disturb again: several SQL tests failed on the arm64 instance:

   - pgSQL/float8.sql *** FAILED ***
   Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result
   did not match for query #56
   SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
   - pgSQL/numeric.sql *** FAILED ***
   Expected "2 2247902679199174[72 224790267919917955.1326161858
   4 7405685069595001 7405685069594999.0773399947
   5 5068226527.321263 5068226527.3212726541
   6 281839893606.99365 281839893606.9937234336
   7 1716699575118595840 1716699575118597095.4233081991
   8 167361463828.0749 167361463828.0749132007
   9 107511333880051856] 107511333880052007....", but got "2
   2247902679199174[40224790267919917955.1326161858
   4 7405685069595001 7405685069594999.0773399947
   5 5068226527.321263 5068226527.3212726541
   6 281839893606.99365 281839893606.9937234336
   7 1716699575118595580 1716699575118597095.4233081991
   8 167361463828.0749 167361463828.0749132007
   9 107511333880051872] 107511333880052007...." Result did not match for
   query #496
   SELECT t1.id1, t1.result, t2.expected
   FROM num_result t1, num_exp_power_10_ln t2
   WHERE t1.id1 = t2.id
   AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)

The first test failed because the value of math.log(3.0) is different on
aarch64:

# on x86_64:
scala> val a = 0.5
a: Double = 0.5

scala> a * math.log((1.0 + a) / (1.0 - a))
res1: Double = 0.5493061443340549

scala> math.log((1.0 + a) / (1.0 - a))
res2: Double = 1.0986122886681098

# on aarch64:

scala> val a = 0.5

a: Double = 0.5

scala> a * math.log((1.0 + a) / (1.0 - a))
res20: Double = 0.5493061443340548

scala> math.log((1.0 + a) / (1.0 - a))

res21: Double = 1.0986122886681096

I also tried several other numbers like math.log(4.0) and math.log(5.0),
and they are the same on both platforms; I don't know why math.log(3.0) is
special, but its result is indeed different on aarch64. If you are
interested, please try it.

The second test failed because some values of pow(10, x) are different on
aarch64. Following Spark's SQL tests, I ran similar checks on aarch64 and
x86_64; take '-83028485' as an example:

# on x86_64:
scala> import java.lang.Math._
import java.lang.Math._
scala> var a = -83028485
a: Int = -83028485
scala> abs(a)
res4: Int = 83028485
scala> math.log(abs(a))
res5: Double = 18.234694299654787
scala> pow(10, math.log(abs(a)))
res6: Double = 1.71669957511859584E18

# on aarch64:

scala> var a = -83028485
a: Int = -83028485
scala> abs(a)
res38: Int = 83028485

scala> math.log(abs(a))

res39: Double = 18.234694299654787
scala> pow(10, math.log(abs(a)))
res40: Double = 1.71669957511859558E18

I sent an email to jdk-dev in the hope that someone can help, and also
proposed this in JIRA https://issues.apache.org/jira/browse/SPARK-28519; if
you are interested, you are welcome to join the discussion, thank you very
much.
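
One way to check whether the pow difference comes from the platform's math
intrinsics is java.lang.StrictMath, which follows fdlibm and is specified
to return identical results on every platform; a small sketch (my own, not
yet verified on both platforms):

val v = java.lang.Math.abs(-83028485).toDouble
// Math.log agreed across platforms for this input (18.234694299654787),
// but Math.pow differed in the last bits:
java.lang.Math.pow(10, java.lang.Math.log(v))
// StrictMath.pow should return the same bits on x86_64 and aarch64:
java.lang.StrictMath.pow(10, java.lang.Math.log(v))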

On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <hu...@gmail.com>
wrote:

> Thanks for your reply.
>
> About the first problem we didn't find any other reason in log, just found
> timeout to wait the executor up, and after increase the timeout from 10000
> ms to 30000(even 20000)ms,
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
>
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
> the test passed, and there are more than one executor up, not sure whether
> it's related with the flavor of our aarch64 instance? Now the flavor of the
> instance is 8C8G. Maybe we will try the bigger flavor later. Or any one has
> other suggestion, please contact me, thank you.
>
> About the second problem, I proposed a pull request to apache/spark,
> https://github.com/apache/spark/pull/25186  if you have time, would you
> please to help to review it, thank you very much.
>
> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <sr...@gmail.com> wrote:
>
>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <hu...@gmail.com>
>> wrote:
>> > Two failed and the reason is 'Can't find 1 executors before 10000
>> milliseconds elapsed', see below, then we try increase timeout the tests
>> passed, so wonder if we can increase the timeout? and here I have another
>> question about
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285,
>> why is not >=? see the comment of the function, it should be >=?
>> >
>>
>> I think it's ">" because the driver is also an executor, but not 100%
>> sure. In any event it passes in general.
>> These errors typically mean "I didn't start successfully" for some
>> other reason that may be in the logs.
>>
>> > The other two failed and the reason is '2143289344 equaled 2143289344',
>> this because the value of floatToRawIntBits(0.0f/0.0f) on aarch64 platform
>> is 2143289344 and equals to floatToRawIntBits(Float.NaN). About this I send
>> email to jdk-dev and proposed a topic on scala community
>> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
>> and https://github.com/scala/bug/issues/11632, I thought it's something
>> about jdk or scala, but after discuss, it should related with platform, so
>> seems the following asserts is not appropriate?
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
>> and
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>>
>> These tests could special-case execution on ARM, like you'll see some
>> tests handle big-endian architectures.
>>
>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Thanks for your reply.

About the first problem, we didn't find any other cause in the logs, just
the timeout while waiting for the executors to come up. After increasing
the timeout from 10000 ms to 30000 (or even 20000) ms at
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
and
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
the tests passed, and more than one executor came up. We are not sure
whether this is related to the flavor of our aarch64 instance; currently it
is 8C8G (8 vCPUs, 8 GB of RAM). Maybe we will try a bigger flavor later. If
anyone has another suggestion, please contact me, thank you.
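
As a sketch, assuming the signature of TestUtils.waitUntilExecutorsUp used
at the call sites above, the change is just a larger timeout argument:

// wait for 1 executor, with the timeout raised from 10000 ms to 30000 ms
TestUtils.waitUntilExecutorsUp(sc, 1, 30000)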

About the second problem, I proposed a pull request to apache/spark,
https://github.com/apache/spark/pull/25186; if you have time, would you
please help review it? Thank you very much.

On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <sr...@gmail.com> wrote:

> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <hu...@gmail.com>
> wrote:
> > Two failed and the reason is 'Can't find 1 executors before 10000
> milliseconds elapsed', see below, then we try increase timeout the tests
> passed, so wonder if we can increase the timeout? and here I have another
> question about
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285,
> why is not >=? see the comment of the function, it should be >=?
> >
>
> I think it's ">" because the driver is also an executor, but not 100%
> sure. In any event it passes in general.
> These errors typically mean "I didn't start successfully" for some
> other reason that may be in the logs.
>
> > The other two failed and the reason is '2143289344 equaled 2143289344',
> this because the value of floatToRawIntBits(0.0f/0.0f) on aarch64 platform
> is 2143289344 and equals to floatToRawIntBits(Float.NaN). About this I send
> email to jdk-dev and proposed a topic on scala community
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> and https://github.com/scala/bug/issues/11632, I thought it's something
> about jdk or scala, but after discuss, it should related with platform, so
> seems the following asserts is not appropriate?
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> and
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>
> These tests could special-case execution on ARM, like you'll see some
> tests handle big-endian architectures.
>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <hu...@gmail.com> wrote:
> Two failed and the reason is 'Can't find 1 executors before 10000 milliseconds elapsed', see below, then we try increase timeout the tests passed, so wonder if we can increase the timeout? and here I have another question about https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285, why is not >=? see the comment of the function, it should be >=?
>

I think it's ">" because the driver is also an executor, but not 100%
sure. In any event it passes in general.
These errors typically mean "I didn't start successfully" for some
other reason that may be in the logs.

> The other two failed and the reason is '2143289344 equaled 2143289344', this because the value of floatToRawIntBits(0.0f/0.0f) on aarch64 platform is 2143289344 and equals to floatToRawIntBits(Float.NaN). About this I send email to jdk-dev and proposed a topic on scala community https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845 and https://github.com/scala/bug/issues/11632, I thought it's something about jdk or scala, but after discuss, it should related with platform, so seems the following asserts is not appropriate? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705 and https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

These tests could special-case execution on ARM, like you'll see some
tests handle big-endian architectures.
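
For example, a hedged sketch of such a special case using ScalaTest's
assume(), which cancels a test when its condition is false (the exact
check is my own suggestion, not an established pattern in these suites):

// skip the raw-NaN-bits assertion on aarch64, where 0.0f/0.0f yields
// the canonical NaN bit pattern
val isAarch64 = System.getProperty("os.arch") == "aarch64"
assume(!isAarch64, "raw NaN bit patterns differ on aarch64")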

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi all,

We ran all unit tests for Spark on the arm64 platform; after some effort,
four tests FAILED, see
https://logs.openlabtesting.org/logs/4/4/ae5ebaddd6ba6eba5a525b2bf757043ebbe78432/check/spark-build-arm64/9ecccad/job-output.txt.gz

Two failed with the reason 'Can't find 1 executors before 10000
milliseconds elapsed', see below. After we increased the timeout, the tests
passed, so we wonder whether we can increase it in the suite. And I have
another question about
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285:
why is the comparison not >=? Judging from the comment on the function, it
should be >=.

- test driver discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before
10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78(SparkContextSuite.scala:753)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78$adapted(SparkContextSuite.scala:741)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$77(SparkContextSuite.scala:741)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)

- test gpu driver resource files and discovery under local-cluster
mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before
10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80(SparkContextSuite.scala:781)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80$adapted(SparkContextSuite.scala:761)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$79(SparkContextSuite.scala:761)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)

The other two failed with the reason '2143289344 equaled 2143289344':
on the aarch64 platform the value of floatToRawIntBits(0.0f/0.0f) is
2143289344, which equals floatToRawIntBits(Float.NaN). I sent an email
to jdk-dev about this and opened topics in the Scala community, see
https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
and https://github.com/scala/bug/issues/11632. I thought it was
something about the JDK or Scala, but after the discussion it appears
to be platform-dependent, so the following asserts seem inappropriate:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
and https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

 - SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
   2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
 - NaN and -0.0 in window partition keys *** FAILED ***
   2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)
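
A minimal Scala sketch of what is going on (my own illustration):

import java.lang.Float.{floatToIntBits, floatToRawIntBits}

val generated = 0.0f / 0.0f  // a quiet NaN; its raw bits are platform-dependent
// on aarch64 both calls return 2143289344 (0x7fc00000, the canonical NaN),
// while on x86_64 the generated NaN has the sign bit set, so they differ:
floatToRawIntBits(generated)
floatToRawIntBits(Float.NaN)
// floatToIntBits (without "Raw") collapses every NaN to the canonical
// pattern, so this comparison behaves the same on both platforms:
floatToIntBits(generated) == floatToIntBits(Float.NaN)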

We are waiting for your suggestions on fixing these failed tests, thank
you very much.


On Wed, Jul 10, 2019 at 10:07 AM Tianhua huang <hu...@gmail.com>
wrote:

> Hi all,
>
> I am glad to tell you there is a new progress of build/test spark on
> aarch64 server, the tests are running, see the build/test detail log
> https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/job-output.txt.gz and
> the aarch64 instance info see
> https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/zuul-info/zuul-info.ubuntu-xenial-arm64.txt In
> order to enable the test, I made some modification, the major one is to
> build leveldbjni local package, I forked fusesource/leveldbjni and
> chirino/leveldb repos, and made some modification to make sure to build the
> local package, see https://github.com/huangtianhua/leveldbjni/pull/1 and
> https://github.com/huangtianhua/leveldbjni/pull/2 , then to use it in
> spark, the detail you can find in
> https://github.com/theopenlab/spark/pull/1
>
> Now the tests are not all successful, I will try to fix it and any
> suggestion is welcome, thank you all.
>
> On Mon, Jul 1, 2019 at 5:25 PM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> We are focus on the arm instance of cloud, and now I use the arm instance
>> of vexxhost cloud to run the build job which mentioned above, the
>> specification of the arm instance is 8VCPU and 8GB of RAM,
>> and we can use bigger flavor to create the arm instance to run the job,
>> if need be.
>>
>> On Fri, Jun 28, 2019 at 6:55 PM Steve Loughran
>> <st...@cloudera.com.invalid> wrote:
>>
>>>
>>> Be interesting to see how well a Pi4 works; with only 4GB of RAM you
>>> wouldn't compile with it, but you could try installing the spark jar bundle
>>> and then run against some NFS mounted disks:
>>> https://www.raspberrypi.org/magpi/raspberry-pi-4-specs-benchmarks/ ;
>>> unlikely to be fast, but it'd be an efficient kind of slow
>>>
>>> On Fri, Jun 28, 2019 at 3:08 AM Rui Chen <ch...@gmail.com> wrote:
>>>
>>>> >  I think any AA64 work is going to have to define very clearly what
>>>> "works" is defined as
>>>>
>>>> +1
>>>> It's very valuable to build a clear scope of these projects
>>>> functionality for ARM platform in upstream community, it bring confidence
>>>> to end user and customers when they plan to deploy these projects on ARM.
>>>>
>>>> This is absolute long term work, let's to make it step by step, CI,
>>>> testing, issue and resolving.
>>>>
>>>>> Steve Loughran <st...@cloudera.com.invalid> wrote on Thu, Jun 27, 2019 at 9:22 PM:
>>>>
>>>>> level db and native codecs are invariably a problem here, as is
>>>>> anything else doing misaligned IO. Protobuf has also had "issues" in the
>>>>> past
>>>>>
>>>>> see https://issues.apache.org/jira/browse/HADOOP-16100
>>>>>
>>>>> I think any AA64 work is going to have to define very clearly what
>>>>> "works" is defined as; spark standalone with a specific set of codecs is
>>>>> probably the first thing to aim for -no Snappy or lz4.
>>>>>
>>>>> Anything which goes near: protobuf, checksums, native code, etc is in
>>>>> trouble. Don't try and deploy with HDFS as the cluster FS, would be my
>>>>> recommendation.
>>>>>
>>>>> If you want a cluster use NFS or one of google GCS, Azure WASB for the
>>>>> cluster FS. And before trying either of those cloud store, run the
>>>>> filesystem connector test suites (hadoop-azure; google gcs github) to see
>>>>> that they work. If the foundational FS test suites fail, nothing else will
>>>>> work
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 27, 2019 at 3:09 AM Tianhua huang <
>>>>> huangtianhua223@gmail.com> wrote:
>>>>>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Hi all,

I am glad to tell you there is new progress on building and testing spark
on an aarch64 server: the tests are now running. The build/test detail log
is at
https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/job-output.txt.gz
and the aarch64 instance info is at
https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/zuul-info/zuul-info.ubuntu-xenial-arm64.txt
In order to enable the tests I made some modifications; the major one is
building the leveldbjni native package locally. I forked the
fusesource/leveldbjni and chirino/leveldb repos and modified them so the
package builds locally, see
https://github.com/huangtianhua/leveldbjni/pull/1 and
https://github.com/huangtianhua/leveldbjni/pull/2 , then used the result
in spark; the details are in https://github.com/theopenlab/spark/pull/1

Not all of the tests pass yet; I will try to fix them, and any
suggestions are welcome. Thank you all.
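
A quick sanity check that a given leveldbjni jar actually carries a native
binary for the running platform is a small probe like the one below (a
minimal sketch, not part of the CI job itself; the class name and the /tmp
path are arbitrary, and it assumes leveldbjni-all plus the org.iq80
leveldb API on the classpath):

    import java.io.File;
    import org.iq80.leveldb.DB;
    import org.iq80.leveldb.Options;
    import static org.fusesource.leveldbjni.JniDBFactory.factory;

    public class LevelDbJniCheck {
        public static void main(String[] args) throws Exception {
            // leveldbjni picks its native binary based on os.name/os.arch,
            // so report what the JVM sees first.
            System.out.println(System.getProperty("os.name") + " / "
                    + System.getProperty("os.arch"));
            Options options = new Options().createIfMissing(true);
            // Opening a DB forces the JNI library to load; on a platform
            // with no bundled native binary this fails with an
            // UnsatisfiedLinkError.
            try (DB db = factory.open(new File("/tmp/leveldbjni-check"), options)) {
                db.put("k".getBytes(), "v".getBytes());
                System.out.println("native leveldbjni loaded and working");
            }
        }
    }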


Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
We are focused on cloud ARM instances; right now I use an ARM instance in
the vexxhost cloud to run the build job mentioned above. Its flavor has
8 VCPUs and 8 GB of RAM, and we can create the instance from a bigger
flavor to run the job, if need be.


Re: Ask for ARM CI for spark

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
Be interesting to see how well a Pi4 works; with only 4GB of RAM you
wouldn't compile on it, but you could try installing the spark jar bundle
and then running against some NFS-mounted disks:
https://www.raspberrypi.org/magpi/raspberry-pi-4-specs-benchmarks/ ;
unlikely to be fast, but it'd be an efficient kind of slow.
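
A rough sketch of that kind of run (every path here is hypothetical; it
assumes the NFS share is mounted at /mnt/nfs and a Spark distribution is
on the classpath):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class NfsReadTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("pi4-nfs-test")
                    .setMaster("local[4]");  // the Pi4 has four cores
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Read straight off the NFS mount via the local
                // filesystem scheme, so no HDFS is involved.
                long lines = sc.textFile("file:///mnt/nfs/data/sample.txt").count();
                System.out.println("lines = " + lines);
            }
        }
    }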


Re: Ask for ARM CI for spark

Posted by Rui Chen <ch...@gmail.com>.
>  I think any AA64 work is going to have to define very clearly what
> "works" is defined as

+1
It's very valuable to define a clear scope of these projects'
functionality on the ARM platform in the upstream community; it gives end
users and customers confidence when they plan to deploy these projects on
ARM.

This is definitely long-term work, so let's take it step by step: CI,
then testing, then finding and resolving issues.


Re: Ask for ARM CI for spark

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
LevelDB and native codecs are invariably a problem here, as is anything
else doing misaligned IO. Protobuf has also had "issues" in the past;
see https://issues.apache.org/jira/browse/HADOOP-16100

I think any AA64 work is going to have to define very clearly what "works"
is defined as; spark standalone with a specific set of codecs is probably
the first thing to aim for: no Snappy or lz4.

Anything which goes near protobuf, checksums, native code, etc. is in
trouble. Don't try to deploy with HDFS as the cluster FS, would be my
recommendation.

If you want a cluster, use NFS or one of Google GCS or Azure WASB as the
cluster FS. And before trying either of those cloud stores, run the
filesystem connector test suites (hadoop-azure; the Google GCS GitHub
repo) to see that they work. If the foundational FS test suites fail,
nothing else will work.
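
For what it's worth, one way to keep a first standalone/local smoke test
away from the native codecs is to force the pure-Java lzf codec. A
minimal sketch (my own illustration, not from the Spark docs; local[*]
stands in for a real cluster, and the class name is arbitrary):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ArmSmokeTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("arm-smoke-test")
                    .setMaster("local[*]")
                    // lzf is implemented in pure Java, so it sidesteps the
                    // native snappy/lz4 libraries that may lack aarch64
                    // builds.
                    .set("spark.io.compression.codec", "lzf");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
                System.out.println("count = " + count);
            }
        }
    }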




Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
I ran the UT tests on my ARM instance before and reported an issue in
https://issues.apache.org/jira/browse/SPARK-27721 ; it seems there is no
leveldbjni native package for aarch64 in leveldbjni-all.jar (1.8)
https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8
The PR https://github.com/fusesource/leveldbjni/pull/82 added aarch64
support and was merged on 2 Nov 2017, but the latest release of the repo
is from 17 Oct 2013, so unfortunately it doesn't include the aarch64
support.

I will run the tests on the job mentioned above and try to fix this
issue; if anyone has any ideas about it, please reply. Thank you.
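
To see exactly which platforms a given leveldbjni-all jar covers, the
bundled natives can be listed directly. A rough sketch (it assumes the
hawtjni layout of META-INF/native/<platform>/ inside the jar and the
default local Maven cache path; adjust the path as needed):

    import java.util.jar.JarFile;

    public class ListLevelDbNatives {
        public static void main(String[] args) throws Exception {
            String jar = System.getProperty("user.home")
                    + "/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all"
                    + "/1.8/leveldbjni-all-1.8.jar";
            try (JarFile jf = new JarFile(jar)) {
                // hawtjni-packaged jars keep per-platform binaries under
                // META-INF/native/; an aarch64 entry should show up here
                // once a rebuilt jar includes it.
                jf.stream()
                  .filter(e -> e.getName().startsWith("META-INF/native/"))
                  .forEach(e -> System.out.println(e.getName()));
            }
        }
    }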



Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
Can you begin by testing yourself? I think the first step is to make
sure the build and tests work on ARM. If you find problems you can
isolate them and try to fix them, or at least report them. It's only
worth getting CI in place when we think builds will work.
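
Concretely, that loop can stay quite small. A minimal sketch in Python
(assuming a local checkout of apache/spark; build/mvn and dev/run-tests
are the scripts shipped in the repo, the wrapper itself is illustrative):

    # Sketch: build Spark and run its tests on an aarch64 host, logging
    # output so ARM-specific failures can be isolated and reported.
    # Run it from the root of an apache/spark checkout.
    import platform
    import subprocess
    import sys

    def run(cmd, log_path):
        # Run a command, sending stdout+stderr to a log file; return its
        # exit code.
        with open(log_path, "w") as log:
            return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)

    if platform.machine() != "aarch64":
        sys.exit("run this on an aarch64 machine")

    # Step 1: the build itself has to work before the tests matter.
    if run(["./build/mvn", "-DskipTests", "clean", "package"], "build.log") != 0:
        sys.exit("build failed -- isolate the error in build.log and report it")

    # Step 2: run the test suites; failures here are what to report upstream.
    if run(["./dev/run-tests"], "tests.log") != 0:
        sys.exit("tests failed -- isolate failures in tests.log, then report")

    print("build and tests passed on aarch64")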

On Tue, Jun 25, 2019 at 9:26 PM Tianhua huang <hu...@gmail.com> wrote:
>
> Thanks Shane :)
>
> This sounds good, and yes I agree that it's best to keep the test/build infrastructure in one place. If you can't find the ARM resource we are willing to support the ARM instance :)  Our goal is to make more open source software to be more compatible for aarch64 platform, so let's to do it. I will be happy if I can give some help for the goal.
>
> Waiting for you good news :)
>
> On Wed, Jun 26, 2019 at 9:47 AM shane knapp <sk...@berkeley.edu> wrote:
>>
>> ...or via VM as you mentioned earlier.  :)
>>
>> shane (who will file a JIRA tomorrow)
>>
>> On Tue, Jun 25, 2019 at 6:44 PM shane knapp <sk...@berkeley.edu> wrote:
>>>
>>> i'd much prefer that we keep the test/build infrastructure in one place.
>>>
>>> we don't have ARM hardware, but there's a slim possibility i can scare something up in our older research stock...
>>>
>>> another option would be to run the build in a arm-based docker container, which (according to the intarwebs) is possible.
>>>
>>> shane
>>>
>>> On Tue, Jun 25, 2019 at 6:35 PM Tianhua huang <hu...@gmail.com> wrote:
>>>>
>>>> I forked apache/spark project and propose a job(https://github.com/theopenlab/spark/pull/1) for spark building in OpenLab ARM instance, this is the first step to build spark on ARM,  I can enable a periodic job for arm building for apache/spark master if you guys like.  Later I will run tests for spark. I also willing to be the maintainer of the arm ci of spark.
>>>>
>>>> Thanks for you attention.
>>>>
>>>> On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <hu...@gmail.com> wrote:
>>>>>
>>>>> Thanks Sean.
>>>>>
>>>>> I am very happy to hear that the community will put effort to fix the ARM-related issues. I'd be happy to help if you like. And could you give the trace link of this issue, then I can check it is fixed or not, thank you.
>>>>> As far as I know the old versions of spark support ARM, and now the new versions don't, this just shows that we need a CI to check whether the spark support ARM and whether some modification break it.
>>>>> I will add a demo job in OpenLab to build spark on ARM and do a simple UT test. Later I will give the job link.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thank you all!
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>>
>>>>>> I'd begin by reporting and fixing ARM-related issues in the build. If
>>>>>> they're small, of course we should do them. If it requires significant
>>>>>> modifications, we can discuss how much Spark can support ARM. I don't
>>>>>> think it's yet necessary for the Spark project to run these CI builds
>>>>>> until that point, but it's always welcome if people are testing that
>>>>>> separately.
>>>>>>
>>>>>> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca> wrote:
>>>>>> >
>>>>>> > Moving to dev@ for increased visibility among the developers.
>>>>>> >
>>>>>> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <hu...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Thanks for your reply.
>>>>>> >>
>>>>>> >> As I said before, I met some problem of build or test for spark on aarch64 server, so it will be better to have the ARM CI to make sure the spark is compatible for AArch64 platforms.
>>>>>> >>
>>>>>> >> I’m from OpenLab team(https://openlabtesting.org/ ,a community to do open source project testing. And we can support some Arm virtual machines to AMPLab Jenkins, and also we have a developer team that willing to work on this, we willing to maintain build CI jobs and address the CI issues.  What do you think?
>>>>>> >>
>>>>>> >>
>>>>>> >> Thanks for your attention.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu> wrote:
>>>>>> >>>
>>>>>> >>> yeah, we don't have any aarch64 systems for testing...  this has been asked before but is currently pretty low on our priority list as we don't have the hardware.
>>>>>> >>>
>>>>>> >>> sorry,
>>>>>> >>>
>>>>>> >>> shane
>>>>>> >>>
>>>>>> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <hu...@gmail.com> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi, sorry to disturb you.
>>>>>> >>>> The CI testing for apache spark is supported by AMPLab Jenkins, and I find there are some computers(most of them are Linux (amd64) arch) for the CI development, but seems there is no Aarch64 computer for spark CI testing. Recently, I build and run test for spark(master and branch-2.4) on my arm server, and unfortunately there are some problems, for example, ut test is failed due to a LEVELDBJNI native package, the details for java test see http://paste.openstack.org/show/752063/ and python test see http://paste.openstack.org/show/752709/
>>>>>> >>>> So I have a question about the ARM CI testing for spark, is there any plan to support it? Thank you very much and I will wait for your reply!
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> Shane Knapp
>>>>>> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>> >>> https://rise.cs.berkeley.edu
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Twitter: https://twitter.com/holdenkarau
>>>>>> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu


Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Thanks Shane :)

This sounds good, and yes I agree that it's best to keep the test/build
infrastructure in one place. If you can't find the ARM resources, we are
willing to provide the ARM instances :)  Our goal is to make more open
source software compatible with the aarch64 platform, so let's do it. I
will be happy if I can give some help toward that goal.

Waiting for your good news :)

On Wed, Jun 26, 2019 at 9:47 AM shane knapp <sk...@berkeley.edu> wrote:

> ...or via VM as you mentioned earlier.  :)
>
> shane (who will file a JIRA tomorrow)
>
> On Tue, Jun 25, 2019 at 6:44 PM shane knapp <sk...@berkeley.edu> wrote:
>
>> i'd much prefer that we keep the test/build infrastructure in one place.
>>
>> we don't have ARM hardware, but there's a slim possibility i can scare
>> something up in our older research stock...
>>
>> another option would be to run the build in a arm-based docker container,
>> which (according to the intarwebs) is possible.
>>
>> shane
>>
>> On Tue, Jun 25, 2019 at 6:35 PM Tianhua huang <hu...@gmail.com>
>> wrote:
>>
>>> I forked apache/spark project and propose a job(
>>> https://github.com/theopenlab/spark/pull/1) for spark building in
>>> OpenLab ARM instance, this is the first step to build spark on ARM,  I can
>>> enable a periodic job for arm building for apache/spark master if you
>>> guys like.  Later I will run tests for spark. I also willing to be the
>>> maintainer of the arm ci of spark.
>>>
>>> Thanks for you attention.
>>>
>>> On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <
>>> huangtianhua223@gmail.com> wrote:
>>>
>>>> Thanks Sean.
>>>>
>>>> I am very happy to hear that the community will put effort to fix the
>>>> ARM-related issues. I'd be happy to help if you like. And could you give
>>>> the trace link of this issue, then I can check it is fixed or not, thank
>>>> you.
>>>> As far as I know the old versions of spark support ARM, and now the new
>>>> versions don't, this just shows that we need a CI to check whether the
>>>> spark support ARM and whether some modification break it.
>>>> I will add a demo job in OpenLab to build spark on ARM and do a simple
>>>> UT test. Later I will give the job link.
>>>>
>>>> Let me know what you think.
>>>>
>>>> Thank you all!
>>>>
>>>>
>>>> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>>> I'd begin by reporting and fixing ARM-related issues in the build. If
>>>>> they're small, of course we should do them. If it requires significant
>>>>> modifications, we can discuss how much Spark can support ARM. I don't
>>>>> think it's yet necessary for the Spark project to run these CI builds
>>>>> until that point, but it's always welcome if people are testing that
>>>>> separately.
>>>>>
>>>>> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca>
>>>>> wrote:
>>>>> >
>>>>> > Moving to dev@ for increased visibility among the developers.
>>>>> >
>>>>> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <
>>>>> huangtianhua223@gmail.com> wrote:
>>>>> >>
>>>>> >> Thanks for your reply.
>>>>> >>
>>>>> >> As I said before, I met some problem of build or test for spark on
>>>>> aarch64 server, so it will be better to have the ARM CI to make sure the
>>>>> spark is compatible for AArch64 platforms.
>>>>> >>
>>>>> >> I’m from OpenLab team(https://openlabtesting.org/ ,a community to
>>>>> do open source project testing. And we can support some Arm virtual
>>>>> machines to AMPLab Jenkins, and also we have a developer team that willing
>>>>> to work on this, we willing to maintain build CI jobs and address the CI
>>>>> issues.  What do you think?
>>>>> >>
>>>>> >>
>>>>> >> Thanks for your attention.
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu>
>>>>> wrote:
>>>>> >>>
>>>>> >>> yeah, we don't have any aarch64 systems for testing...  this has
>>>>> been asked before but is currently pretty low on our priority list as we
>>>>> don't have the hardware.
>>>>> >>>
>>>>> >>> sorry,
>>>>> >>>
>>>>> >>> shane
>>>>> >>>
>>>>> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <
>>>>> huangtianhua223@gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> Hi, sorry to disturb you.
>>>>> >>>> The CI testing for apache spark is supported by AMPLab Jenkins,
>>>>> and I find there are some computers(most of them are Linux (amd64) arch)
>>>>> for the CI development, but seems there is no Aarch64 computer for spark CI
>>>>> testing. Recently, I build and run test for spark(master and branch-2.4) on
>>>>> my arm server, and unfortunately there are some problems, for example, ut
>>>>> test is failed due to a LEVELDBJNI native package, the details for java
>>>>> test see http://paste.openstack.org/show/752063/ and python test see
>>>>> http://paste.openstack.org/show/752709/
>>>>> >>>> So I have a question about the ARM CI testing for spark, is there
>>>>> any plan to support it? Thank you very much and I will wait for your reply!
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> Shane Knapp
>>>>> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>> >>> https://rise.cs.berkeley.edu
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Twitter: https://twitter.com/holdenkarau
>>>>> > Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>

Re: Ask for ARM CI for spark

Posted by shane knapp <sk...@berkeley.edu>.
...or via VM as you mentioned earlier.  :)

shane (who will file a JIRA tomorrow)

On Tue, Jun 25, 2019 at 6:44 PM shane knapp <sk...@berkeley.edu> wrote:

> i'd much prefer that we keep the test/build infrastructure in one place.
>
> we don't have ARM hardware, but there's a slim possibility i can scare
> something up in our older research stock...
>
> another option would be to run the build in a arm-based docker container,
> which (according to the intarwebs) is possible.
>
> shane
>
> On Tue, Jun 25, 2019 at 6:35 PM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> I forked apache/spark project and propose a job(
>> https://github.com/theopenlab/spark/pull/1) for spark building in
>> OpenLab ARM instance, this is the first step to build spark on ARM,  I can
>> enable a periodic job for arm building for apache/spark master if you
>> guys like.  Later I will run tests for spark. I also willing to be the
>> maintainer of the arm ci of spark.
>>
>> Thanks for you attention.
>>
>> On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <hu...@gmail.com>
>> wrote:
>>
>>> Thanks Sean.
>>>
>>> I am very happy to hear that the community will put effort to fix the
>>> ARM-related issues. I'd be happy to help if you like. And could you give
>>> the trace link of this issue, then I can check it is fixed or not, thank
>>> you.
>>> As far as I know the old versions of spark support ARM, and now the new
>>> versions don't, this just shows that we need a CI to check whether the
>>> spark support ARM and whether some modification break it.
>>> I will add a demo job in OpenLab to build spark on ARM and do a simple
>>> UT test. Later I will give the job link.
>>>
>>> Let me know what you think.
>>>
>>> Thank you all!
>>>
>>>
>>> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <sr...@gmail.com> wrote:
>>>
>>>> I'd begin by reporting and fixing ARM-related issues in the build. If
>>>> they're small, of course we should do them. If it requires significant
>>>> modifications, we can discuss how much Spark can support ARM. I don't
>>>> think it's yet necessary for the Spark project to run these CI builds
>>>> until that point, but it's always welcome if people are testing that
>>>> separately.
>>>>
>>>> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca>
>>>> wrote:
>>>> >
>>>> > Moving to dev@ for increased visibility among the developers.
>>>> >
>>>> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <
>>>> huangtianhua223@gmail.com> wrote:
>>>> >>
>>>> >> Thanks for your reply.
>>>> >>
>>>> >> As I said before, I met some problem of build or test for spark on
>>>> aarch64 server, so it will be better to have the ARM CI to make sure the
>>>> spark is compatible for AArch64 platforms.
>>>> >>
>>>> >> I’m from OpenLab team(https://openlabtesting.org/ ,a community to
>>>> do open source project testing. And we can support some Arm virtual
>>>> machines to AMPLab Jenkins, and also we have a developer team that willing
>>>> to work on this, we willing to maintain build CI jobs and address the CI
>>>> issues.  What do you think?
>>>> >>
>>>> >>
>>>> >> Thanks for your attention.
>>>> >>
>>>> >>
>>>> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu>
>>>> wrote:
>>>> >>>
>>>> >>> yeah, we don't have any aarch64 systems for testing...  this has
>>>> been asked before but is currently pretty low on our priority list as we
>>>> don't have the hardware.
>>>> >>>
>>>> >>> sorry,
>>>> >>>
>>>> >>> shane
>>>> >>>
>>>> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <
>>>> huangtianhua223@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Hi, sorry to disturb you.
>>>> >>>> The CI testing for apache spark is supported by AMPLab Jenkins,
>>>> and I find there are some computers(most of them are Linux (amd64) arch)
>>>> for the CI development, but seems there is no Aarch64 computer for spark CI
>>>> testing. Recently, I build and run test for spark(master and branch-2.4) on
>>>> my arm server, and unfortunately there are some problems, for example, ut
>>>> test is failed due to a LEVELDBJNI native package, the details for java
>>>> test see http://paste.openstack.org/show/752063/ and python test see
>>>> http://paste.openstack.org/show/752709/
>>>> >>>> So I have a question about the ARM CI testing for spark, is there
>>>> any plan to support it? Thank you very much and I will wait for your reply!
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Shane Knapp
>>>> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> >>> https://rise.cs.berkeley.edu
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Twitter: https://twitter.com/holdenkarau
>>>> > Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: Ask for ARM CI for spark

Posted by shane knapp <sk...@berkeley.edu>.
i'd much prefer that we keep the test/build infrastructure in one place.

we don't have ARM hardware, but there's a slim possibility i can scare
something up in our older research stock...

another option would be to run the build in an arm-based docker container,
which (according to the intarwebs) is possible.
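
a rough sketch of driving that from python, for the record (the qemu
registration step, the --platform flag and the image name are all
assumptions on my part, and emulated builds will be slow):

    # Sketch: run the spark build inside an arm64 container on an amd64
    # host via qemu-user-static binfmt emulation. the image name and
    # docker flags are assumptions; /path/to/spark is a local checkout.
    import subprocess

    # register qemu handlers so the host kernel can exec arm64 binaries
    subprocess.check_call([
        "docker", "run", "--rm", "--privileged",
        "multiarch/qemu-user-static", "--reset", "-p", "yes"])

    # build spark inside an arm64 jdk image, mounting the local checkout
    subprocess.check_call([
        "docker", "run", "--rm", "--platform", "linux/arm64",
        "-v", "/path/to/spark:/spark", "-w", "/spark",
        "arm64v8/openjdk:8",
        "./build/mvn", "-DskipTests", "package"])

one caveat: emulation may not catch everything -- native libraries like
leveldbjni can behave differently under qemu than on real hardware.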

shane

On Tue, Jun 25, 2019 at 6:35 PM Tianhua huang <hu...@gmail.com>
wrote:

> I forked apache/spark project and propose a job(
> https://github.com/theopenlab/spark/pull/1) for spark building in OpenLab
> ARM instance, this is the first step to build spark on ARM,  I can enable a periodic
> job for arm building for apache/spark master if you guys like.  Later I
> will run tests for spark. I also willing to be the maintainer of the arm ci
> of spark.
>
> Thanks for you attention.
>
> On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <hu...@gmail.com>
> wrote:
>
>> Thanks Sean.
>>
>> I am very happy to hear that the community will put effort to fix the
>> ARM-related issues. I'd be happy to help if you like. And could you give
>> the trace link of this issue, then I can check it is fixed or not, thank
>> you.
>> As far as I know the old versions of spark support ARM, and now the new
>> versions don't, this just shows that we need a CI to check whether the
>> spark support ARM and whether some modification break it.
>> I will add a demo job in OpenLab to build spark on ARM and do a simple UT
>> test. Later I will give the job link.
>>
>> Let me know what you think.
>>
>> Thank you all!
>>
>>
>> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <sr...@gmail.com> wrote:
>>
>>> I'd begin by reporting and fixing ARM-related issues in the build. If
>>> they're small, of course we should do them. If it requires significant
>>> modifications, we can discuss how much Spark can support ARM. I don't
>>> think it's yet necessary for the Spark project to run these CI builds
>>> until that point, but it's always welcome if people are testing that
>>> separately.
>>>
>>> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca>
>>> wrote:
>>> >
>>> > Moving to dev@ for increased visibility among the developers.
>>> >
>>> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <
>>> huangtianhua223@gmail.com> wrote:
>>> >>
>>> >> Thanks for your reply.
>>> >>
>>> >> As I said before, I met some problem of build or test for spark on
>>> aarch64 server, so it will be better to have the ARM CI to make sure the
>>> spark is compatible for AArch64 platforms.
>>> >>
>>> >> I’m from OpenLab team(https://openlabtesting.org/ ,a community to do
>>> open source project testing. And we can support some Arm virtual machines
>>> to AMPLab Jenkins, and also we have a developer team that willing to work
>>> on this, we willing to maintain build CI jobs and address the CI issues.
>>> What do you think?
>>> >>
>>> >>
>>> >> Thanks for your attention.
>>> >>
>>> >>
>>> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu>
>>> wrote:
>>> >>>
>>> >>> yeah, we don't have any aarch64 systems for testing...  this has
>>> been asked before but is currently pretty low on our priority list as we
>>> don't have the hardware.
>>> >>>
>>> >>> sorry,
>>> >>>
>>> >>> shane
>>> >>>
>>> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <
>>> huangtianhua223@gmail.com> wrote:
>>> >>>>
>>> >>>> Hi, sorry to disturb you.
>>> >>>> The CI testing for apache spark is supported by AMPLab Jenkins, and
>>> I find there are some computers(most of them are Linux (amd64) arch) for
>>> the CI development, but seems there is no Aarch64 computer for spark CI
>>> testing. Recently, I build and run test for spark(master and branch-2.4) on
>>> my arm server, and unfortunately there are some problems, for example, ut
>>> test is failed due to a LEVELDBJNI native package, the details for java
>>> test see http://paste.openstack.org/show/752063/ and python test see
>>> http://paste.openstack.org/show/752709/
>>> >>>> So I have a question about the ARM CI testing for spark, is there
>>> any plan to support it? Thank you very much and I will wait for your reply!
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Shane Knapp
>>> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> >>> https://rise.cs.berkeley.edu
>>> >
>>> >
>>> >
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>

-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
I forked the apache/spark project and proposed a job(
https://github.com/theopenlab/spark/pull/1) for building spark on an
OpenLab ARM instance; this is the first step to build spark on ARM. I can
enable a periodic job for arm builds of apache/spark master if you guys
like.  Later I will run tests for spark. I am also willing to be the
maintainer of the arm ci of spark.
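
At its core the periodic job just does something like the following (a
rough sketch only; the real job definition lives in the OpenLab CI
configuration in the PR above):

    # Sketch: core of a periodic ARM build job -- fetch the latest
    # apache/spark master and build it, failing loudly if it breaks.
    import subprocess

    subprocess.check_call([
        "git", "clone", "--depth", "1",
        "https://github.com/apache/spark.git", "spark"])
    rc = subprocess.call(["./build/mvn", "-DskipTests", "package"],
                         cwd="spark")
    if rc != 0:
        raise SystemExit("spark master no longer builds on this ARM instance")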

Thanks for your attention.

On Thu, Jun 20, 2019 at 10:17 AM Tianhua huang <hu...@gmail.com>
wrote:

> Thanks Sean.
>
> I am very happy to hear that the community will put effort to fix the
> ARM-related issues. I'd be happy to help if you like. And could you give
> the trace link of this issue, then I can check it is fixed or not, thank
> you.
> As far as I know the old versions of spark support ARM, and now the new
> versions don't, this just shows that we need a CI to check whether the
> spark support ARM and whether some modification break it.
> I will add a demo job in OpenLab to build spark on ARM and do a simple UT
> test. Later I will give the job link.
>
> Let me know what you think.
>
> Thank you all!
>
>
> On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <sr...@gmail.com> wrote:
>
>> I'd begin by reporting and fixing ARM-related issues in the build. If
>> they're small, of course we should do them. If it requires significant
>> modifications, we can discuss how much Spark can support ARM. I don't
>> think it's yet necessary for the Spark project to run these CI builds
>> until that point, but it's always welcome if people are testing that
>> separately.
>>
>> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca>
>> wrote:
>> >
>> > Moving to dev@ for increased visibility among the developers.
>> >
>> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <
>> huangtianhua223@gmail.com> wrote:
>> >>
>> >> Thanks for your reply.
>> >>
>> >> As I said before, I met some problem of build or test for spark on
>> aarch64 server, so it will be better to have the ARM CI to make sure the
>> spark is compatible for AArch64 platforms.
>> >>
>> >> I’m from OpenLab team(https://openlabtesting.org/ ,a community to do
>> open source project testing. And we can support some Arm virtual machines
>> to AMPLab Jenkins, and also we have a developer team that willing to work
>> on this, we willing to maintain build CI jobs and address the CI issues.
>> What do you think?
>> >>
>> >>
>> >> Thanks for your attention.
>> >>
>> >>
>> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu>
>> wrote:
>> >>>
>> >>> yeah, we don't have any aarch64 systems for testing...  this has been
>> asked before but is currently pretty low on our priority list as we don't
>> have the hardware.
>> >>>
>> >>> sorry,
>> >>>
>> >>> shane
>> >>>
>> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <
>> huangtianhua223@gmail.com> wrote:
>> >>>>
>> >>>> Hi, sorry to disturb you.
>> >>>> The CI testing for apache spark is supported by AMPLab Jenkins, and
>> I find there are some computers(most of them are Linux (amd64) arch) for
>> the CI development, but seems there is no Aarch64 computer for spark CI
>> testing. Recently, I build and run test for spark(master and branch-2.4) on
>> my arm server, and unfortunately there are some problems, for example, ut
>> test is failed due to a LEVELDBJNI native package, the details for java
>> test see http://paste.openstack.org/show/752063/ and python test see
>> http://paste.openstack.org/show/752709/
>> >>>> So I have a question about the ARM CI testing for spark, is there
>> any plan to support it? Thank you very much and I will wait for your reply!
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Shane Knapp
>> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> >>> https://rise.cs.berkeley.edu
>> >
>> >
>> >
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>

Re: Ask for ARM CI for spark

Posted by Tianhua huang <hu...@gmail.com>.
Thanks Sean.

I am very happy to hear that the community will put effort into fixing the
ARM-related issues. I'd be happy to help if you like. And could you give
the tracking link for this issue, so I can check whether it is fixed or
not, thank you.
As far as I know the old versions of spark supported ARM and the new
versions don't, which just shows that we need a CI to check whether
spark supports ARM and whether some modification breaks it.
I will add a demo job in OpenLab to build spark on ARM and run a simple UT
test. Later I will give the job link.
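
For the simple UT test, one option is to start from the module that failed
in my earlier reports (a sketch; as far as I can tell common/kvstore is
the module that exercises the leveldbjni native library):

    # Sketch: smoke-test a single module on ARM. common/kvstore exercises
    # the leveldbjni native library that failed in the earlier reports.
    # Run it from the root of an apache/spark checkout.
    import subprocess
    import sys

    sys.exit(subprocess.call(["./build/mvn", "-pl", "common/kvstore", "test"]))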

Let me know what you think.

Thank you all!


On Wed, Jun 19, 2019 at 8:47 PM Sean Owen <sr...@gmail.com> wrote:

> I'd begin by reporting and fixing ARM-related issues in the build. If
> they're small, of course we should do them. If it requires significant
> modifications, we can discuss how much Spark can support ARM. I don't
> think it's yet necessary for the Spark project to run these CI builds
> until that point, but it's always welcome if people are testing that
> separately.
>
> On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca> wrote:
> >
> > Moving to dev@ for increased visibility among the developers.
> >
> > On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <hu...@gmail.com>
> wrote:
> >>
> >> Thanks for your reply.
> >>
> >> As I said before, I met some problem of build or test for spark on
> aarch64 server, so it will be better to have the ARM CI to make sure the
> spark is compatible for AArch64 platforms.
> >>
> >> I’m from OpenLab team(https://openlabtesting.org/ ,a community to do
> open source project testing. And we can support some Arm virtual machines
> to AMPLab Jenkins, and also we have a developer team that willing to work
> on this, we willing to maintain build CI jobs and address the CI issues.
> What do you think?
> >>
> >>
> >> Thanks for your attention.
> >>
> >>
> >> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu>
> wrote:
> >>>
> >>> yeah, we don't have any aarch64 systems for testing...  this has been
> asked before but is currently pretty low on our priority list as we don't
> have the hardware.
> >>>
> >>> sorry,
> >>>
> >>> shane
> >>>
> >>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <
> huangtianhua223@gmail.com> wrote:
> >>>>
> >>>> Hi, sorry to disturb you.
> >>>> The CI testing for apache spark is supported by AMPLab Jenkins, and I
> find there are some computers(most of them are Linux (amd64) arch) for the
> CI development, but seems there is no Aarch64 computer for spark CI
> testing. Recently, I build and run test for spark(master and branch-2.4) on
> my arm server, and unfortunately there are some problems, for example, ut
> test is failed due to a LEVELDBJNI native package, the details for java
> test see http://paste.openstack.org/show/752063/ and python test see
> http://paste.openstack.org/show/752709/
> >>>> So I have a question about the ARM CI testing for spark, is there any
> plan to support it? Thank you very much and I will wait for your reply!
> >>>
> >>>
> >>>
> >>> --
> >>> Shane Knapp
> >>> UC Berkeley EECS Research / RISELab Staff Technical Lead
> >>> https://rise.cs.berkeley.edu
> >
> >
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

Re: Ask for ARM CI for spark

Posted by Sean Owen <sr...@gmail.com>.
I'd begin by reporting and fixing ARM-related issues in the build. If
they're small, of course we should do them. If it requires significant
modifications, we can discuss how much Spark can support ARM. I don't
think it's yet necessary for the Spark project to run these CI builds
until that point, but it's always welcome if people are testing that
separately.

On Wed, Jun 19, 2019 at 7:41 AM Holden Karau <ho...@pigscanfly.ca> wrote:
>
> Moving to dev@ for increased visibility among the developers.
>
> On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <hu...@gmail.com> wrote:
>>
>> Thanks for your reply.
>>
>> As I said before, I met some problem of build or test for spark on aarch64 server, so it will be better to have the ARM CI to make sure the spark is compatible for AArch64 platforms.
>>
>> I’m from OpenLab team(https://openlabtesting.org/ ,a community to do open source project testing. And we can support some Arm virtual machines to AMPLab Jenkins, and also we have a developer team that willing to work on this, we willing to maintain build CI jobs and address the CI issues.  What do you think?
>>
>>
>> Thanks for your attention.
>>
>>
>> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu> wrote:
>>>
>>> yeah, we don't have any aarch64 systems for testing...  this has been asked before but is currently pretty low on our priority list as we don't have the hardware.
>>>
>>> sorry,
>>>
>>> shane
>>>
>>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <hu...@gmail.com> wrote:
>>>>
>>>> Hi, sorry to disturb you.
>>>> The CI testing for apache spark is supported by AMPLab Jenkins, and I find there are some computers(most of them are Linux (amd64) arch) for the CI development, but seems there is no Aarch64 computer for spark CI testing. Recently, I build and run test for spark(master and branch-2.4) on my arm server, and unfortunately there are some problems, for example, ut test is failed due to a LEVELDBJNI native package, the details for java test see http://paste.openstack.org/show/752063/ and python test see http://paste.openstack.org/show/752709/
>>>> So I have a question about the ARM CI testing for spark, is there any plan to support it? Thank you very much and I will wait for your reply!
>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Ask for ARM CI for spark

Posted by Holden Karau <ho...@pigscanfly.ca>.
Moving to dev@ for increased visibility among the developers.

On Wed, Jun 19, 2019 at 1:24 AM Tianhua huang <hu...@gmail.com>
wrote:

> Thanks for your reply.
>
> As I said before, I met some problem of build or test for spark on aarch64
> server, so it will be better to have the ARM CI to make sure the spark is compatible
> for AArch64 platforms.
>
> I’m from OpenLab team(https://openlabtesting.org/ ,a community to do open
> source project testing. And we can support some Arm virtual machines to
> AMPLab Jenkins, and also we have a developer team that willing to work on
> this, we willing to maintain build CI jobs and address the CI issues.
> What do you think?
>
>
> Thanks for your attention.
>
> On Wed, Jun 19, 2019 at 6:39 AM shane knapp <sk...@berkeley.edu> wrote:
>
>> yeah, we don't have any aarch64 systems for testing...  this has been
>> asked before but is currently pretty low on our priority list as we don't
>> have the hardware.
>>
>> sorry,
>>
>> shane
>>
>> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang <hu...@gmail.com>
>> wrote:
>>
>>> Hi, sorry to disturb you.
>>> The CI testing for apache spark is supported by AMPLab Jenkins, and I
>>> find there are some computers(most of them are Linux (amd64) arch) for
>>> the CI development, but seems there is no Aarch64 computer for spark CI
>>> testing. Recently, I build and run test for spark(master and branch-2.4) on
>>> my arm server, and unfortunately there are some problems, for example, ut
>>> test is failed due to a LEVELDBJNI native package, the details for java
>>> test see http://paste.openstack.org/show/752063/ and python test see
>>> http://paste.openstack.org/show/752709/
>>> So I have a question about the ARM CI testing for spark, is there any
>>> plan to support it? Thank you very much and I will wait for your reply!
>>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau