Posted to issues@spark.apache.org by "zhao bo (Jira)" <ji...@apache.org> on 2019/10/17 15:45:00 UTC

[jira] [Comment Edited] (SPARK-29106) Add jenkins arm test for spark

    [ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953858#comment-16953858 ] 

zhao bo edited comment on SPARK-29106 at 10/17/19 3:44 PM:
-----------------------------------------------------------

Hi [~shaneknapp] Shane,
 As the whole test run takes quite some time, I could not send out this summary yesterday.

First note: the ARM VM is only tested with java8. As java11 hadn't been tested before, we plan to add it in the future and test with java8 first.
 * The ansible dependencies you mentioned have been installed.
 * The dependencies that require sudo privileges were installed by the 'root' user.  The 'jenkins' user doesn't have sudo or any root-level access.
 * Source code location: /home/jenkins/spark – master branch as of 2019/10/16
 * All the ansible test scripts are stored in /home/jenkins/ansible_test_scripts. You can run the tests with the ansible commands given below; a sketch of a matching inventory file follows this list.
 * When we finished the whole test run on the target ARM VM, we made a full VM snapshot of it.
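
For reference, here is a minimal sketch of what the inventory file used by the commands below might contain (the group name and host address are placeholders, not the actual file contents):

  # /home/jenkins/ansible_test_scripts/inventory (sketch only; values are placeholders)
  [arm_vm]
  spark-arm-vm ansible_host=<VM_IP> ansible_user=jenkins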

Now I have finished the following tests on the ARM VM:
 1. maven test - Spark Build and UT
 =======================
 env: java8 (javac 1.8.0_222)
 spark: master branch
 TEST STEPS: 
 - ./build/mvn -B -e clean install -DskipTests -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl -Pmesos 
 - ./build/mvn -B -e test -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl -Pmesos
 TEST ANSIBLE CMD (a sketch of what this playbook might contain follows at the end of this section):  
 - ansible-playbook -i /home/jenkins/ansible_test_scripts/inventory /home/jenkins/ansible_test_scripts/maven_unittest.yml
 TEST LOGS (full logs; the runs succeeded in the end):  
 - /home/jenkins/ansible_test_scripts/test_logs/spark_build.log
 - /home/jenkins/ansible_test_scripts/test_logs/spark_test_original.log
 - /home/jenkins/ansible_test_scripts/test_logs/spark_test.log
   - About spark_test.log: due to a mistake of mine, there is an error in the middle of the maven UT run that stopped the test (it seems something else I did on the VM consumed RAM and triggered a "not enough RAM" failure during the test). So I split the log into two files: one is test_logs/spark_test.log_before_test_fail, the other is test_logs/spark_test_including_fail_and_following.log (a rerun of the failed test plus the following tests that did not run in the first log). Since the maven tests take so much time and all tests passed in the end, I think it's better not to waste more time here, so that we can move the integration process forward quickly.
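
Regarding the playbook above: here is a minimal sketch of what maven_unittest.yml could look like given the test steps listed (the structure is an assumption, not the actual file; the host group matches the inventory sketch earlier):

  # maven_unittest.yml (sketch; assumed structure, not the real playbook)
  - hosts: arm_vm
    tasks:
      - name: Build Spark, skipping tests
        shell: ./build/mvn -B -e clean install -DskipTests -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl -Pmesos
        args:
          chdir: /home/jenkins/spark
      - name: Run the maven unit tests
        shell: ./build/mvn -B -e test -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl -Pmesos
        args:
          chdir: /home/jenkins/spark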

2. PySpark and SparkR test
 =======================
 env: python2.7  python3.6  for PySpark test
      R 3.6.1 for SparkR test
 TEST STEPS:
   - python/run-tests --python-executables=python2.7,python3.6
   - ./R/run-tests.sh
 TEST ANSIBLE CMD: 
   - ansible-playbook -i /home/jenkins/ansible_test_scripts/inventory /home/jenkins/ansible_test_scripts/pyspark_sparkr_test.yml
 TEST LOGS (full logs; the runs succeeded in the end):
   - /home/jenkins/ansible_test_scripts/test_logs/pyspark_test.log
   - /home/jenkins/ansible_test_scripts/test_logs/sparkr_test.log
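
As an aside, python/run-tests also supports a --modules flag, so a quicker targeted run is possible when only one area needs checking (the module chosen below is just an example):

  # run only the pyspark-core tests against python3.6 (example invocation)
  python/run-tests --python-executables=python3.6 --modules=pyspark-core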

Finally, based on the real test runs on the ARM VM, we want to show you the time cost of testing on ARM.
 Test cost summary:
 The whole test run takes a very long time.
 * Spark build by maven – the first build took 1h42m; after that, a build takes about 1h29m (this may be affected by the VM host's load at the time, so the real cost may be shorter than what we measured).
 * Spark UT test by maven – this takes 8h-9h to finish the whole suite.
 * PySpark test  – 20 - 23 mins
 * SparkR test  – 15 - 20 mins
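
Adding these up for one full pass (using the post-first-build time and the upper bound of each range):

  1h29m (build) + 9h (maven UT) + 23m (PySpark) + 20m (SparkR) ≈ 11h12m

which is where the "nearly 11h" figure below comes from.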

Given the above time costs of the different test jobs, we can choose between several ways to run them as periodic test jobs.
 * Split them and test one by one.
   - For example, if we just want to test PySpark, we add a periodic job that includes only the Spark build and the PySpark test; that would cost about 2h per run. But if we then want to test SparkR, we still need the Spark build first. That means every test type must run after a Spark build.
 * Test all of them each time.
   - We run all of them in one periodic test job and run the Spark build only once, but it would cost nearly 11h.
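
For reference, Jenkins' "Build periodically" trigger uses cron syntax, so a once-a-day run of the all-in-one job could be scheduled with something like the following (illustrative only):

  # Jenkins "Build periodically" schedule (cron syntax); H spreads the start time to balance load
  H 0 * * *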
 Either way is OK for us; you can choose how to add the periodic testing for ARM. If you want to discuss or know more, please feel free to contact us.



> Add jenkins arm test for spark
> ------------------------------
>
>                 Key: SPARK-29106
>                 URL: https://issues.apache.org/jira/browse/SPARK-29106
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.0
>            Reporter: huangtianhua
>            Priority: Minor
>
> Add arm test jobs to amplab jenkins for spark.
> So far we have made two periodic ARM test jobs for spark in OpenLab: one is based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), and the other is based on a new branch we cut on 09-09; see  [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]  and [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64] We only have to care about the first one when integrating the ARM test with amplab jenkins.
> About the k8s test on ARM: we have tested it, see [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it later. 
> We plan to test other stable branches too, and we can integrate them into amplab jenkins when they are ready.
> We have offered an ARM instance and sent the info to shane knapp; thanks shane for adding the first ARM job to amplab jenkins :) 
> The other important thing is about leveldbjni [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]: spark depends on leveldbjni-all-1.8 [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], which has no arm64 support. So we built an arm64-supporting release of leveldbjni, see [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8]. But we can't modify the spark pom.xml directly with something like a 'property'/'profile' to choose the correct jar on the ARM or x86 platform, because spark depends on some hadoop packages like hadoop-hdfs, and those packages depend on leveldbjni-all-1.8 too, unless hadoop releases with a new ARM-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 of openlabtesting and 'mvn install' it when testing spark on ARM.
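> For illustration, installing such a downloaded jar into the local maven repository might look like the following (install:install-file is the standard maven-install-plugin goal; the file path, and the choice to keep the original fusesource coordinates so the existing hadoop/spark dependencies resolve to it, are assumptions about the exact procedure):
> 
>   # hypothetical: install the arm64 jar under the coordinates the existing deps expect
>   mvn install:install-file -DgroupId=org.fusesource.leveldbjni -DartifactId=leveldbjni-all -Dversion=1.8 -Dpackaging=jar -Dfile=/path/to/leveldbjni-all-1.8.jar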
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org