Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/06/14 12:02:26 UTC

[GitHub] [systemds] corepointer opened a new pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

corepointer opened a new pull request #1317:
URL: https://github.com/apache/systemds/pull/1317


   More tests will be added as we go. For the tests to run reliably, avoid starting multiple test cases simultaneously in the same JVM. To run them from the command line, use something like this:
   
   `mvn -ntp test -DenableGPU=true -Dmaven.test.skip=false -Dtest-parallel=suites -Dtest-threadCount=1 -Dtest-forkCount=1 -D automatedtestbase.outputbuffering=false -Dtest=org.apache.sysds.test.gpu.**`
   
   * This test suite should *not* be included in the automated testing for the time being as we don't have GPU testing infrastructure set up.
   * Two of the tests are failing atm - working on that ;-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] corepointer commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
corepointer commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-979383359


   > Or should I comment out the failing test and merge the rest? This would avoid rebasing on my **gpu runner test fork**.
   
   You can use the "@Ignore" functionality that we already use in the row template test case (I think test #18).  
   
   > If that is okay for you.
   
   Yes, please go ahead.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] corepointer commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
corepointer commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-860714047


   >     1. Does this setup allow writing feature tests on GPU where the baseline also runs with `-gpu` (e.g. comparing `-gpu -lineage` with `-gpu`)?
   
   All tests add -gpu. It is up to your test to add more. So -lineage -gpu is definitely possible.
   
   >     2. I think this is an easy enough way to run regression tests in a gpu. But effect-wise, how is it different from adding `-gpu` to the existing test classes?
   
   At the moment it is no different (other than the test checking for the appearance of the corresponding gpu instruction in the heavy hitter output). The content of new tests is up to its author - anything's possible ;-)
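
The check described above (looking for the GPU instruction in the heavy hitter output) can be sketched roughly as follows. This is a minimal illustration, not the helper used in the PR: the class name, the method name, the sample text, and the `gpu_` opcode prefix are all assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative helper, not part of SystemDS.
final class HeavyHitterCheck {
    // Returns true if any whitespace-separated token of the captured
    // statistics text is exactly the given opcode.
    static boolean containsOpcode(String statsOutput, String opcode) {
        return Arrays.stream(statsOutput.split("\\s+"))
                     .anyMatch(token -> token.equals(opcode));
    }

    public static void main(String[] args) {
        // Made-up sample text, not real SystemDS output.
        String sample = "1 gpu_ba+* 0.120 1\n2 rand 0.030 2\n";
        System.out.println(containsOpcode(sample, "gpu_ba+*"));
        System.out.println(containsOpcode(sample, "ba+*"));
    }
}
```

Matching whole tokens rather than substrings keeps a CPU opcode from being reported as present only because it is a suffix of the GPU opcode.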
   





[GitHub] [systemds] j143 commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
j143 commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-971150538


   This LGTM. 👍 
   
   The workaround for the last failing test can be resolved later. :)





[GitHub] [systemds] j143 commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
j143 commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-979165918


   Or should I comment out the failing test and merge the rest? This would avoid rebasing on my **gpu runner test fork**.
   
   If that is okay for you.








[GitHub] [systemds] asfgit closed pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1317:
URL: https://github.com/apache/systemds/pull/1317


   





[GitHub] [systemds] corepointer commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
corepointer commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-963678890


   > Hi @corepointer ,
   > 
   > the testing does not work on this runner. Any pointers on how to resolve this one.
   > 
   > ![image](https://user-images.githubusercontent.com/53068787/140730529-fb959f56-ca30-47f3-8ce5-ba10fb0340f8.png)
   > 
   > cuda, cudnn is available from the command line:
   > 
   > run results here: https://github.com/j143/systemds/runs/4137852455
   
   You're not rebuilding the binaries with cmake yet, are you? If you were, the problem might be that Jitify is missing: you need to clone with `--recursive` to fetch that external dependency. But that only becomes relevant once the tests run with the current binaries.
   
   Two things you could check. First, is CUDA 10.2 installed? That is currently the latest version we support (I'm working on CUDA 11.x support, it's almost there). Second, is a CUDA-capable device visible to your VM? You could add the command `nvidia-smi` to your runner; it should print the available CUDA devices.
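
If the runner should check this automatically, the version can be read from the header that `nvidia-smi` prints. The sketch below only parses text that was captured beforehand; the class and method names are made up for the example. Note that the `CUDA Version` field in that header reports what the driver supports, which may differ from the installed toolkit version.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of SystemDS.
final class NvidiaSmiHeader {
    private static final Pattern CUDA_VERSION =
        Pattern.compile("CUDA Version:\\s*(\\d+)\\.(\\d+)");

    // Extracts "major.minor" from nvidia-smi output, if the field is present.
    static Optional<String> cudaVersion(String smiOutput) {
        Matcher m = CUDA_VERSION.matcher(smiOutput);
        return m.find() ? Optional.of(m.group(1) + "." + m.group(2))
                        : Optional.empty();
    }

    public static void main(String[] args) {
        // Header line in the format nvidia-smi prints.
        String header = "| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |";
        System.out.println(cudaVersion(header).orElse("no CUDA version found"));
    }
}
```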
   
   > 
   > This one runs fine
   > 
   > ```shell
   > java -Xmx4g -Xms4g -Xmn400m -cp target/SystemDS.jar:target/lib/*:target/SystemDS-*.jar org.apache.sysds.api.DMLScript -f ../main.dml -exec singlenode -gpu
   > ```
   This one isn't using any GPU instructions though ;-)
   
   PS: I've rebased onto the current main branch and cherry-picked the commits you added.





[GitHub] [systemds] corepointer commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
corepointer commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-979158756


   > This LGTM. +1
   > 
   > The workaround for the last failing test can be resolved later. :)
   
   Thank you! I'll fix it and merge it in after my next paper deadline on Dec 10.
   





[GitHub] [systemds] phaniarnab commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
phaniarnab commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-860685608


   Thanks, @corepointer, for the GPU JUnit test infrastructure.
   I have not started running the tests yet, but I have some questions about how it is set up.
   
   1. Does this setup allow writing feature tests on GPU where the baseline also runs with `-gpu` (e.g. comparing `-gpu -lineage` with `-gpu`)?
   2. I think this is an easy enough way to run regression tests on a GPU. But effect-wise, how is it different from adding `-gpu` to the existing test classes?








[GitHub] [systemds] j143 edited a comment on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
j143 edited a comment on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-963037706


   Hi @corepointer ,
   
    The tests do not work on this runner. Any pointers on how to resolve this?
   
   ![image](https://user-images.githubusercontent.com/53068787/140730529-fb959f56-ca30-47f3-8ce5-ba10fb0340f8.png)
   
   CUDA and cuDNN are available from the command line:
   
   run results here: https://github.com/j143/systemds/runs/4137852455
   
   ---
   This one runs fine
   
   ```shell
   java -Xmx4g -Xms4g -Xmn400m -cp target/SystemDS.jar:target/lib/*:target/SystemDS-*.jar org.apache.sysds.api.DMLScript -f ../main.dml -exec singlenode -gpu
   ```








[GitHub] [systemds] j143 commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
j143 commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-963865680


   
   ```console
   ubuntu@ip-10-0-0-4:~/repo/systemds$ nvidia-smi
   Tue Nov  9 06:50:01 2021       
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |===============================+======================+======================|
   |   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
   | N/A   26C    P0    70W / 149W |    243MiB / 11441MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+
                                                                                  
   +-----------------------------------------------------------------------------+
   | Processes:                                                       GPU Memory |
   |  GPU       PID   Type   Process name                             Usage      |
   |=============================================================================|
   |    0      9955      C   .../lib/jvm/java-11-openjdk-amd64/bin/java   232MiB |
   +-----------------------------------------------------------------------------+
   ```
   
   `cudnn` is at `/usr/include/cudnn.h` version 7.6.5 as per our docs. 
   I could successfully run the cuda samples.
   





[GitHub] [systemds] corepointer commented on pull request #1317: [SYSTEMDS-3020] Initial GPU junit tests

Posted by GitBox <gi...@apache.org>.
corepointer commented on pull request #1317:
URL: https://github.com/apache/systemds/pull/1317#issuecomment-964247137


   > |   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
   
   Sorry, my bad. I didn't realize that I raised the bar to compute capability 6.0 and above when I introduced atomicAdd() for double values (that's exactly the function called in the line referenced by the error messages). The Tesla K80 has a compute capability of 3.7 [1]. So for the time being, code gen requires cc 6.0+ unless we find a workaround.
   
   [1] https://developer.nvidia.com/cuda-gpus
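
For context on possible workarounds: the CUDA C Programming Guide documents a fallback for devices below compute capability 6.0 that emulates double-precision atomicAdd() with a compare-and-swap loop on the 64-bit integer view of the value. The thread does not say whether SystemDS adopted this. The sketch below shows the same technique in Java rather than CUDA, so it only illustrates the idea; the class name is made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative class, not part of SystemDS.
final class CasDoubleAdder {
    // The double is stored as its raw 64-bit pattern so it can be swapped atomically.
    private final AtomicLong bits = new AtomicLong(Double.doubleToRawLongBits(0.0));

    // Adds delta and returns the previous value, mirroring what atomicAdd() returns.
    double add(double delta) {
        long oldBits;
        long newBits;
        do {
            oldBits = bits.get();
            newBits = Double.doubleToRawLongBits(Double.longBitsToDouble(oldBits) + delta);
            // Retry if another thread changed the value in between.
        } while (!bits.compareAndSet(oldBits, newBits));
        return Double.longBitsToDouble(oldBits);
    }

    double get() {
        return Double.longBitsToDouble(bits.get());
    }

    public static void main(String[] args) throws InterruptedException {
        CasDoubleAdder sum = new CasDoubleAdder();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) sum.add(1.0);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        // 4 threads, each performing 10000 additions of 1.0.
        System.out.println(sum.get());
    }
}
```

Comparing the integer bit patterns instead of the floating-point values avoids an endless retry loop when the stored value is NaN, since NaN never compares equal to itself.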

