Posted to dev@mahout.apache.org by Andrew Palumbo <ap...@outlook.com> on 2016/10/27 04:31:52 UTC

Errors when 0.13.0 master or vcl



Could everybody please post any errors, with stack traces, that they are getting on master or the vcl PR branch, along with full system info?

I.e., branch, OS major/minor, chip arch, GPU, clinfo output, Java major/minor and vendor, vcl version, gcc version, and anything else that may be useful, so that we can compare.

Thx

Andy
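
For anyone who would rather gather the requested details in one pass, here is a minimal sketch in Python. It is only an illustration, not a project tool: the command list, the labels and the helper names (run, collect, version_defines, format_report) are all assumptions made for this example. Commands that are not installed are reported as such instead of failing.

```python
import shutil
import subprocess

# Hypothetical command list mirroring the items requested above.
COMMANDS = {
    "branch": ["git", "rev-parse", "--abbrev-ref", "HEAD"],
    "os": ["lsb_release", "-a"],
    "arch": ["uname", "-a"],
    "gpu": ["nvidia-smi"],
    "clinfo": ["clinfo"],
    "java": ["java", "-version"],
    "gcc": ["gcc", "--version"],
}


def run(cmd):
    """Return combined stdout/stderr of cmd, or a note if it cannot be run."""
    if shutil.which(cmd[0]) is None:
        return "(not installed: %s)" % cmd[0]
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # "java -version" prints to stderr
            universal_newlines=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "(failed: %s)" % exc
    return proc.stdout.strip()


def collect(commands=COMMANDS):
    """Run every command and return a {label: output} dict."""
    return {label: run(cmd) for label, cmd in commands.items()}


def version_defines(path="/usr/include/viennacl/version.hpp"):
    """Return the #define lines mentioning VERSION, or [] if the file is absent."""
    try:
        with open(path) as handle:
            return [line.strip() for line in handle
                    if line.startswith("#define") and "VERSION" in line]
    except OSError:
        return []


def format_report(info):
    """Render the collected info as plain text, one labelled section per item."""
    return "\n\n".join("%s:\n%s" % (label, output) for label, output in info.items())
```

Calling print(format_report(collect())) from the checkout directory prints one labelled section per item, ready to paste into a reply. The header path is the one quoted later in this thread and may differ on other installs.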



RE: Errors when 0.13.0 master or vcl

Posted by Andrew Palumbo <ap...@outlook.com>.
Awesome.. thx Trevor.





-------- Original message --------
From: Trevor Grant <tr...@gmail.com>
Date: 10/28/2016 1:28 PM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Errors when 0.13.0 master or vcl

My Desktop

Branch: master
Build Command: mvn clean install -Phadoop2
Built and tested successfully

Branch: andrewpalumbo/mahout/viennacl-opmmul-a
Build Command: mvn clean install -Pviennacl -Phadoop2 -DskipTests && mvn test

Builds successfully.
Tests still riddled with:

[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version

Would be fixed with pom hack I presented earlier (developed hack on this
box)
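
The four log lines above read like a try-in-order fallback: OpenCL first, then OpenMP, then the plain JVM implementation. The sketch below shows that pattern in Python purely as an illustration. It is not Mahout's code, and first_available and the factory callables are hypothetical names.

```python
def first_available(factories, log=print):
    """Try each (name, factory) pair in order and return the first that loads.

    factories is a list of (name, zero-argument callable) pairs. A factory
    signals that its backend is unavailable by raising an exception.
    """
    for name, factory in factories:
        log("[INFO] Creating %s solver" % name)
        try:
            return name, factory()
        except Exception as exc:  # backend missing or failed to initialise
            log("[WARN] Unable to create %s: %s" % (name, exc))
    raise RuntimeError("no solver could be created")
```

With a pure-JVM (here: pure-Python) factory placed last, the chain always ends in something usable, which is why the tests can still pass while logging these warnings.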

Machine Info:
================================================================================

 os major/minor:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty


 chip arch:
$ uname -a
Linux tower1 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8 10:21:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo
...
model name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
...
^^ 8 cores


GPU:
$ sudo nvidia-smi
Sun Oct 16 22:14:49 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 740      Off  | 0000:02:00.0     N/A |                  N/A |
| 33%   34C    P8    N/A /  N/A |    330MiB /  1021MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+




 clinfo output:
 $ clinfo
 Number of platforms: 1
   Platform Profile: FULL_PROFILE
   Platform Version: OpenCL 1.2 CUDA 7.5.23
   Platform Name: NVIDIA CUDA
   Platform Vendor: NVIDIA Corporation
   Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd
cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_nv_copy_opts


   Platform Name: NVIDIA CUDA
 Number of devices: 1
   Device Type: CL_DEVICE_TYPE_GPU
   Device ID: 4318
   Max compute units: 2
   Max work items dimensions: 3
     Max work items[0]: 1024
     Max work items[1]: 1024
     Max work items[2]: 64
   Max work group size: 1024
   Preferred vector width char: 1
   Preferred vector width short: 1
   Preferred vector width int: 1
   Preferred vector width long: 1
   Preferred vector width float: 1
   Preferred vector width double: 1
   Native vector width char: 1
   Native vector width short: 1
   Native vector width int: 1
   Native vector width long: 1
   Native vector width float: 1
   Native vector width double: 1
   Max clock frequency: 1071Mhz
   Address bits: 64
   Max memory allocation: 267873536
   Image support: Yes
   Max number of images read arguments: 256
   Max number of images write arguments: 16
   Max image 2D width: 16384
   Max image 2D height: 16384
   Max image 3D width: 4096
   Max image 3D height: 4096
   Max image 3D depth: 4096
   Max samplers within kernel: 32
   Max size of kernel argument: 4352
   Alignment (bits) of base address: 4096
   Minimum alignment (bytes) for any datatype: 128
   Single precision floating point capability
     Denorms: Yes
     Quiet NaNs: Yes
     Round to nearest even: Yes
     Round to zero: Yes
     Round to +ve and infinity: Yes
     IEEE754-2008 fused multiply-add: Yes
   Cache type: Read/Write
   Cache line size: 128
   Cache size: 32768
   Global memory size: 1071494144
   Constant buffer size: 65536
   Max number of constant args: 9
   Local memory type: Local
   Local memory size: 49152
   Error correction support: 0
   Unified memory for Host and Device: 0
   Profiling timer resolution: 1000
   Device endianess: Little
   Available: Yes
   Compiler available: Yes
   Execution capabilities:
     Execute OpenCL kernels: Yes
     Execute native function: No
   Queue properties:
     Out-of-Order: Yes
     Profiling : Yes
   Platform ID: 0xab2160
   Name: GeForce GT 740
   Vendor: NVIDIA Corporation
   Device OpenCL C version: OpenCL C 1.2
   Driver version: 352.63
   Profile: FULL_PROFILE
   Version: OpenCL 1.2 CUDA
   Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts  cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64



 Java major/minor and vendor:
 $ java -version
java version "1.7.0_85"
OpenJDK Runtime Environment (IcedTea 2.6.1) (7u85-2.6.1-5ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)


 vcl version:
From /usr/include/viennacl/version.hpp
1.7.0

gcc version:
$ gcc --version
gcc (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com> wrote:

>
>
>
> Could everybody please post any errors, with stack traces, that they are
> getting on master or the vcl PR branch, along with full system info?
>
> I.e., branch, OS major/minor, chip arch, GPU, clinfo output, Java
> major/minor and vendor, vcl version, gcc version, and anything else that
> may be useful, so that we can compare.
>
> Thx
>
> Andy
>
>
> Sent from my Galaxy Tab A
>

RE: Errors when 0.13.0 master or vcl

Posted by Andrew Palumbo <ap...@outlook.com>.
nvidia-smi allows you to monitor GPU usage.

I think you have to do something like: watch --interval=1 nvidia-smi

There are several options to see different kinds of usage.
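
As a possible way to get those readings into a file, nvidia-smi can also emit CSV samples on a loop through its --query-gpu, --format and -l options. The sketch below, in Python, builds such a command and parses the resulting log. The field selection and the helper names (build_command, parse_samples, peak_memory_fraction) are assumptions for this example, and fields a card does not report (GPU-Util shows N/A for the GT 740 in this thread) are kept as None.

```python
import csv
import io

# Fields to sample; the names follow nvidia-smi's --query-gpu syntax.
QUERY_FIELDS = ["timestamp", "memory.used", "memory.total", "utilization.gpu"]


def build_command(interval_s=1):
    """Build an nvidia-smi command that prints one CSV sample every interval_s seconds."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),
    ]


def to_number(text):
    """Convert a CSV cell to float, or None when the card does not report it."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_samples(csv_text):
    """Parse logged CSV text into a list of dicts, skipping malformed rows."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != len(QUERY_FIELDS):
            continue
        timestamp, used, total, util = (cell.strip() for cell in row)
        samples.append({
            "timestamp": timestamp,
            "memory_used_mib": to_number(used),
            "memory_total_mib": to_number(total),
            "gpu_util_pct": to_number(util),
        })
    return samples


def peak_memory_fraction(samples):
    """Return the highest used/total memory ratio seen, or None if unavailable."""
    fractions = [
        s["memory_used_mib"] / s["memory_total_mib"]
        for s in samples
        if s["memory_used_mib"] is not None and s["memory_total_mib"]
    ]
    return max(fractions) if fractions else None
```

One way to use it: run " ".join(build_command()) in a shell with the output redirected to a file (for example gpu.log, a hypothetical name) while the tests run, stop it with Ctrl-C, then pass the file contents to parse_samples.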






-------- Original message --------
From: Trevor Grant <tr...@gmail.com>
Date: 10/28/2016 4:15 PM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Errors when 0.13.0 master or vcl

Anecdotes on my OOM error:

It only fails on one test, so it is *sort of* working.

I had the NVIDIA monitor up while I ran the tests. I did this a few times
and monitored the results (if anyone knows a CLI way to do this and pipe
the results to a file, dope).

In general, the background memory utilization of my graphics card is 33%
(of 1G total).  As tests run, memory utilization gets as high as 50% and
GPU utilization as high as 99% (for other tests).  This makes the OOM error
even more curious.
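
One possible reading, offered only as a hypothesis: the clinfo output for this card lists "Max memory allocation: 267873536", exactly a quarter of "Global memory size: 1071494144". A single buffer can therefore be too large even when total usage is well under 100%. The Python sketch below compares the buffers of a CSR matrix against those two limits. It assumes three buffers with 8-byte values and 4-byte indices, which may not match what ViennaCL allocates, and the matrix dimensions in the usage note are invented because the test's real sizes are not given in this thread.

```python
# Device limits copied from the clinfo / nvidia-smi output in this thread.
GLOBAL_MEM_BYTES = 1071494144   # clinfo "Global memory size"
MAX_ALLOC_BYTES = 267873536     # clinfo "Max memory allocation"
TOTAL_MIB = 1021                # nvidia-smi total memory
BACKGROUND_USED_MIB = 330       # nvidia-smi usage before the tests ran
FREE_BYTES = (TOTAL_MIB - BACKGROUND_USED_MIB) * 1024 * 1024


def csr_buffer_sizes(rows, nnz, value_bytes=8, index_bytes=4):
    """Sizes in bytes of the three CSR arrays (assumed layout, see lead-in)."""
    return {
        "row_pointers": (rows + 1) * index_bytes,
        "column_indices": nnz * index_bytes,
        "values": nnz * value_bytes,
    }


def diagnose(buffers, max_alloc=MAX_ALLOC_BYTES, free_bytes=FREE_BYTES):
    """Report which buffers exceed the per-allocation cap and whether the total fits."""
    total = sum(buffers.values())
    oversized = sorted(name for name, size in buffers.items() if size > max_alloc)
    return {
        "total_bytes": total,
        "oversized_buffers": oversized,
        "fits_in_free_memory": total <= free_bytes,
    }
```

For example, diagnose(csr_buffer_sizes(1000, 40000000)) describes a made-up matrix whose three buffers fit in free memory in total, while the values array alone exceeds the per-allocation cap. The product of two sparse matrices can also hold far more nonzeros than either input, so the intermediate buffers of a sparse multiply may be the ones that hit the limit.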

Does anyone know a good util for monitoring the card while it is
running?

Keep up the fight Andrew, I feel like this is very close.

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Oct 28, 2016 at 6:01 PM, Trevor Grant <tr...@gmail.com>
wrote:

> I was finally able to hit an OOM error.
>
> I initially was running everything from the top.
>
> When I run mvn test from the viennacl directory...
>
>
> /mahout/viennacl$ mvn test
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support
> building such malformed projects.
> [WARNING]
> [INFO]
>
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Add Source directory: /home/trevor/gits/mahout/
> viennacl/src/main/scala
> [INFO] Add Test Source directory: /home/trevor/gits/mahout/
> viennacl/src/test/scala
> [INFO]
> [INFO] --- maven-dependency-plugin:2.3:properties (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/
> viennacl/src/main/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
> mahout-native-viennacl_2.10 ---
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul$
> Generating /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> Compiling /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
> g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/ -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC -shared -s -o libjniViennaCL.so -lOpenCL
> Deleting /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:testResources
> (default-testResources) @ mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/
> viennacl/src/test/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
> mahout-native-viennacl_2.10 ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (test) @
> mahout-native-viennacl_2.10 ---
> Discovery starting.
> Discovery completed in 201 milliseconds.
> Run starting. Expected test count is: 7
> ViennaCLSuiteVCL:
> - row-major viennacl::matrix
>   + OCL matrix memory domain after assgn=2
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> log4j:WARN No appenders could be found for logger
> (org.apache.mahout.viennacl.opencl.GPUMMul$).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - dense vcl mmul with fast_copy
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - mmul microbenchmark
>   + Mahout multiplication time: 2630 ms.
>   + ViennaCL/OpenCL multiplication time: 2072 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuSparseRWRW
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!
> falling back to JVM MMUL
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> - sparse mmul microbenchmark *** FAILED ***
>   java.lang.RuntimeException: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
>   + Mahout Sparse multiplication time: 13303 ms.
> - VCL Dense Matrix %*% Dense vector
>   + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
>   + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
> 4 ms.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuRWCW
> - Sparse %*% Dense mmul microbenchmark
>   + Mahout multiplication time: 821 ms.
>   + ViennaCL/OpenCL multiplication time: 754 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
> Run completed in 26 seconds, 944 milliseconds.
> Total number of tests run: 7
> Suites: completed 2, aborted 0
> Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
> *** 1 TEST FAILED ***
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 45.600 s
> [INFO] Finished at: 2016-10-28T17:58:45-05:00
> [INFO] Final Memory: 20M/309M
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project mahout-native-viennacl_2.10: There are test failures -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> My Desktop
>>
>> Branch: master
>> Build Command: mvn clean install -Phadoop2
>> Built and tested successfully
>>
>> Branch: andrewpalumbo/mahout/viennacl-opmmul-a
>> Build Command: mvn clean install -Pviennacl -Phadoop2
>> Builds successfully.
>> Tests still riddled with:
>>
>> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>> [WARN] Unable to create class GPUMMul: attempting OpenMP version
>> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>> [INFO] Unable to create class OMPMMul: falling back to java version
>>
>> Also OOM
>> ViennaCLSuiteOMP:
>> - row-major viennacl::matrix
>> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>> [WARN] Unable to create class GPUMMul: attempting OpenMP version
>> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>> [INFO] Unable to create class OMPMMul: falling back to java version
>> - mmul microbenchmark
>>   + Mahout multiplication time: 8875 ms.
>>   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
>> - trans
>> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>> [WARN] Unable to create class GPUMMul: attempting OpenMP version
>> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>> [INFO] Unable to create class OMPMMul: falling back to java version
>> *** RUN ABORTED ***
>>   java.lang.OutOfMemoryError: Java heap space
>>   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
>>   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
>>   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
>>   at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
>>   at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
>>   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
>>   at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
>>   at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
>>   at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
>>   at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
>>
>> Machine Info:
>> ================================================================================
>>
>> os major/minor:
>> $ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 16.04.1 LTS
>> Release: 16.04
>> Codename: xenial
>>
>> chip arch:
>> $ uname -a
>> Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> $ cat /proc/cpuinfo
>> ...
>> model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
>> ...
>> ^^ 6 cores
>>
>> GPU:
>> $ sudo nvidia-smi
>> +-----------------------------------------------------------------------------+
>> | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
>> |-------------------------------+----------------------+----------------------+
>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>> |===============================+======================+======================|
>> |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
>> | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
>> +-------------------------------+----------------------+----------------------+
>>
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID  Type  Process name                               Usage      |
>> |=============================================================================|
>> |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
>> |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
>> |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
>> |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
>> +-----------------------------------------------------------------------------+
>>
>> clinfo output:
>> $ clinfo
>> Number of platforms                               1
>>   Platform Name                                   NVIDIA CUDA
>>   Platform Vendor                                 NVIDIA Corporation
>>   Platform Version                                OpenCL 1.2 CUDA 8.0.0
>>   Platform Profile                                FULL_PROFILE
>>   Platform Extensions
>> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
>> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
>> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
>> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
>> cl_nv_copy_opts
>>   Platform Extensions function suffix             NV
>>
>>   Platform Name                                   NVIDIA CUDA
>> Number of devices                                 1
>>   Device Name                                     GeForce GTX 750 Ti
>>   Device Vendor                                   NVIDIA Corporation
>>   Device Vendor ID                                0x10de
>>   Device Version                                  OpenCL 1.2 CUDA
>>   Driver Version                                  367.44
>>   Device OpenCL C Version                         OpenCL C 1.2
>>   Device Type                                     GPU
>>   Device Profile                                  FULL_PROFILE
>>   Device Topology (NV)                            PCI-E, 01:00.0
>>   Max compute units                               5
>>   Max clock frequency                             1150MHz
>>   Compute Capability (NV)                         5.0
>>   Device Partition                                (core)
>>     Max number of sub-devices                     1
>>     Supported partition types                     None
>>   Max work item dimensions                        3
>>   Max work item sizes                             1024x1024x64
>>   Max work group size                             1024
>>   Preferred work group size multiple              32
>>   Warp size (NV)                                  32
>>   Preferred / native vector sizes
>>     char                                                 1 / 1
>>     short                                                1 / 1
>>     int                                                  1 / 1
>>     long                                                 1 / 1
>>     half                                                 0 / 0        (n/a)
>>     float                                                1 / 1
>>     double                                               1 / 1        (cl_khr_fp64)
>>   Half-precision Floating-point support           (n/a)
>>   Single-precision Floating-point support         (core)
>>     Denormals                                     Yes
>>     Infinity and NANs                             Yes
>>     Round to nearest                              Yes
>>     Round to zero                                 Yes
>>     Round to infinity                             Yes
>>     IEEE754-2008 fused multiply-add               Yes
>>     Support is emulated in software               No
>>     Correctly-rounded divide and sqrt operations  Yes
>>   Double-precision Floating-point support         (cl_khr_fp64)
>>     Denormals                                     Yes
>>     Infinity and NANs                             Yes
>>     Round to nearest                              Yes
>>     Round to zero                                 Yes
>>     Round to infinity                             Yes
>>     IEEE754-2008 fused multiply-add               Yes
>>     Support is emulated in software               No
>>     Correctly-rounded divide and sqrt operations  No
>>   Address bits                                    64, Little-Endian
>>   Global memory size                              2095841280 (1.952GiB)
>>   Error Correction support                        No
>>   Max memory allocation                           523960320 (499.7MiB)
>>   Unified memory for Host and Device              No
>>   Integrated memory (NV)                          No
>>   Minimum alignment for any data type             128 bytes
>>   Alignment of base address                       4096 bits (512 bytes)
>>   Global Memory cache type                        Read/Write
>>   Global Memory cache size                        81920
>>   Global Memory cache line                        128 bytes
>>   Image support                                   Yes
>>     Max number of samplers per kernel             32
>>     Max size for 1D images from buffer            134217728 pixels
>>     Max 1D or 2D image array size                 2048 images
>>     Max 2D image size                             16384x16384 pixels
>>     Max 3D image size                             4096x4096x4096 pixels
>>     Max number of read image args                 256
>>     Max number of write image args                16
>>   Local memory type                               Local
>>   Local memory size                               49152 (48KiB)
>>   Registers per block (NV)                        65536
>>   Max constant buffer size                        65536 (64KiB)
>>   Max number of constant args                     9
>>   Max size of kernel argument                     4352 (4.25KiB)
>>   Queue properties
>>     Out-of-order execution                        Yes
>>     Profiling                                     Yes
>>   Prefer user sync for interop                    No
>>   Profiling timer resolution                      1000ns
>>   Execution capabilities
>>     Run OpenCL kernels                            Yes
>>     Run native kernels                            No
>>     Kernel execution timeout (NV)                 Yes
>>   Concurrent copy and kernel execution (NV)       Yes
>>     Number of async copy engines                  1
>>   printf() buffer size                            1048576 (1024KiB)
>>   Built-in kernels
>>   Device Available                                Yes
>>   Compiler Available                              Yes
>>   Linker Available                                Yes
>>   Device Extensions
>> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
>> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
>> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
>> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
>> cl_nv_copy_opts
>>
>> NULL platform behavior
>>   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
>>   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
>>   clCreateContext(NULL, ...) [default]            Success [NV]
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
>>
>> ICD loader properties
>>   ICD loader Name                                 OpenCL ICD Loader
>>   ICD loader Vendor                               OCL Icd free software
>>   ICD loader Version                              2.2.8
>>   ICD loader Profile                              OpenCL 1.2
>> NOTE: your OpenCL library declares to support OpenCL 1.2,
>> but it seems to support up to OpenCL 2.1 too.
>>
>>
>> Java major/minor and vendor:
>> $ java -version
>> openjdk version "1.8.0_91"
>> OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>
>> vcl version:
>> (Ignoring what is in /usr/include/viennacl/version.hpp)
>> 1.7.1
>>
>> gcc version:
>> $ gcc --version
>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
>>
>> On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> >
>> >
>> >
>> > Could everybody please post any errors, with stack traces, that they are
>> > getting on master or the vcl PR branch, along with full system info?
>> >
>> > I.e., branch, OS major/minor, chip arch, GPU, clinfo output, Java
>> > major/minor and vendor, vcl version, gcc version, and anything else that
>> > may be useful, so that we can compare.
>> >
>> > Thx
>> >
>> > Andy
>> >
>> >
>> > Sent from my Galaxy Tab A
>> >
>>
>
>

Re: Errors when 0.13.0 master or vcl

Posted by Trevor Grant <tr...@gmail.com>.
Anecdotes on my OOM error:

It only fails on one test, so it is *sort of* working.

I had the NVIDIA monitor up while I ran the tests. I did this a few times
and monitored the results (if anyone knows a CLI way to do this and pipe
the results to a file, dope).

In general, the background memory utilization of my graphics card is 33%
(of 1G total).  As tests run, memory utilization gets as high as 50% and
GPU utilization as high as 99% (for other tests).  This makes the OOM error
even more curious.

Does anyone know a good util for monitoring the card while it is
running?

Keep up the fight Andrew, I feel like this is very close.

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Oct 28, 2016 at 6:01 PM, Trevor Grant <tr...@gmail.com>
wrote:

> I was finally able to hit an OOM error.
>
> I initially was running everything from the top.
>
> When I run mvn test from the viennacl directory...
>
>
> /mahout/viennacl$ mvn test
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective
> model for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.
> 0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but
> found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin
> @ org.apache.mahout:mahout-native-viennacl_${scala.
> compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml,
> line 181, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support
> building such malformed projects.
> [WARNING]
> [INFO]
>
> [INFO] ------------------------------------------------------------
> ------------
> [INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
> [INFO] ------------------------------------------------------------
> ------------
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Add Source directory: /home/trevor/gits/mahout/
> viennacl/src/main/scala
> [INFO] Add Test Source directory: /home/trevor/gits/mahout/
> viennacl/src/test/scala
> [INFO]
> [INFO] --- maven-dependency-plugin:2.3:properties (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/
> viennacl/src/main/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
> mahout-native-viennacl_2.10 ---
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul$
> Generating /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> Compiling /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
> g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include
> -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp -msse3 -ffast-math
> -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/ -Wl,-z,noexecstack -Wl,-Bsymbolic
> -march=native -m64 -Wall -Ofast -fPIC -shared -s -o libjniViennaCL.so
> -lOpenCL
> Deleting /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:testResources
> (default-testResources) @ mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/
> viennacl/src/test/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
> mahout-native-viennacl_2.10 ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (test) @
> mahout-native-viennacl_2.10 ---
> Discovery starting.
> Discovery completed in 201 milliseconds.
> Run starting. Expected test count is: 7
> ViennaCLSuiteVCL:
> - row-major viennacl::matrix
>   + OCL matrix memory domain after assgn=2
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> log4j:WARN No appenders could be found for logger
> (org.apache.mahout.viennacl.opencl.GPUMMul$).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - dense vcl mmul with fast_copy
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - mmul microbenchmark
>   + Mahout multiplication time: 2630 ms.
>   + ViennaCL/OpenCL multiplication time: 2072 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuSparseRWRW
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!falling back to JVM MMUL
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> - sparse mmul microbenchmark *** FAILED ***
>   java.lang.RuntimeException: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
>   + Mahout Sparse multiplication time: 13303 ms.
> - VCL Dense Matrix %*% Dense vector
>   + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
>   + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
> 4 ms.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuRWCW
> - Sparse %*% Dense mmul microbenchmark
>   + Mahout multiplication time: 821 ms.
>   + ViennaCL/OpenCL multiplication time: 754 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
> Run completed in 26 seconds, 944 milliseconds.
> Total number of tests run: 7
> Suites: completed 2, aborted 0
> Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
> *** 1 TEST FAILED ***
> [INFO] ------------------------------------------------------------
> ------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------
> ------------
> [INFO] Total time: 45.600 s
> [INFO] Finished at: 2016-10-28T17:58:45-05:00
> [INFO] Final Memory: 20M/309M
> [INFO] ------------------------------------------------------------
> ------------
> [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test
> (test) on project mahout-native-viennacl_2.10: There are test failures ->
> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/
> MojoFailureException
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> >
>> > My Desktop
>>
>> >
>> Branch: master
>>
>> > Build Command: mvn clean install -Phadoop2
>>
>> > Built and tested successfully
>>
>> >
>> Branch: andrewpalumbo/mahout/viennacl-opmmul-a
>>
>> > Build Command: mvn clean install -Pviennacl -Phadoop2
>>
>> Builds successfully-
>>
>> > Tests still riddled with
>>
>> >
>> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>>
>> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
>>
>> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>>
>> > [INFO] Unable to create class OMPMMul: falling back to java version
>>
>> >
>> Also OOM
>> ViennaCLSuiteOMP:
>> - row-major viennacl::matrix
>> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>> [WARN] Unable to create class GPUMMul: attempting OpenMP version
>> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>> [INFO] Unable to create class OMPMMul: falling back to java version
>> - mmul microbenchmark
>>   + Mahout multiplication time: 8875 ms.
>>   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
>> - trans
>> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>> [WARN] Unable to create class GPUMMul: attempting OpenMP version
>> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>> [INFO] Unable to create class OMPMMul: falling back to java version
>> *** RUN ABORTED ***
>>   java.lang.OutOfMemoryError: Java heap space
>>   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
>>   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
>>   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
>>   at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
>>   at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
>>   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
>>   at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
>>   at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
>>   at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
>>   at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
>>
>> >
>>
>> Machine Info:
>>
>> > ============================================================
>> ====================
>>
>> >
>>  os major/minor:
>>
>> > $ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 16.04.1 LTS
>> Release: 16.04
>> Codename: xenial
>>
>> >
>>
>>  chip arch:
>>
>> > $ uname -a
>> Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> >
>> $ cat /proc/cpuinfo
>>
>> > ...
>>
>> > model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
>>
>> > ...
>>
>> > ^^ 6 cores
>>
>> >
>>
>> GPU:
>>
>> > $ sudo nvidia-smi
>>
>> >
>> +-----------------------------------------------------------------------------+
>> | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
>> |-------------------------------+----------------------+----------------------+
>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>> |===============================+======================+======================|
>> |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
>> | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
>> +-------------------------------+----------------------+----------------------+
>>
>>
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID  Type  Process name                               Usage      |
>> |=============================================================================|
>> |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
>> |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
>> |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
>> |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
>> +-----------------------------------------------------------------------------+
>>
>> >
>>
>>
>>
>>  clinfo output:
>>
>> > $ clinfo
>> Number of platforms                               1
>>   Platform Name                                   NVIDIA CUDA
>>   Platform Vendor                                 NVIDIA Corporation
>>   Platform Version                                OpenCL 1.2 CUDA 8.0.0
>>   Platform Profile                                FULL_PROFILE
>>   Platform Extensions
>> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
>> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
>> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
>> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
>> cl_nv_copy_opts
>>   Platform Extensions function suffix             NV
>>
>>   Platform Name                                   NVIDIA CUDA
>> Number of devices                                 1
>>   Device Name                                     GeForce GTX 750 Ti
>>   Device Vendor                                   NVIDIA Corporation
>>   Device Vendor ID                                0x10de
>>   Device Version                                  OpenCL 1.2 CUDA
>>   Driver Version                                  367.44
>>   Device OpenCL C Version                         OpenCL C 1.2
>>   Device Type                                     GPU
>>   Device Profile                                  FULL_PROFILE
>>   Device Topology (NV)                            PCI-E, 01:00.0
>>   Max compute units                               5
>>   Max clock frequency                             1150MHz
>>   Compute Capability (NV)                         5.0
>>   Device Partition                                (core)
>>     Max number of sub-devices                     1
>>     Supported partition types                     None
>>   Max work item dimensions                        3
>>   Max work item sizes                             1024x1024x64
>>   Max work group size                             1024
>>   Preferred work group size multiple              32
>>   Warp size (NV)                                  32
>>   Preferred / native vector sizes
>>     char                                                 1 / 1
>>     short                                                1 / 1
>>     int                                                  1 / 1
>>     long                                                 1 / 1
>>     half                                                 0 / 0        (n/a)
>>     float                                                1 / 1
>>     double                                               1 / 1        (cl_khr_fp64)
>>   Half-precision Floating-point support           (n/a)
>>   Single-precision Floating-point support         (core)
>>     Denormals                                     Yes
>>     Infinity and NANs                             Yes
>>     Round to nearest                              Yes
>>     Round to zero                                 Yes
>>     Round to infinity                             Yes
>>     IEEE754-2008 fused multiply-add               Yes
>>     Support is emulated in software               No
>>     Correctly-rounded divide and sqrt operations  Yes
>>   Double-precision Floating-point support         (cl_khr_fp64)
>>     Denormals                                     Yes
>>     Infinity and NANs                             Yes
>>     Round to nearest                              Yes
>>     Round to zero                                 Yes
>>     Round to infinity                             Yes
>>     IEEE754-2008 fused multiply-add               Yes
>>     Support is emulated in software               No
>>     Correctly-rounded divide and sqrt operations  No
>>   Address bits                                    64, Little-Endian
>>   Global memory size                              2095841280 (1.952GiB)
>>   Error Correction support                        No
>>   Max memory allocation                           523960320 (499.7MiB)
>>   Unified memory for Host and Device              No
>>   Integrated memory (NV)                          No
>>   Minimum alignment for any data type             128 bytes
>>   Alignment of base address                       4096 bits (512 bytes)
>>   Global Memory cache type                        Read/Write
>>   Global Memory cache size                        81920
>>   Global Memory cache line                        128 bytes
>>   Image support                                   Yes
>>     Max number of samplers per kernel             32
>>     Max size for 1D images from buffer            134217728 pixels
>>     Max 1D or 2D image array size                 2048 images
>>     Max 2D image size                             16384x16384 pixels
>>     Max 3D image size                             4096x4096x4096 pixels
>>     Max number of read image args                 256
>>     Max number of write image args                16
>>   Local memory type                               Local
>>   Local memory size                               49152 (48KiB)
>>   Registers per block (NV)                        65536
>>   Max constant buffer size                        65536 (64KiB)
>>   Max number of constant args                     9
>>   Max size of kernel argument                     4352 (4.25KiB)
>>   Queue properties
>>     Out-of-order execution                        Yes
>>     Profiling                                     Yes
>>   Prefer user sync for interop                    No
>>   Profiling timer resolution                      1000ns
>>   Execution capabilities
>>     Run OpenCL kernels                            Yes
>>     Run native kernels                            No
>>     Kernel execution timeout (NV)                 Yes
>>   Concurrent copy and kernel execution (NV)       Yes
>>     Number of async copy engines                  1
>>   printf() buffer size                            1048576 (1024KiB)
>>   Built-in kernels
>>   Device Available                                Yes
>>   Compiler Available                              Yes
>>   Linker Available                                Yes
>>   Device Extensions
>> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
>> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
>> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
>> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
>> cl_nv_copy_opts
>>
>> NULL platform behavior
>>   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
>>   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
>>   clCreateContext(NULL, ...) [default]            Success [NV]
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
>>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
>>
>> ICD loader properties
>>   ICD loader Name                                 OpenCL ICD Loader
>>   ICD loader Vendor                               OCL Icd free software
>>   ICD loader Version                              2.2.8
>>   ICD loader Profile                              OpenCL 1.2
>> NOTE: your OpenCL library declares to support OpenCL 1.2,
>> but it seems to support up to OpenCL 2.1 too.
>>
>> >
>>
>>  Java major/minor and vendor:
>>
>> > $ java -version
>> openjdk version "1.8.0_91"
>> OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>
>> >
>>
>>   vcl version,
>>
>> > (Ignoring what is in /usr/include/viennacl/version.hpp)
>>
>> > 1.7.1
>>
>>
>> gcc version:
>>
>> > $ gcc --version
>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
>>
>> >
>>
>> On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> >
>> >
>> >
>> > Could everybody please post any errors when stack trace that they are
>> > getting on master or the vcl pr branch along with full system info?
>> >
>> > Ie.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
>> > major/minor and vendor,  vcl version, gcc version, and anything else
>> that
>> > may be useful so that we may compare?
>> >
>> > Thx
>> >
>> > Andy
>> >
>> >
>> > Sent from my Galaxy Tab A
>> >
>>
>
>

Re: Errors when 0.13.0 master or vcl

Posted by Andrew Palumbo <ap...@outlook.com>.
So here's my output from:



$ mvn clean install -Pviennacl -DskipTests

$ cd viennacl

$ mvn test




[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools ................................. SUCCESS [  2.491 s]
[INFO] Apache Mahout ...................................... SUCCESS [  0.063 s]
[INFO] Mahout Math ........................................ SUCCESS [ 10.077 s]
[INFO] Mahout HDFS ........................................ SUCCESS [  2.393 s]
[INFO] Mahout Math Scala bindings ......................... SUCCESS [ 40.426 s]
[INFO] Mahout Spark bindings .............................. SUCCESS [ 47.232 s]
[INFO] Mahout Flink bindings .............................. SUCCESS [ 38.260 s]
[INFO] Mahout Spark bindings shell ........................ SUCCESS [  8.020 s]
[INFO] Mahout Release Package ............................. SUCCESS [  3.352 s]
[INFO] Mahout H2O backend ................................. SUCCESS [ 23.609 s]
[INFO] Mahout Native VienniaCL OpenCL Bindings ............ SUCCESS [ 50.515 s]
[INFO] Mahout Native VienniaCL OpenMP Bindings ............ SUCCESS [ 24.820 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:11 min
[INFO] Finished at: 2016-10-29T18:28:51-07:00
[INFO] Final Memory: 127M/1432M
[INFO] ------------------------------------------------------------------------
andrew@michael:~/sandbox/mahout$ cd viennacl
andrew@michael:~/sandbox/mahout/viennacl$ mvn test
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.12.3-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/andrew/sandbox/mahout/viennacl/pom.xml, line 31, column 15
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/andrew/sandbox/mahout/viennacl/pom.xml, line 181, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.12.3-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @ mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @ mahout-native-viennacl_2.10 ---
[INFO] Add Source directory: /home/andrew/sandbox/mahout/viennacl/src/main/scala
[INFO] Add Test Source directory: /home/andrew/sandbox/mahout/viennacl/src/test/scala
[INFO]
[INFO] --- maven-dependency-plugin:2.3:properties (default) @ mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/andrew/sandbox/mahout/viennacl/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @ mahout-native-viennacl_2.10 ---
Warning: Could not load platform properties for class org.apache.mahout.viennacl.opencl.GPUMMul
Warning: Could not load platform properties for class org.apache.mahout.viennacl.opencl.GPUMMul$
Generating /home/andrew/sandbox/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
Compiling /home/andrew/sandbox/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
g++ -I/usr/include/viennacl -I/home/andrew/java/jdk1.8.0_91/include -I/home/andrew/java/jdk1.8.0_91/include/linux /home/andrew/sandbox/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/ -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC -shared -s -o libjniViennaCL.so -lOpenCL
Deleting /home/andrew/sandbox/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
[INFO]
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) @ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/andrew/sandbox/mahout/viennacl/src/test/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ mahout-native-viennacl_2.10 ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ mahout-native-viennacl_2.10 ---
Discovery starting.
Discovery completed in 326 milliseconds.
Run starting. Expected test count is: 7
ViennaCLSuiteVCL:
- row-major viennacl::matrix
  + OCL matrix memory domain after assgn=2
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
log4j:WARN No appenders could be found for logger (org.apache.mahout.viennacl.opencl.GPUMMul$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- dense vcl mmul with fast_copy
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- mmul microbenchmark
  + Mahout multiplication time: 3066 ms.
  + ViennaCL/OpenCL multiplication time: 3477 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 2582 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuSparseRWRW
- sparse mmul microbenchmark
  + Mahout Sparse multiplication time: 12157 ms.
  + ViennaCL/OpenCL Sparse multiplication time: 10956 ms.
  + ViennaCL/cpu/OpenMP Sparse multiplication time: 3764 ms.
- VCL Dense Matrix %*% Dense vector
  + Mahout dense matrix %*% dense vector multiplication time: 1 ms.
  + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time: 4 ms.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuSparseRowRWRW
- Sparse %*% Dense mmul microbenchmark
  + Mahout multiplication time: 1067 ms.
  + ViennaCL/OpenCL multiplication time: 955 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 2382 ms.
Run completed in 51 seconds, 876 milliseconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:23 min
[INFO] Finished at: 2016-10-29T18:34:11-07:00
[INFO] Final Memory: 25M/562M
[INFO] ------------------------------------------------------------------------



I do have some local changes from when Pat changed the version to 0.13, and a small change in Spark that I think I was using to fix the problem that Trevor's pom.xml workaround addresses.


I'll push them to the branch soon.



My system:


lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:    16.04
Codename:    xenial


andrew@michael:~/sandbox/mahout/viennacl$ uname -a
Linux michael 4.4.0-42-generic #62-Ubuntu SMP Fri Oct 7 23:11:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

andrew@michael:~/sandbox/mahout/viennacl$ lspci
{...}
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 740] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)
02:00.0 Network controller: Ralink corp. RT5390 Wireless 802.11n 1T/1R PCIe
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 05)



cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 5
model name    : AMD Athlon(tm) II X4 645 Processor
stepping    : 3

^^ 4 cores



andrew@michael:~/sandbox/mahout/viennacl$  sudo nvidia-smi
[sudo] password for andrew:
Sat Oct 29 18:50:30 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 740      Off  | 0000:01:00.0     N/A |                  N/A |
| 30%   37C    P8    N/A /  N/A |    151MiB /  2046MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+






andrew@michael:~/sandbox/mahout/viennacl$ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 8.0.20
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GT 740
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  361.42
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               2
  Max clock frequency                             993MHz
  Compute Capability (NV)                         3.0
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              2145927168 (1.999GiB)
  Error Correction support                        No
  Max memory allocation                           536481792 (511.6MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
    NOTE:    your OpenCL library declares to support OpenCL 1.2,
        but it seems to support up to OpenCL 2.1 too.
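
A rough way to read the memory numbers above against the CL_MEM_OBJECT_ALLOCATION_FAILURE quoted further down the thread: on the GT 740, clinfo reports a max single allocation of 536481792 bytes, which is exactly a quarter of the 2145927168 bytes of global memory. The Java sketch below estimates whether a CSR sparse matrix of a given size would fit within those two limits. It is not Mahout or ViennaCL code. The class and method names, the assumed layout (4-byte row offsets and column indices, 8-byte double values, one buffer per array), and the example sizes are all assumptions for illustration; the matrix sizes used in the failing test are not given in this thread.

```java
// Rough estimate of whether a CSR sparse matrix fits on the device.
// Not Mahout or ViennaCL code; all names here are hypothetical.
final class CsrFootprint {
    // Values copied from the clinfo output above (GeForce GT 740).
    static final long GLOBAL_MEM_BYTES = 2145927168L;
    static final long MAX_ALLOC_BYTES = 536481792L;

    // Assumed layout: 4-byte row offsets and column indices, 8-byte values.
    static long totalBytes(long rows, long nonZeros) {
        return (rows + 1) * 4L + nonZeros * 4L + nonZeros * 8L;
    }

    // Assumption: each of the three CSR arrays is its own device buffer,
    // so the per-allocation limit applies to the largest of them.
    static long largestBufferBytes(long rows, long nonZeros) {
        return Math.max((rows + 1) * 4L, nonZeros * 8L);
    }

    static boolean fits(long rows, long nonZeros) {
        return largestBufferBytes(rows, nonZeros) <= MAX_ALLOC_BYTES
                && totalBytes(rows, nonZeros) <= GLOBAL_MEM_BYTES;
    }

    public static void main(String[] args) {
        long rows = 10_000L; // hypothetical row count
        for (long nnz : new long[] {50_000_000L, 100_000_000L}) {
            System.out.println(nnz + " non-zeros: " + totalBytes(rows, nnz)
                    + " bytes total, fits=" + fits(rows, nnz));
        }
    }
}
```

Two caveats on reading the result: the product of two sparse matrices can hold many more non-zeros than either input, so inputs that fit do not guarantee the result fits, and nvidia-smi above shows 151MiB of the card already in use before any test runs.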





________________________________
From: Andrew Palumbo <ap...@outlook.com>
Sent: Saturday, October 29, 2016 3:13:23 PM
To: dev@mahout.apache.org
Subject: RE: Errors when 0.13.0 master or vcl

Hmm.. ok the viennacl-omp tests failing is strange.  But actually I have not spent as much time on that stuff.

I'll be looking into this later this afternoon..

Suneel, could you send your errors and arch details please?

I'll also send out a C++ snippet with some raw vcl matrix ops just to make sure that GPUs are working.

Andrew.. I know you said you didn't see any GPU usage?

Ok I'll get on this later this afternoon... it's a paycheck morning for me.





-------- Original message --------
From: Andrew Musselman <an...@gmail.com>
Date: 10/29/2016 10:45 AM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Errors when 0.13.0 master or vcl

It's not reliable; I've had the -Pviennacl build pass and fail, and the
-Pviennacl-omp build fail but not pass.

On Sat, Oct 29, 2016 at 10:27 AM, Andrew Palumbo <ap...@outlook.com> wrote:

> Ok, this makes sense.. it's not actually a (Java) OOM error, it's a
> blanket GPU error... that's why we have the fallbacks to OMP and JVM.
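
The fallback order described just above (GPU first, then OpenMP, then the JVM) can be sketched in Java roughly as follows. This is only an illustration of the pattern, not Mahout's actual solver-selection code: the MMul interface, the firstAvailable helper, and the log text are hypothetical names made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Illustrative backend fallback chain; not Mahout code, all names hypothetical.
final class SolverFallback {
    interface MMul {
        String name();
    }

    // Try each backend in order and return the first one that can be created.
    static MMul firstAvailable(List<Supplier<MMul>> candidates) {
        for (Supplier<MMul> candidate : candidates) {
            try {
                return candidate.get();
            } catch (Throwable t) {
                // Native backends usually fail with an Error such as
                // UnsatisfiedLinkError, so catching Exception is not enough.
                System.out.println("[WARN] Unable to create solver (" + t + "), trying next backend");
            }
        }
        throw new IllegalStateException("no matrix multiplication backend available");
    }

    public static void main(String[] args) {
        MMul solver = firstAvailable(Arrays.<Supplier<MMul>>asList(
                () -> { throw new UnsatisfiedLinkError("no OpenCL library"); }, // stand-in for the GPU backend
                () -> { throw new UnsatisfiedLinkError("no OpenMP library"); }, // stand-in for the OpenMP backend
                () -> () -> "jvm"));                                            // pure-JVM backend always succeeds
        System.out.println("Using backend: " + solver.name());
    }
}
```

One consequence of this pattern is that a test can pass while silently running on the JVM backend, so the [WARN] lines are worth checking even on a green build.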
>
> Does it happen each time you run that test?  Sometimes these are one-off
> things where the GPU buffer had not been cleared when the next test
> starts.. or something like that.
>
> Thanks this is a big help in figuring out what's happening.
>
>
>
>
>
> -------- Original message --------
> From: Trevor Grant <tr...@gmail.com>
> Date: 10/28/2016 4:01 PM (GMT-08:00)
> To: dev@mahout.apache.org
> Subject: Re: Errors when 0.13.0 master or vcl
>
> I was finally able to hit an OOM error.
>
> I initially was running everything from the top.
>
> When I run mvn test from the viennacl directory...
>
>
> /mahout/viennacl$ mvn test
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model
> for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-
> version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but
> found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin
> @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-
> version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support
> building such malformed projects.
> [WARNING]
> [INFO]
>
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
> [INFO]
> ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Add Source directory:
> /home/trevor/gits/mahout/viennacl/src/main/scala
> [INFO] Add Test Source directory:
> /home/trevor/gits/mahout/viennacl/src/test/scala
> [INFO]
> [INFO] --- maven-dependency-plugin:2.3:properties (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
> /home/trevor/gits/mahout/viennacl/src/main/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
> mahout-native-viennacl_2.10 ---
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul$
> Generating
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> Compiling
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
> g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include
> -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/
> -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC
> -shared -s -o libjniViennaCL.so -lOpenCL
> Deleting
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:testResources
> (default-testResources)
> @ mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
> /home/trevor/gits/mahout/viennacl/src/test/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
> mahout-native-viennacl_2.10 ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (test) @
> mahout-native-viennacl_2.10 ---
> Discovery starting.
> Discovery completed in 201 milliseconds.
> Run starting. Expected test count is: 7
> ViennaCLSuiteVCL:
> - row-major viennacl::matrix
>   + OCL matrix memory domain after assgn=2
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> log4j:WARN No appenders could be found for logger
> (org.apache.mahout.viennacl.opencl.GPUMMul$).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - dense vcl mmul with fast_copy
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - mmul microbenchmark
>   + Mahout multiplication time: 2630 ms.
>   + ViennaCL/OpenCL multiplication time: 2072 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuSparseRWRW
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!falling back to JVM MMUL
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> - sparse mmul microbenchmark *** FAILED ***
>   java.lang.RuntimeException: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
>   + Mahout Sparse multiplication time: 13303 ms.
> - VCL Dense Matrix %*% Dense vector
>   + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
>   + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
> 4 ms.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuRWCW
> - Sparse %*% Dense mmul microbenchmark
>   + Mahout multiplication time: 821 ms.
>   + ViennaCL/OpenCL multiplication time: 754 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
> Run completed in 26 seconds, 944 milliseconds.
> Total number of tests run: 7
> Suites: completed 2, aborted 0
> Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
> *** 1 TEST FAILED ***
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 45.600 s
> [INFO] Finished at: 2016-10-28T17:58:45-05:00
> [INFO] Final Memory: 20M/309M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal
> org.scalatest:scalatest-maven-plugin:1.0:test (test) on project
> mahout-native-viennacl_2.10: There are test failures -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > >
> > > My Desktop
> >
> > >
> > Branch: master
> >
> > > Build Command: mvn clean install -Phadoop2
> >
> > > Built and tested successfully
> >
> > >
> > Branch: andrewpalumbo/mahout/viennacl-opmmul-a
> >
> > > Build Command: mvn clean install -Pviennacl -Phadoop2
> >
> > Builds successfully-
> >
> > > Tests still riddled with
> >
> > >
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> >
> > > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> >
> > > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> >
> > > [INFO] Unable to create class OMPMMul: falling back to java version
> >
> > >
> > Also OOM
> > ViennaCLSuiteOMP:
> > - row-major viennacl::matrix
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> > [INFO] Unable to create class OMPMMul: falling back to java version
> > - mmul microbenchmark
> >   + Mahout multiplication time: 8875 ms.
> >   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
> > - trans
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> > [INFO] Unable to create class OMPMMul: falling back to java version
> > *** RUN ABORTED ***
> >   java.lang.OutOfMemoryError: Java heap space
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
> >   at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
> >   at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
> >   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
> >   at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
> >
> > >
> >
> > Machine Info:
> >
> > > ============================================================
> > ====================
> >
> > >
> >  os major/minor:
> >
> > > $ lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description: Ubuntu 16.04.1 LTS
> > Release: 16.04
> > Codename: xenial
> >
> > >
> >
> >  chip arch:
> >
> > > $ uname -a
> > Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
> > x86_64 x86_64 x86_64 GNU/Linux
> >
> > >
> > $ cat /proc/cpuinfo
> >
> > > ...
> >
> > > model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
> >
> > > ...
> >
> > > ^^ 6 cores
> >
> > >
> >
> > GPU:
> >
> > > $ sudo nvidia-smi
> >
> > >
> > +-----------------------------------------------------------------------------+
> > | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
> > |-------------------------------+----------------------+----------------------+
> > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > |===============================+======================+======================|
> > |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
> > | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> >
> > +-----------------------------------------------------------------------------+
> > | Processes:                                                       GPU Memory |
> > |  GPU       PID  Type  Process name                               Usage      |
> > |=============================================================================|
> > |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
> > |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
> > |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
> > |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
> > +-----------------------------------------------------------------------------+
> >
> > >
> >
> >
> >
> >  clinfo output:
> >
> > > $ clinfo
> > Number of platforms                               1
> >   Platform Name                                   NVIDIA CUDA
> >   Platform Vendor                                 NVIDIA Corporation
> >   Platform Version                                OpenCL 1.2 CUDA 8.0.0
> >   Platform Profile                                FULL_PROFILE
> >   Platform Extensions
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> > cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> > cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> > cl_nv_copy_opts
> >   Platform Extensions function suffix             NV
> >
> >   Platform Name                                   NVIDIA CUDA
> > Number of devices                                 1
> >   Device Name                                     GeForce GTX 750 Ti
> >   Device Vendor                                   NVIDIA Corporation
> >   Device Vendor ID                                0x10de
> >   Device Version                                  OpenCL 1.2 CUDA
> >   Driver Version                                  367.44
> >   Device OpenCL C Version                         OpenCL C 1.2
> >   Device Type                                     GPU
> >   Device Profile                                  FULL_PROFILE
> >   Device Topology (NV)                            PCI-E, 01:00.0
> >   Max compute units                               5
> >   Max clock frequency                             1150MHz
> >   Compute Capability (NV)                         5.0
> >   Device Partition                                (core)
> >     Max number of sub-devices                     1
> >     Supported partition types                     None
> >   Max work item dimensions                        3
> >   Max work item sizes                             1024x1024x64
> >   Max work group size                             1024
> >   Preferred work group size multiple              32
> >   Warp size (NV)                                  32
> >   Preferred / native vector sizes
> >     char                                                 1 / 1
> >     short                                                1 / 1
> >     int                                                  1 / 1
> >     long                                                 1 / 1
> >     half                                                 0 / 0        (n/a)
> >     float                                                1 / 1
> >     double                                               1 / 1        (cl_khr_fp64)
> >   Half-precision Floating-point support           (n/a)
> >   Single-precision Floating-point support         (core)
> >     Denormals                                     Yes
> >     Infinity and NANs                             Yes
> >     Round to nearest                              Yes
> >     Round to zero                                 Yes
> >     Round to infinity                             Yes
> >     IEEE754-2008 fused multiply-add               Yes
> >     Support is emulated in software               No
> >     Correctly-rounded divide and sqrt operations  Yes
> >   Double-precision Floating-point support         (cl_khr_fp64)
> >     Denormals                                     Yes
> >     Infinity and NANs                             Yes
> >     Round to nearest                              Yes
> >     Round to zero                                 Yes
> >     Round to infinity                             Yes
> >     IEEE754-2008 fused multiply-add               Yes
> >     Support is emulated in software               No
> >     Correctly-rounded divide and sqrt operations  No
> >   Address bits                                    64, Little-Endian
> >   Global memory size                              2095841280 (1.952GiB)
> >   Error Correction support                        No
> >   Max memory allocation                           523960320 (499.7MiB)
> >   Unified memory for Host and Device              No
> >   Integrated memory (NV)                          No
> >   Minimum alignment for any data type             128 bytes
> >   Alignment of base address                       4096 bits (512 bytes)
> >   Global Memory cache type                        Read/Write
> >   Global Memory cache size                        81920
> >   Global Memory cache line                        128 bytes
> >   Image support                                   Yes
> >     Max number of samplers per kernel             32
> >     Max size for 1D images from buffer            134217728 pixels
> >     Max 1D or 2D image array size                 2048 images
> >     Max 2D image size                             16384x16384 pixels
> >     Max 3D image size                             4096x4096x4096 pixels
> >     Max number of read image args                 256
> >     Max number of write image args                16
> >   Local memory type                               Local
> >   Local memory size                               49152 (48KiB)
> >   Registers per block (NV)                        65536
> >   Max constant buffer size                        65536 (64KiB)
> >   Max number of constant args                     9
> >   Max size of kernel argument                     4352 (4.25KiB)
> >   Queue properties
> >     Out-of-order execution                        Yes
> >     Profiling                                     Yes
> >   Prefer user sync for interop                    No
> >   Profiling timer resolution                      1000ns
> >   Execution capabilities
> >     Run OpenCL kernels                            Yes
> >     Run native kernels                            No
> >     Kernel execution timeout (NV)                 Yes
> >   Concurrent copy and kernel execution (NV)       Yes
> >     Number of async copy engines                  1
> >   printf() buffer size                            1048576 (1024KiB)
> >   Built-in kernels
> >   Device Available                                Yes
> >   Compiler Available                              Yes
> >   Linker Available                                Yes
> >   Device Extensions
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> > cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> > cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> > cl_nv_copy_opts
> >
> > NULL platform behavior
> >   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
> >   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
> >   clCreateContext(NULL, ...) [default]            Success [NV]
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
> >
> > ICD loader properties
> >   ICD loader Name                                 OpenCL ICD Loader
> >   ICD loader Vendor                               OCL Icd free software
> >   ICD loader Version                              2.2.8
> >   ICD loader Profile                              OpenCL 1.2
> > NOTE: your OpenCL library declares to support OpenCL 1.2,
> > but it seems to support up to OpenCL 2.1 too.
> >
> > >
> >
> >  Java major/minor and vendor:
> >
> > > $ java -version
> > openjdk version "1.8.0_91"
> > OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
> > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> >
> > >
> >
> >   vcl version,
> >
> > > (Ignoring what is in /usr/include/viennacl/version.hpp)
> >
> > > 1.7.1
> >
> >
> > gcc version:
> >
> > > $ gcc --version
> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
> >
> > >
> >
> > On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> >
> > >
> > >
> > >
> > > Could everybody please post any errors with stack traces that they are
> > > getting on master or the vcl pr branch along with full system info?
> > >
> > > I.e., branch, OS major/minor, chip arch, GPU, clinfo output, Java
> > > major/minor and vendor, vcl version, gcc version, and anything else that
> > > may be useful so that we may compare?
> > >
> > > Thx
> > >
> > > Andy
> > >
> > >
> > > Sent from my Galaxy Tab A
> > >
> >
>

RE: Errors when 0.13.0 master or vcl

Posted by Andrew Palumbo <ap...@outlook.com>.
Hmm.. ok the viennacl-omp tests failing is strange.  But actually I have not spent as much time on that stuff.

I'll be looking into this later this afternoon..

Suneel, could you send your errors and arch details please?

I'll also send out a C++ snippet with some raw vcl matrix ops just to make sure that the GPUs are working.

Andrew.. I know you said you didn't see any GPU usage?

Ok I'll get on this later this afternoon... it's a paycheck morning for me.
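
For anyone reading along: the solver selection that the quoted log lines describe (GPUMMul first, then OMPMMul, then the java version) comes down to trying each backend in order and keeping the first one that can be created. Below is a minimal Java sketch of that idea only. It is not the actual Mahout code: SolverFallback, Solver and firstAvailable are made-up names, and the two failures are simulated rather than coming from a real GPU or OpenMP library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SolverFallback {

    /** Hypothetical stand-in for a matrix-multiply backend (GPU, OpenMP or JVM). */
    interface Solver {
        String name();
    }

    /**
     * Tries each factory in order and returns the first solver that can be
     * constructed. A failure to construct (missing native library, device
     * error) is swallowed here so the next candidate gets a chance.
     */
    static Solver firstAvailable(List<Supplier<Solver>> factories) {
        for (Supplier<Solver> factory : factories) {
            try {
                return factory.get();
            } catch (RuntimeException | LinkageError e) {
                // This backend is not usable on this machine; fall through.
            }
        }
        throw new IllegalStateException("no solver available");
    }

    public static void main(String[] args) {
        List<Supplier<Solver>> chain = new ArrayList<>();
        // Simulated GPU backend: native library cannot be loaded.
        chain.add(() -> { throw new UnsatisfiedLinkError("simulated: native lib missing"); });
        // Simulated OpenMP backend: creation fails at runtime.
        chain.add(() -> { throw new RuntimeException("simulated: OpenMP backend unavailable"); });
        // Plain JVM backend: always available.
        chain.add(() -> () -> "jvm");

        System.out.println("selected solver: " + firstAvailable(chain).name());
    }
}
```

One thing this makes visible: a chain like this only protects against a backend that fails to be created. An error thrown later, while a kernel is already running, has to be caught separately.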



Sent from my Verizon Wireless 4G LTE smartphone


-------- Original message --------
From: Andrew Musselman <an...@gmail.com>
Date: 10/29/2016 10:45 AM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Errors when 0.13.0 master or vcl

It's not reliable; I've had the -Pviennacl build pass and fail, and the
-Pviennacl-omp build fail but not pass.

On Sat, Oct 29, 2016 at 10:27 AM, Andrew Palumbo <ap...@outlook.com> wrote:

> Ok this makes sense.. it's not actually a (java) OOM error, it's actually
> a blanket GPU error... that's why we have the fallbacks to omp and jvm..
>
> Does it happen each time you run that test?  Sometimes these are one-off
> things where the gpu buffer had not been cleared when the next test
> starts.. or something like that.
>
> Thanks this is a big help in figuring out what's happening.
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
>
> -------- Original message --------
> From: Trevor Grant <tr...@gmail.com>
> Date: 10/28/2016 4:01 PM (GMT-08:00)
> To: dev@mahout.apache.org
> Subject: Re: Errors when 0.13.0 master or vcl
>
> I was finally able to hit an OOM error.
>
> I initially was running everything from the top.
>
> When I run mvn test from the viennacl directory...
>
>
> /mahout/viennacl$ mvn test
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model
> for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-
> version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but
> found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin
> @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-
> version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support
> building such malformed projects.
> [WARNING]
> [INFO]
>
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
> [INFO]
> ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Add Source directory:
> /home/trevor/gits/mahout/viennacl/src/main/scala
> [INFO] Add Test Source directory:
> /home/trevor/gits/mahout/viennacl/src/test/scala
> [INFO]
> [INFO] --- maven-dependency-plugin:2.3:properties (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
> /home/trevor/gits/mahout/viennacl/src/main/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
> mahout-native-viennacl_2.10 ---
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul$
> Generating
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> Compiling
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
> g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include
> -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/
> -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC
> -shared -s -o libjniViennaCL.so -lOpenCL
> Deleting
> /home/trevor/gits/mahout/viennacl/target/classes/org/
> apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:testResources
> (default-testResources)
> @ mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
> /home/trevor/gits/mahout/viennacl/src/test/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
> mahout-native-viennacl_2.10 ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (test) @
> mahout-native-viennacl_2.10 ---
> Discovery starting.
> Discovery completed in 201 milliseconds.
> Run starting. Expected test count is: 7
> ViennaCLSuiteVCL:
> - row-major viennacl::matrix
>   + OCL matrix memory domain after assgn=2
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> log4j:WARN No appenders could be found for logger
> (org.apache.mahout.viennacl.opencl.GPUMMul$).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - dense vcl mmul with fast_copy
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - mmul microbenchmark
>   + Mahout multiplication time: 2630 ms.
>   + ViennaCL/OpenCL multiplication time: 2072 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuSparseRWRW
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!falling back to JVM MMUL
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> - sparse mmul microbenchmark *** FAILED ***
>   java.lang.RuntimeException: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
>   + Mahout Sparse multiplication time: 13303 ms.
> - VCL Dense Matrix %*% Dense vector
>   + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
>   + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
> 4 ms.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuRWCW
> - Sparse %*% Dense mmul microbenchmark
>   + Mahout multiplication time: 821 ms.
>   + ViennaCL/OpenCL multiplication time: 754 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
> Run completed in 26 seconds, 944 milliseconds.
> Total number of tests run: 7
> Suites: completed 2, aborted 0
> Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
> *** 1 TEST FAILED ***
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 45.600 s
> [INFO] Finished at: 2016-10-28T17:58:45-05:00
> [INFO] Final Memory: 20M/309M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal
> org.scalatest:scalatest-maven-plugin:1.0:test (test) on project
> mahout-native-viennacl_2.10: There are test failures -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > >
> > > My Desktop
> >
> > >
> > Branch: master
> >
> > > Build Command: mvn clean install -Phadoop2
> >
> > > Built and tested successfully
> >
> > >
> > Branch: andrewpalumbo/mahout/viennacl-opmmul-a
> >
> > > Build Command: mvn clean install -Pviennacl -Phadoop2
> >
> > Builds successfully-
> >
> > > Tests still riddled with
> >
> > >
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> >
> > > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> >
> > > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> >
> > > [INFO] Unable to create class OMPMMul: falling back to java version
> >
> > >
> > Also OOM
> > ViennaCLSuiteOMP:
> > - row-major viennacl::matrix
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> > [INFO] Unable to create class OMPMMul: falling back to java version
> > - mmul microbenchmark
> >   + Mahout multiplication time: 8875 ms.
> >   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
> > - trans
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> > [INFO] Unable to create class OMPMMul: falling back to java version
> > *** RUN ABORTED ***
> >   java.lang.OutOfMemoryError: Java heap space
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
> >   at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
> >   at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
> >   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
> >   at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
> >
> > >
> >
> > Machine Info:
> >
> > > ================================================================================
> >
> > >
> >  os major/minor:
> >
> > > $ lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description: Ubuntu 16.04.1 LTS
> > Release: 16.04
> > Codename: xenial
> >
> > >
> >
> >  chip arch:
> >
> > > $ uname -a
> > Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
> > x86_64 x86_64 x86_64 GNU/Linux
> >
> > >
> > $ cat /proc/cpuinfo
> >
> > > ...
> >
> > > model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
> >
> > > ...
> >
> > > ^^ 6 cores
> >
> > >
> >
> > GPU:
> >
> > > $ sudo nvidia-smi
> >
> > >
> > +-----------------------------------------------------------------------------+
> > | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
> > |-------------------------------+----------------------+----------------------+
> > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > |===============================+======================+======================|
> > |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
> > | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> >
> >
> > +-----------------------------------------------------------------------------+
> > | Processes:                                                       GPU Memory |
> > |  GPU       PID  Type  Process name                               Usage      |
> > |=============================================================================|
> > |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
> > |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
> > |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
> > |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
> > +-----------------------------------------------------------------------------+
> >
> > >
> >
> >
> >
> >  clinfo output:
> >
> > > $ clinfo
> > Number of platforms                               1
> >   Platform Name                                   NVIDIA CUDA
> >   Platform Vendor                                 NVIDIA Corporation
> >   Platform Version                                OpenCL 1.2 CUDA 8.0.0
> >   Platform Profile                                FULL_PROFILE
> >   Platform Extensions
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> > cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> > cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> > cl_nv_copy_opts
> >   Platform Extensions function suffix             NV
> >
> >   Platform Name                                   NVIDIA CUDA
> > Number of devices                                 1
> >   Device Name                                     GeForce GTX 750 Ti
> >   Device Vendor                                   NVIDIA Corporation
> >   Device Vendor ID                                0x10de
> >   Device Version                                  OpenCL 1.2 CUDA
> >   Driver Version                                  367.44
> >   Device OpenCL C Version                         OpenCL C 1.2
> >   Device Type                                     GPU
> >   Device Profile                                  FULL_PROFILE
> >   Device Topology (NV)                            PCI-E, 01:00.0
> >   Max compute units                               5
> >   Max clock frequency                             1150MHz
> >   Compute Capability (NV)                         5.0
> >   Device Partition                                (core)
> >     Max number of sub-devices                     1
> >     Supported partition types                     None
> >   Max work item dimensions                        3
> >   Max work item sizes                             1024x1024x64
> >   Max work group size                             1024
> >   Preferred work group size multiple              32
> >   Warp size (NV)                                  32
> >   Preferred / native vector sizes
> >     char                                                 1 / 1
> >     short                                                1 / 1
> >     int                                                  1 / 1
> >     long                                                 1 / 1
> >     half                                                 0 / 0        (n/a)
> >     float                                                1 / 1
> >     double                                               1 / 1        (cl_khr_fp64)
> >   Half-precision Floating-point support           (n/a)
> >   Single-precision Floating-point support         (core)
> >     Denormals                                     Yes
> >     Infinity and NANs                             Yes
> >     Round to nearest                              Yes
> >     Round to zero                                 Yes
> >     Round to infinity                             Yes
> >     IEEE754-2008 fused multiply-add               Yes
> >     Support is emulated in software               No
> >     Correctly-rounded divide and sqrt operations  Yes
> >   Double-precision Floating-point support         (cl_khr_fp64)
> >     Denormals                                     Yes
> >     Infinity and NANs                             Yes
> >     Round to nearest                              Yes
> >     Round to zero                                 Yes
> >     Round to infinity                             Yes
> >     IEEE754-2008 fused multiply-add               Yes
> >     Support is emulated in software               No
> >     Correctly-rounded divide and sqrt operations  No
> >   Address bits                                    64, Little-Endian
> >   Global memory size                              2095841280 (1.952GiB)
> >   Error Correction support                        No
> >   Max memory allocation                           523960320 (499.7MiB)
> >   Unified memory for Host and Device              No
> >   Integrated memory (NV)                          No
> >   Minimum alignment for any data type             128 bytes
> >   Alignment of base address                       4096 bits (512 bytes)
> >   Global Memory cache type                        Read/Write
> >   Global Memory cache size                        81920
> >   Global Memory cache line                        128 bytes
> >   Image support                                   Yes
> >     Max number of samplers per kernel             32
> >     Max size for 1D images from buffer            134217728 pixels
> >     Max 1D or 2D image array size                 2048 images
> >     Max 2D image size                             16384x16384 pixels
> >     Max 3D image size                             4096x4096x4096 pixels
> >     Max number of read image args                 256
> >     Max number of write image args                16
> >   Local memory type                               Local
> >   Local memory size                               49152 (48KiB)
> >   Registers per block (NV)                        65536
> >   Max constant buffer size                        65536 (64KiB)
> >   Max number of constant args                     9
> >   Max size of kernel argument                     4352 (4.25KiB)
> >   Queue properties
> >     Out-of-order execution                        Yes
> >     Profiling                                     Yes
> >   Prefer user sync for interop                    No
> >   Profiling timer resolution                      1000ns
> >   Execution capabilities
> >     Run OpenCL kernels                            Yes
> >     Run native kernels                            No
> >     Kernel execution timeout (NV)                 Yes
> >   Concurrent copy and kernel execution (NV)       Yes
> >     Number of async copy engines                  1
> >   printf() buffer size                            1048576 (1024KiB)
> >   Built-in kernels
> >   Device Available                                Yes
> >   Compiler Available                              Yes
> >   Linker Available                                Yes
> >   Device Extensions
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> > cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> > cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> > cl_nv_copy_opts
> >
> > NULL platform behavior
> >   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
> >   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
> >   clCreateContext(NULL, ...) [default]            Success [NV]
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
> >
> > ICD loader properties
> >   ICD loader Name                                 OpenCL ICD Loader
> >   ICD loader Vendor                               OCL Icd free software
> >   ICD loader Version                              2.2.8
> >   ICD loader Profile                              OpenCL 1.2
> > NOTE: your OpenCL library declares to support OpenCL 1.2,
> > but it seems to support up to OpenCL 2.1 too.
> >
> > >
> >
> >  Java major/minor and vendor:
> >
> > > $ java -version
> > openjdk version "1.8.0_91"
> > OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
> > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> >
> > >
> >
> >   vcl version,
> >
> > > (Ignoring what is in /usr/include/viennacl/version.hpp)
> >
> > > 1.7.1
> >
> >
> > gcc version:
> >
> > > $ gcc --version
> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
> >
> > >
> >
> > On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> >
> > >
> > >
> > >
> > > Could everybody please post any errors with stack traces that they are
> > > getting on master or the vcl pr branch along with full system info?
> > >
> > > I.e., branch, OS major/minor, chip arch, GPU, clinfo output, Java
> > > major/minor and vendor, vcl version, gcc version, and anything else that
> > > may be useful so that we may compare?
> > >
> > > Thx
> > >
> > > Andy
> > >
> > >
> > > Sent from my Galaxy Tab A
> > >
> >
>

Re: Errors when 0.13.0 master or vcl

Posted by Andrew Musselman <an...@gmail.com>.
It's not reliable; I've had the -Pviennacl build pass and fail, and the
-Pviennacl-omp build fail but not pass.

On Sat, Oct 29, 2016 at 10:27 AM, Andrew Palumbo <ap...@outlook.com> wrote:

> Ok this makes sense.. it's not actually a (java) OOM error, it's actually
> a blanket GPU error... that's why we have the fallbacks to omp and jvm..
>
> Does it happen each time you run that test?  Sometimes these are one-off
> things where the gpu buffer had not been cleared when the next test
> starts.. or something like that.
>
> Thanks this is a big help in figuring out what's happening.
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
>
> -------- Original message --------
> From: Trevor Grant <tr...@gmail.com>
> Date: 10/28/2016 4:01 PM (GMT-08:00)
> To: dev@mahout.apache.org
> Subject: Re: Errors when 0.13.0 master or vcl
>
> I was finally able to hit an OOM error.
>
> I initially was running everything from the top.
>
> When I run mvn test from the viennacl directory...
>
>
> /mahout/viennacl$ mvn test
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model
> for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-
> version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but
> found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin
> @
> org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-
> version],
> /home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support
> building such malformed projects.
> [WARNING]
> [INFO]
>
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
> [INFO]
> ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Add Source directory:
> /home/trevor/gits/mahout/viennacl/src/main/scala
> [INFO] Add Test Source directory:
> /home/trevor/gits/mahout/viennacl/src/test/scala
> [INFO]
> [INFO] --- maven-dependency-plugin:2.3:properties (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> mahout-native-viennacl_2.10 ---
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
> mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
> /home/trevor/gits/mahout/viennacl/src/main/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
> mahout-native-viennacl_2.10 ---
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul
> Warning: Could not load platform properties for class
> org.apache.mahout.viennacl.opencl.GPUMMul$
> Generating
> /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> Compiling
> /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
> g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include
> -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux
> /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/
> -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC
> -shared -s -o libjniViennaCL.so -lOpenCL
> Deleting
> /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
> [INFO]
> [INFO] --- maven-resources-plugin:2.7:testResources
> (default-testResources)
> @ mahout-native-viennacl_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
> /home/trevor/gits/mahout/viennacl/src/test/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
> mahout-native-viennacl_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
> mahout-native-viennacl_2.10 ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (test) @
> mahout-native-viennacl_2.10 ---
> Discovery starting.
> Discovery completed in 201 milliseconds.
> Run starting. Expected test count is: 7
> ViennaCLSuiteVCL:
> - row-major viennacl::matrix
>   + OCL matrix memory domain after assgn=2
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> log4j:WARN No appenders could be found for logger
> (org.apache.mahout.viennacl.opencl.GPUMMul$).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - dense vcl mmul with fast_copy
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> jvmRWRW
> gpuRWCW
> - mmul microbenchmark
>   + Mahout multiplication time: 2630 ms.
>   + ViennaCL/OpenCL multiplication time: 2072 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuSparseRWRW
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!falling back to JVM MMUL
> ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
> ViennaCL: Smaller work sizes could not solve the problem.
> - sparse mmul microbenchmark *** FAILED ***
>   java.lang.RuntimeException: ViennaCL: FATAL ERROR:
> CL_MEM_OBJECT_ALLOCATION_FAILURE
>  ViennaCL could not allocate memory on the device. Most likely the device
> simply ran out of memory.
> If you think that this is a bug in ViennaCL, please report it at
> viennacl-support@lists.sourceforge.net and supply at least the following
> information:
>  * Operating System
>  * Which OpenCL implementation (AMD, NVIDIA, etc.)
>  * ViennaCL version
> Many thanks in advance!
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
>   at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
>   + Mahout Sparse multiplication time: 13303 ms.
> - VCL Dense Matrix %*% Dense vector
>   + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
>   + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
> 4 ms.
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul
> solver
> gpuRWCW
> - Sparse %*% Dense mmul microbenchmark
>   + Mahout multiplication time: 821 ms.
>   + ViennaCL/OpenCL multiplication time: 754 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
> Run completed in 26 seconds, 944 milliseconds.
> Total number of tests run: 7
> Suites: completed 2, aborted 0
> Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
> *** 1 TEST FAILED ***
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 45.600 s
> [INFO] Finished at: 2016-10-28T17:58:45-05:00
> [INFO] Final Memory: 20M/309M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal
> org.scalatest:scalatest-maven-plugin:1.0:test (test) on project
> mahout-native-viennacl_2.10: There are test failures -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > >
> > > My Desktop
> >
> > >
> > Branch: master
> >
> > > Build Command: mvn clean install -Phadoop2
> >
> > > Built and tested successfully
> >
> > >
> > Branch: andrewpalumbo/mahout/viennacl-opmmul-a
> >
> > > Build Command: mvn clean install -Pviennacl -Phadoop2
> >
> > Builds successfully-
> >
> > > Tests still riddled with
> >
> > >
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> >
> > > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> >
> > > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> >
> > > [INFO] Unable to create class OMPMMul: falling back to java version
> >
> > >
> > Also OOM
> > ViennaCLSuiteOMP:
> > - row-major viennacl::matrix
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> > [INFO] Unable to create class OMPMMul: falling back to java version
> > - mmul microbenchmark
> >   + Mahout multiplication time: 8875 ms.
> >   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
> > - trans
> > [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> > [INFO] Unable to create class OMPMMul: falling back to java version
> > *** RUN ABORTED ***
> >   java.lang.OutOfMemoryError: Java heap space
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
> >   at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
> >   at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
> >   at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
> >   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
> >   at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
> >   at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
> >
> > >
> >
> > Machine Info:
> >
> > > ============================================================
> > ====================
> >
> > >
> >  os major/minor:
> >
> > > $ lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description: Ubuntu 16.04.1 LTS
> > Release: 16.04
> > Codename: xenial
> >
> > >
> >
> >  chip arch:
> >
> > > $ uname -a
> > Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
> > x86_64 x86_64 x86_64 GNU/Linux
> >
> > >
> > $ cat /proc/cpuinfo
> >
> > > ...
> >
> > > model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
> >
> > > ...
> >
> > > ^^ 6 cores
> >
> > >
> >
> > GPU:
> >
> > > $ sudo nvidia-smi
> >
> > >
> > +-----------------------------------------------------------------------------+
> > | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
> > |-------------------------------+----------------------+----------------------+
> > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > |===============================+======================+======================|
> > |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
> > | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> >
> > +-----------------------------------------------------------------------------+
> > | Processes:                                                       GPU Memory |
> > |  GPU       PID  Type  Process name                               Usage      |
> > |=============================================================================|
> > |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
> > |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
> > |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
> > |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
> > +-----------------------------------------------------------------------------+
> >
> > >
> >
> >
> >
> >  clinfo output:
> >
> > > $ clinfo
> > Number of platforms                               1
> >   Platform Name                                   NVIDIA CUDA
> >   Platform Vendor                                 NVIDIA Corporation
> >   Platform Version                                OpenCL 1.2 CUDA 8.0.0
> >   Platform Profile                                FULL_PROFILE
> >   Platform Extensions
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> > cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> > cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> > cl_nv_copy_opts
> >   Platform Extensions function suffix             NV
> >
> >   Platform Name                                   NVIDIA CUDA
> > Number of devices                                 1
> >   Device Name                                     GeForce GTX 750 Ti
> >   Device Vendor                                   NVIDIA Corporation
> >   Device Vendor ID                                0x10de
> >   Device Version                                  OpenCL 1.2 CUDA
> >   Driver Version                                  367.44
> >   Device OpenCL C Version                         OpenCL C 1.2
> >   Device Type                                     GPU
> >   Device Profile                                  FULL_PROFILE
> >   Device Topology (NV)                            PCI-E, 01:00.0
> >   Max compute units                               5
> >   Max clock frequency                             1150MHz
> >   Compute Capability (NV)                         5.0
> >   Device Partition                                (core)
> >     Max number of sub-devices                     1
> >     Supported partition types                     None
> >   Max work item dimensions                        3
> >   Max work item sizes                             1024x1024x64
> >   Max work group size                             1024
> >   Preferred work group size multiple              32
> >   Warp size (NV)                                  32
> >   Preferred / native vector sizes
> >     char                                                 1 / 1
> >     short                                                1 / 1
> >     int                                                  1 / 1
> >     long                                                 1 / 1
> >     half                                                 0 / 0        (n/a)
> >     float                                                1 / 1
> >     double                                               1 / 1        (cl_khr_fp64)
> >   Half-precision Floating-point support           (n/a)
> >   Single-precision Floating-point support         (core)
> >     Denormals                                     Yes
> >     Infinity and NANs                             Yes
> >     Round to nearest                              Yes
> >     Round to zero                                 Yes
> >     Round to infinity                             Yes
> >     IEEE754-2008 fused multiply-add               Yes
> >     Support is emulated in software               No
> >     Correctly-rounded divide and sqrt operations  Yes
> >   Double-precision Floating-point support         (cl_khr_fp64)
> >     Denormals                                     Yes
> >     Infinity and NANs                             Yes
> >     Round to nearest                              Yes
> >     Round to zero                                 Yes
> >     Round to infinity                             Yes
> >     IEEE754-2008 fused multiply-add               Yes
> >     Support is emulated in software               No
> >     Correctly-rounded divide and sqrt operations  No
> >   Address bits                                    64, Little-Endian
> >   Global memory size                              2095841280 (1.952GiB)
> >   Error Correction support                        No
> >   Max memory allocation                           523960320 (499.7MiB)
> >   Unified memory for Host and Device              No
> >   Integrated memory (NV)                          No
> >   Minimum alignment for any data type             128 bytes
> >   Alignment of base address                       4096 bits (512 bytes)
> >   Global Memory cache type                        Read/Write
> >   Global Memory cache size                        81920
> >   Global Memory cache line                        128 bytes
> >   Image support                                   Yes
> >     Max number of samplers per kernel             32
> >     Max size for 1D images from buffer            134217728 pixels
> >     Max 1D or 2D image array size                 2048 images
> >     Max 2D image size                             16384x16384 pixels
> >     Max 3D image size                             4096x4096x4096 pixels
> >     Max number of read image args                 256
> >     Max number of write image args                16
> >   Local memory type                               Local
> >   Local memory size                               49152 (48KiB)
> >   Registers per block (NV)                        65536
> >   Max constant buffer size                        65536 (64KiB)
> >   Max number of constant args                     9
> >   Max size of kernel argument                     4352 (4.25KiB)
> >   Queue properties
> >     Out-of-order execution                        Yes
> >     Profiling                                     Yes
> >   Prefer user sync for interop                    No
> >   Profiling timer resolution                      1000ns
> >   Execution capabilities
> >     Run OpenCL kernels                            Yes
> >     Run native kernels                            No
> >     Kernel execution timeout (NV)                 Yes
> >   Concurrent copy and kernel execution (NV)       Yes
> >     Number of async copy engines                  1
> >   printf() buffer size                            1048576 (1024KiB)
> >   Built-in kernels
> >   Device Available                                Yes
> >   Compiler Available                              Yes
> >   Linker Available                                Yes
> >   Device Extensions
> > cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> > cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> > cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> > cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> > cl_nv_copy_opts
> >
> > NULL platform behavior
> >   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
> >   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
> >   clCreateContext(NULL, ...) [default]            Success [NV]
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
> >   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
> >
> > ICD loader properties
> >   ICD loader Name                                 OpenCL ICD Loader
> >   ICD loader Vendor                               OCL Icd free software
> >   ICD loader Version                              2.2.8
> >   ICD loader Profile                              OpenCL 1.2
> > NOTE: your OpenCL library declares to support OpenCL 1.2,
> > but it seems to support up to OpenCL 2.1 too.
> >
> > >
> >
> >  Java major/minor and vendor:
> >
> > > $ java -version
> > openjdk version "1.8.0_91"
> > OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
> > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
> >
> > >
> >
> >   vcl version,
> >
> > > (Ignoring what is in /usr/include/viennacl/version.hpp)
> >
> > > 1.7.1
> >
> >
> > gcc version:
> >
> > > $ gcc --version
> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
> >
> > >
> >
> > On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> >
> > >
> > >
> > >
> > > Could everybody please post any errors when stack trace that they are
> > > getting on master or the vcl pr branch along with full system info?
> > >
> > > Ie.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
> > > major/minor and vendor,  vcl version, gcc version, and anything else
> > > that may be useful so that we may compare?
> > >
> > > Thx
> > >
> > > Andy
> > >
> > >
> > > Sent from my Galaxy Tab A
> > >
> >
>

RE: Errors when 0.13.0 master or vcl

Posted by Andrew Palumbo <ap...@outlook.com>.
OK, this makes sense. It's not actually a (Java) OOM error; it's a blanket GPU error. That's why we have the fallbacks to OMP and JVM.

Does it happen each time you run that test? Sometimes these are one-off things where the GPU buffer had not been cleared when the next test starts, or something like that.

Thanks, this is a big help in figuring out what's happening.
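
For anyone following along, the GPU -> OMP -> JVM fallback order mentioned above can be sketched roughly as follows. This is only a minimal illustration in plain Java, not the actual Mahout solver-selection code: the class name FallbackSketch, the method firstWorking, and the backend labels "gpu", "omp" and "jvm" are all made up for the example, and the failures are simulated by throwing RuntimeException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a GPU -> OpenMP -> JVM fallback chain; not Mahout's actual code. */
public class FallbackSketch {

    /** Tries each named backend in insertion order and returns the first result that does not throw. */
    static <T> T firstWorking(Map<String, Supplier<T>> backends) {
        RuntimeException last = null;
        for (Map.Entry<String, Supplier<T>> e : backends.entrySet()) {
            try {
                return e.getValue().get();
            } catch (RuntimeException ex) {
                // A backend failure (simulated here) is logged and the next backend is tried.
                System.err.println(e.getKey() + " failed: " + ex.getMessage() + "; trying next backend");
                last = ex;
            }
        }
        throw new IllegalStateException("no backend succeeded", last);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the order in which the backends were registered.
        Map<String, Supplier<String>> backends = new LinkedHashMap<>();
        backends.put("gpu", () -> { throw new RuntimeException("CL_MEM_OBJECT_ALLOCATION_FAILURE"); });
        backends.put("omp", () -> { throw new RuntimeException("native library not loaded"); });
        backends.put("jvm", () -> "result from jvm");
        System.out.println(firstWorking(backends));
    }
}
```

One thing the sketch makes visible: a chain like this only catches RuntimeException. A java.lang.OutOfMemoryError is an Error, not a RuntimeException, so in this sketch a JVM heap exhaustion would propagate instead of triggering the next backend.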



Sent from my Verizon Wireless 4G LTE smartphone


-------- Original message --------
From: Trevor Grant <tr...@gmail.com>
Date: 10/28/2016 4:01 PM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Errors when 0.13.0 master or vcl

I was finally able to hit an OOM error.

I initially was running everything from the top.

When I run mvn test from the viennacl directory...


/mahout/viennacl$ mvn test
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model
for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @
org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version],
/home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but
found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin @
org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version],
/home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they
threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support
building such malformed projects.
[WARNING]
[INFO]

[INFO]
------------------------------------------------------------------------
[INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
[INFO]
------------------------------------------------------------------------
[INFO]
[INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
mahout-native-viennacl_2.10 ---
[INFO] Add Source directory:
/home/trevor/gits/mahout/viennacl/src/main/scala
[INFO] Add Test Source directory:
/home/trevor/gits/mahout/viennacl/src/test/scala
[INFO]
[INFO] --- maven-dependency-plugin:2.3:properties (default) @
mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @
mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
/home/trevor/gits/mahout/viennacl/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
mahout-native-viennacl_2.10 ---
Warning: Could not load platform properties for class
org.apache.mahout.viennacl.opencl.GPUMMul
Warning: Could not load platform properties for class
org.apache.mahout.viennacl.opencl.GPUMMul$
Generating
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
Compiling
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include
-I/usr/lib/jvm/java-7-openjdk-amd64/include/linux
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
-msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/
-Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC
-shared -s -o libjniViennaCL.so -lOpenCL
Deleting
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
[INFO]
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources)
@ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
/home/trevor/gits/mahout/viennacl/src/test/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
mahout-native-viennacl_2.10 ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (test) @
mahout-native-viennacl_2.10 ---
Discovery starting.
Discovery completed in 201 milliseconds.
Run starting. Expected test count is: 7
ViennaCLSuiteVCL:
- row-major viennacl::matrix
  + OCL matrix memory domain after assgn=2
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
log4j:WARN No appenders could be found for logger
(org.apache.mahout.viennacl.opencl.GPUMMul$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- dense vcl mmul with fast_copy
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- mmul microbenchmark
  + Mahout multiplication time: 2630 ms.
  + ViennaCL/OpenCL multiplication time: 2072 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuSparseRWRW
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
 ViennaCL could not allocate memory on the device. Most likely the device
simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at
viennacl-support@lists.sourceforge.net and supply at least the following
information:
 * Operating System
 * Which OpenCL implementation (AMD, NVIDIA, etc.)
 * ViennaCL version
Many thanks in advance!falling back to JVM MMUL
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
- sparse mmul microbenchmark *** FAILED ***
  java.lang.RuntimeException: ViennaCL: FATAL ERROR:
CL_MEM_OBJECT_ALLOCATION_FAILURE
 ViennaCL could not allocate memory on the device. Most likely the device
simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at
viennacl-support@lists.sourceforge.net and supply at least the following
information:
 * Operating System
 * Which OpenCL implementation (AMD, NVIDIA, etc.)
 * ViennaCL version
Many thanks in advance!
  at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
  at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
  at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
  at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
  at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  ...
  + Mahout Sparse multiplication time: 13303 ms.
- VCL Dense Matrix %*% Dense vector
  + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
  + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
4 ms.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuRWCW
- Sparse %*% Dense mmul microbenchmark
  + Mahout multiplication time: 821 ms.
  + ViennaCL/OpenCL multiplication time: 754 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
Run completed in 26 seconds, 944 milliseconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 45.600 s
[INFO] Finished at: 2016-10-28T17:58:45-05:00
[INFO] Final Memory: 20M/309M
[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal
org.scalatest:scalatest-maven-plugin:1.0:test (test) on project
mahout-native-viennacl_2.10: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> >
> > My Desktop
>
> >
> Branch: master
>
> > Build Command: mvn clean install -Phadoop2
>
> > Built and tested successfully
>
> >
> Branch: andrewpalumbo/mahout/viennacl-opmmul-a
>
> > Build Command: mvn clean install -Pviennacl -Phadoop2
>
> Builds successfully-
>
> > Tests still riddled with
>
> >
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
>
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>
> > [INFO] Unable to create class OMPMMul: falling back to java version
>
> >
> Also OOM
> ViennaCLSuiteOMP:
> - row-major viennacl::matrix
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [WARN] Unable to create class GPUMMul: attempting OpenMP version
> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> [INFO] Unable to create class OMPMMul: falling back to java version
> - mmul microbenchmark
>   + Mahout multiplication time: 8875 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [WARN] Unable to create class GPUMMul: attempting OpenMP version
> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> [INFO] Unable to create class OMPMMul: falling back to java version
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at
> it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(
> Int2DoubleOpenHashMap.java:1059)
>   at
> it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(
> Int2DoubleOpenHashMap.java:295)
>   at
> it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(
> Int2DoubleOpenHashMap.java:301)
>   at
> org.apache.mahout.math.RandomAccessSparseVector.setQuick(
> RandomAccessSparseVector.java:130)
>   at
> org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
>   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
>   at
> org.apache.mahout.math.scalabindings.MatrixOps.$
> colon$eq(MatrixOps.scala:192)
>   at
> org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
>   at
> org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
>   at
> org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$
> anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
>
> >
>
> Machine Info:
>
> > ============================================================
> ====================
>
> >
>  os major/minor:
>
> > $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 16.04.1 LTS
> Release: 16.04
> Codename: xenial
>
> >
>
>  chip arch:
>
> > $ uname -a
> Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
>
> >
> $ cat /proc/cpuinfo
>
> > ...
>
> > model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
>
> > ...
>
> > ^^ 6 cores
>
> >
>
> GPU:
>
> > $ sudo nvidia-smi
>
> >
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
> | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
> |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
> |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
> |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
> +-----------------------------------------------------------------------------+
>
> >
>
>
>
>  clinfo output:
>
> > $ clinfo
> Number of platforms                               1
>   Platform Name                                   NVIDIA CUDA
>   Platform Vendor                                 NVIDIA Corporation
>   Platform Version                                OpenCL 1.2 CUDA 8.0.0
>   Platform Profile                                FULL_PROFILE
>   Platform Extensions
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> cl_nv_copy_opts
>   Platform Extensions function suffix             NV
>
>   Platform Name                                   NVIDIA CUDA
> Number of devices                                 1
>   Device Name                                     GeForce GTX 750 Ti
>   Device Vendor                                   NVIDIA Corporation
>   Device Vendor ID                                0x10de
>   Device Version                                  OpenCL 1.2 CUDA
>   Driver Version                                  367.44
>   Device OpenCL C Version                         OpenCL C 1.2
>   Device Type                                     GPU
>   Device Profile                                  FULL_PROFILE
>   Device Topology (NV)                            PCI-E, 01:00.0
>   Max compute units                               5
>   Max clock frequency                             1150MHz
>   Compute Capability (NV)                         5.0
>   Device Partition                                (core)
>     Max number of sub-devices                     1
>     Supported partition types                     None
>   Max work item dimensions                        3
>   Max work item sizes                             1024x1024x64
>   Max work group size                             1024
>   Preferred work group size multiple              32
>   Warp size (NV)                                  32
>   Preferred / native vector sizes
>     char                                                 1 / 1
>     short                                                1 / 1
>     int                                                  1 / 1
>     long                                                 1 / 1
>     half                                                 0 / 0        (n/a)
>     float                                                1 / 1
>     double                                               1 / 1
>  (cl_khr_fp64)
>   Half-precision Floating-point support           (n/a)
>   Single-precision Floating-point support         (core)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>     Correctly-rounded divide and sqrt operations  Yes
>   Double-precision Floating-point support         (cl_khr_fp64)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>     Correctly-rounded divide and sqrt operations  No
>   Address bits                                    64, Little-Endian
>   Global memory size                              2095841280 (1.952GiB)
>   Error Correction support                        No
>   Max memory allocation                           523960320 (499.7MiB)
>   Unified memory for Host and Device              No
>   Integrated memory (NV)                          No
>   Minimum alignment for any data type             128 bytes
>   Alignment of base address                       4096 bits (512 bytes)
>   Global Memory cache type                        Read/Write
>   Global Memory cache size                        81920
>   Global Memory cache line                        128 bytes
>   Image support                                   Yes
>     Max number of samplers per kernel             32
>     Max size for 1D images from buffer            134217728 pixels
>     Max 1D or 2D image array size                 2048 images
>     Max 2D image size                             16384x16384 pixels
>     Max 3D image size                             4096x4096x4096 pixels
>     Max number of read image args                 256
>     Max number of write image args                16
>   Local memory type                               Local
>   Local memory size                               49152 (48KiB)
>   Registers per block (NV)                        65536
>   Max constant buffer size                        65536 (64KiB)
>   Max number of constant args                     9
>   Max size of kernel argument                     4352 (4.25KiB)
>   Queue properties
>     Out-of-order execution                        Yes
>     Profiling                                     Yes
>   Prefer user sync for interop                    No
>   Profiling timer resolution                      1000ns
>   Execution capabilities
>     Run OpenCL kernels                            Yes
>     Run native kernels                            No
>     Kernel execution timeout (NV)                 Yes
>   Concurrent copy and kernel execution (NV)       Yes
>     Number of async copy engines                  1
>   printf() buffer size                            1048576 (1024KiB)
>   Built-in kernels
>   Device Available                                Yes
>   Compiler Available                              Yes
>   Linker Available                                Yes
>   Device Extensions
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> cl_nv_copy_opts
>
> NULL platform behavior
>   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
>   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
>   clCreateContext(NULL, ...) [default]            Success [NV]
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in
> platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices
> found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in
> platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
>
> ICD loader properties
>   ICD loader Name                                 OpenCL ICD Loader
>   ICD loader Vendor                               OCL Icd free software
>   ICD loader Version                              2.2.8
>   ICD loader Profile                              OpenCL 1.2
> NOTE: your OpenCL library declares to support OpenCL 1.2,
> but it seems to support up to OpenCL 2.1 too.
>
> >
>
>  Java major/minor and vendor:
>
> > $ java -version
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>
> >
>
>   vcl version,
>
> > (Ignoring what is in /usr/include/viennacl/version.hpp)
>
> > 1.7.1
>
>
> gcc version:
>
> > $ gcc --version
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
>
> >
>
> On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> >
> >
> >
> > Could everybody please post any errors when stack trace that they are
> > getting on master or the vcl pr branch along with full system info?
> >
> > Ie.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
> > major/minor and vendor,  vcl version, gcc version, and anything else that
> > may be useful so that we may compare?
> >
> > Thx
> >
> > Andy
> >
> >
> > Sent from my Galaxy Tab A
> >
>

Re: Errors when 0.13.0 master or vcl

Posted by Trevor Grant <tr...@gmail.com>.
I was finally able to hit an OOM error.

I had initially been running everything from the top-level directory.

When I run mvn test from the viennacl directory instead, I get the following:


/mahout/viennacl$ mvn test
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model
for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @
org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version],
/home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but
found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin @
org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version],
/home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they
threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support
building such malformed projects.
[WARNING]
[INFO]

[INFO]
------------------------------------------------------------------------
[INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
[INFO]
------------------------------------------------------------------------
[INFO]
[INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @
mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @
mahout-native-viennacl_2.10 ---
[INFO] Add Source directory:
/home/trevor/gits/mahout/viennacl/src/main/scala
[INFO] Add Test Source directory:
/home/trevor/gits/mahout/viennacl/src/test/scala
[INFO]
[INFO] --- maven-dependency-plugin:2.3:properties (default) @
mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @
mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @
mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
/home/trevor/gits/mahout/viennacl/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @
mahout-native-viennacl_2.10 ---
Warning: Could not load platform properties for class
org.apache.mahout.viennacl.opencl.GPUMMul
Warning: Could not load platform properties for class
org.apache.mahout.viennacl.opencl.GPUMMul$
Generating
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
Compiling
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include
-I/usr/lib/jvm/java-7-openjdk-amd64/include/linux
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
-msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/
-Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC
-shared -s -o libjniViennaCL.so -lOpenCL
Deleting
/home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
[INFO]
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources)
@ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
/home/trevor/gits/mahout/viennacl/src/test/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @
mahout-native-viennacl_2.10 ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (test) @
mahout-native-viennacl_2.10 ---
Discovery starting.
Discovery completed in 201 milliseconds.
Run starting. Expected test count is: 7
ViennaCLSuiteVCL:
- row-major viennacl::matrix
  + OCL matrix memory domain after assgn=2
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
log4j:WARN No appenders could be found for logger
(org.apache.mahout.viennacl.opencl.GPUMMul$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- dense vcl mmul with fast_copy
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- mmul microbenchmark
  + Mahout multiplication time: 2630 ms.
  + ViennaCL/OpenCL multiplication time: 2072 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 813 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuSparseRWRW
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
 ViennaCL could not allocate memory on the device. Most likely the device
simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at
viennacl-support@lists.sourceforge.net and supply at least the following
information:
 * Operating System
 * Which OpenCL implementation (AMD, NVIDIA, etc.)
 * ViennaCL version
Many thanks in advance!falling back to JVM MMUL
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
- sparse mmul microbenchmark *** FAILED ***
  java.lang.RuntimeException: ViennaCL: FATAL ERROR:
CL_MEM_OBJECT_ALLOCATION_FAILURE
 ViennaCL could not allocate memory on the device. Most likely the device
simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at
viennacl-support@lists.sourceforge.net and supply at least the following
information:
 * Operating System
 * Which OpenCL implementation (AMD, NVIDIA, etc.)
 * ViennaCL version
Many thanks in advance!
  at
org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native
Method)
  at
org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
  at
org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
  at
org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
  at
org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
  at
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  ...
  + Mahout Sparse multiplication time: 13303 ms.
- VCL Dense Matrix %*% Dense vector
  + Mahout dense matrix %*% dense vector multiplication time: 0 ms.
  + ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time:
4 ms.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuRWCW
- Sparse %*% Dense mmul microbenchmark
  + Mahout multiplication time: 821 ms.
  + ViennaCL/OpenCL multiplication time: 754 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
Run completed in 26 seconds, 944 milliseconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 45.600 s
[INFO] Finished at: 2016-10-28T17:58:45-05:00
[INFO] Final Memory: 20M/309M
[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal
org.scalatest:scalatest-maven-plugin:1.0:test (test) on project
mahout-native-viennacl_2.10: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
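
For comparing reports, a rough way to sanity-check a CL_MEM_OBJECT_ALLOCATION_FAILURE like the one above is to estimate how much device memory a sparse matrix in CSR form would need and hold that against the limits clinfo reports. The sketch below is only that, a sketch: it assumes the generic CSR layout (a row-pointer array, a column-index array, and a value array) with 4-byte indices and 8-byte doubles, which may not match exactly what ViennaCL allocates. The class name CsrFootprint, the row count, and the nonzero counts are hypothetical; the two limits are the "Max memory allocation" and "Global memory size" values from the clinfo output quoted in this thread (a GTX 750 Ti), which is not necessarily the device that produced the failure.

```java
public class CsrFootprint {
    // Assumed element sizes: 4-byte integer indices, 8-byte double values.
    static final long INDEX_BYTES = 4L;
    static final long VALUE_BYTES = 8L;

    // Row-pointer array holds rows + 1 entries.
    static long rowPtrBytes(long rows) {
        return (rows + 1) * INDEX_BYTES;
    }

    // One column index per stored nonzero.
    static long colIdxBytes(long nnz) {
        return nnz * INDEX_BYTES;
    }

    // One double per stored nonzero.
    static long valueBytes(long nnz) {
        return nnz * VALUE_BYTES;
    }

    // Sum of the three CSR arrays.
    static long totalBytes(long rows, long nnz) {
        return rowPtrBytes(rows) + colIdxBytes(nnz) + valueBytes(nnz);
    }

    // The per-buffer limit applies to the largest single array.
    static long largestBufferBytes(long rows, long nnz) {
        return Math.max(rowPtrBytes(rows),
                Math.max(colIdxBytes(nnz), valueBytes(nnz)));
    }

    // True if every array is within the per-allocation limit and the
    // whole matrix is within the device's global memory.
    static boolean fits(long rows, long nnz, long maxAllocBytes, long globalMemBytes) {
        return largestBufferBytes(rows, nnz) <= maxAllocBytes
                && totalBytes(rows, nnz) <= globalMemBytes;
    }

    public static void main(String[] args) {
        long maxAlloc = 523960320L;   // "Max memory allocation" from the quoted clinfo
        long globalMem = 2095841280L; // "Global memory size" from the quoted clinfo
        long rows = 20_000L;          // hypothetical matrix height
        long[] nnzCandidates = {1_000_000L, 50_000_000L, 100_000_000L}; // hypothetical

        for (long nnz : nnzCandidates) {
            System.out.printf("rows=%d nnz=%d total=%d bytes fits=%b%n",
                    rows, nnz, totalBytes(rows, nnz),
                    fits(rows, nnz, maxAlloc, globalMem));
        }
    }
}
```

Note that the product of two sparse matrices can hold far more nonzeros than either operand, so the result buffer may be the one that exceeds the limit even when both inputs fit comfortably. Other processes already holding GPU memory (the quoted nvidia-smi output shows about 307MiB in use) reduce what is actually available.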


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> >
> > My Desktop
>
> >
> Branch: master
>
> > Build Command: mvn clean install -Phadoop2
>
> > Built and tested successfully
>
> >
> Branch: andrewpalumbo/mahout/viennacl-opmmul-a
>
> > Build Command: mvn clean install -Pviennacl -Phadoop2
>
> Builds successfully-
>
> > Tests still riddled with
>
> >
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
>
> > [WARN] Unable to create class GPUMMul: attempting OpenMP version
>
> > [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
>
> > [INFO] Unable to create class OMPMMul: falling back to java version
>
> >
> Also OOM
> ViennaCLSuiteOMP:
> - row-major viennacl::matrix
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [WARN] Unable to create class GPUMMul: attempting OpenMP version
> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> [INFO] Unable to create class OMPMMul: falling back to java version
> - mmul microbenchmark
>   + Mahout multiplication time: 8875 ms.
>   + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
> - trans
> [INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
> [WARN] Unable to create class GPUMMul: attempting OpenMP version
> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
> [INFO] Unable to create class OMPMMul: falling back to java version
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at
> it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(
> Int2DoubleOpenHashMap.java:1059)
>   at
> it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(
> Int2DoubleOpenHashMap.java:295)
>   at
> it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(
> Int2DoubleOpenHashMap.java:301)
>   at
> org.apache.mahout.math.RandomAccessSparseVector.setQuick(
> RandomAccessSparseVector.java:130)
>   at
> org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
>   at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
>   at
> org.apache.mahout.math.scalabindings.MatrixOps.$
> colon$eq(MatrixOps.scala:192)
>   at
> org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
>   at
> org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
>   at
> org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$
> anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
>
> >
>
> Machine Info:
>
> > ============================================================
> ====================
>
> >
>  os major/minor:
>
> > $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 16.04.1 LTS
> Release: 16.04
> Codename: xenial
>
> >
>
>  chip arch:
>
> > $ uname -a
> Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
>
> >
> $ cat /proc/cpuinfo
>
> > ...
>
> > model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
>
> > ...
>
> > ^^ 6 cores
>
> >
>
> GPU:
>
> > $ sudo nvidia-smi
>
> >
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
> | 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
> |    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
> |    0      2319    G   /usr/bin/gnome-shell                           104MiB |
> |    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
> +-----------------------------------------------------------------------------+
>
> >
>
>
>
>  clinfo output:
>
> > $ clinfo
> Number of platforms                               1
>   Platform Name                                   NVIDIA CUDA
>   Platform Vendor                                 NVIDIA Corporation
>   Platform Version                                OpenCL 1.2 CUDA 8.0.0
>   Platform Profile                                FULL_PROFILE
>   Platform Extensions
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> cl_nv_copy_opts
>   Platform Extensions function suffix             NV
>
>   Platform Name                                   NVIDIA CUDA
> Number of devices                                 1
>   Device Name                                     GeForce GTX 750 Ti
>   Device Vendor                                   NVIDIA Corporation
>   Device Vendor ID                                0x10de
>   Device Version                                  OpenCL 1.2 CUDA
>   Driver Version                                  367.44
>   Device OpenCL C Version                         OpenCL C 1.2
>   Device Type                                     GPU
>   Device Profile                                  FULL_PROFILE
>   Device Topology (NV)                            PCI-E, 01:00.0
>   Max compute units                               5
>   Max clock frequency                             1150MHz
>   Compute Capability (NV)                         5.0
>   Device Partition                                (core)
>     Max number of sub-devices                     1
>     Supported partition types                     None
>   Max work item dimensions                        3
>   Max work item sizes                             1024x1024x64
>   Max work group size                             1024
>   Preferred work group size multiple              32
>   Warp size (NV)                                  32
>   Preferred / native vector sizes
>     char                                                 1 / 1
>     short                                                1 / 1
>     int                                                  1 / 1
>     long                                                 1 / 1
>     half                                                 0 / 0        (n/a)
>     float                                                1 / 1
>     double                                               1 / 1
>  (cl_khr_fp64)
>   Half-precision Floating-point support           (n/a)
>   Single-precision Floating-point support         (core)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>     Correctly-rounded divide and sqrt operations  Yes
>   Double-precision Floating-point support         (cl_khr_fp64)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>     Correctly-rounded divide and sqrt operations  No
>   Address bits                                    64, Little-Endian
>   Global memory size                              2095841280 (1.952GiB)
>   Error Correction support                        No
>   Max memory allocation                           523960320 (499.7MiB)
>   Unified memory for Host and Device              No
>   Integrated memory (NV)                          No
>   Minimum alignment for any data type             128 bytes
>   Alignment of base address                       4096 bits (512 bytes)
>   Global Memory cache type                        Read/Write
>   Global Memory cache size                        81920
>   Global Memory cache line                        128 bytes
>   Image support                                   Yes
>     Max number of samplers per kernel             32
>     Max size for 1D images from buffer            134217728 pixels
>     Max 1D or 2D image array size                 2048 images
>     Max 2D image size                             16384x16384 pixels
>     Max 3D image size                             4096x4096x4096 pixels
>     Max number of read image args                 256
>     Max number of write image args                16
>   Local memory type                               Local
>   Local memory size                               49152 (48KiB)
>   Registers per block (NV)                        65536
>   Max constant buffer size                        65536 (64KiB)
>   Max number of constant args                     9
>   Max size of kernel argument                     4352 (4.25KiB)
>   Queue properties
>     Out-of-order execution                        Yes
>     Profiling                                     Yes
>   Prefer user sync for interop                    No
>   Profiling timer resolution                      1000ns
>   Execution capabilities
>     Run OpenCL kernels                            Yes
>     Run native kernels                            No
>     Kernel execution timeout (NV)                 Yes
>   Concurrent copy and kernel execution (NV)       Yes
>     Number of async copy engines                  1
>   printf() buffer size                            1048576 (1024KiB)
>   Built-in kernels
>   Device Available                                Yes
>   Compiler Available                              Yes
>   Linker Available                                Yes
>   Device Extensions
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
> cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
> cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
> cl_nv_copy_opts
>
> NULL platform behavior
>   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
>   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
>   clCreateContext(NULL, ...) [default]            Success [NV]
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform
>
> ICD loader properties
>   ICD loader Name                                 OpenCL ICD Loader
>   ICD loader Vendor                               OCL Icd free software
>   ICD loader Version                              2.2.8
>   ICD loader Profile                              OpenCL 1.2
> NOTE: your OpenCL library declares to support OpenCL 1.2,
> but it seems to support up to OpenCL 2.1 too.
>
> >
>
>  Java major/minor and vendor:
>
> > $ java -version
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>
> >
>
>   vcl version,
>
> > (Ignoring what is in /usr/include/viennacl/version.hpp)
>
> > 1.7.1
>
>
> gcc version:
>
> > $ gcc --version
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
>
> >
>
> On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> >
> >
> >
> > Could everybody please post any errors with stack traces that they are
> > getting on master or the vcl pr branch along with full system info?
> >
> > I.e.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
> > major/minor and vendor,  vcl version, gcc version, and anything else that
> > may be useful so that we may compare?
> >
> > Thx
> >
> > Andy
> >
> >
> > Sent from my Galaxy Tab A
> >
>

RE: Errors when 0.13.0 master or vcl

Posted by Andrew Palumbo <ap...@outlook.com>.
Thx Andrew for posting.



Sent from my Verizon Wireless 4G LTE smartphone


-------- Original message --------
From: Andrew Musselman <an...@gmail.com>
Date: 10/28/2016 3:39 PM (GMT-08:00)
To: dev@mahout.apache.org
Subject: Re: Errors when 0.13.0 master or vcl

>
> My Desktop

>
Branch: master

> Build Command: mvn clean install -Phadoop2

> Built and tested successfully

>
Branch: andrewpalumbo/mahout/viennacl-opmmul-a

> Build Command: mvn clean install -Pviennacl -Phadoop2

Builds successfully-

> Tests still riddled with

>
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver

> [WARN] Unable to create class GPUMMul: attempting OpenMP version

> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver

> [INFO] Unable to create class OMPMMul: falling back to java version

>
Also OOM
ViennaCLSuiteOMP:
- row-major viennacl::matrix
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
- mmul microbenchmark
  + Mahout multiplication time: 8875 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
*** RUN ABORTED ***
  java.lang.OutOfMemoryError: Java heap space
  at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
  at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
  at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
  at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
  at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
  at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
  at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
  at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
  at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
  at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)

>

Machine Info:

> ================================================================================

>
 os major/minor:

> $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

>

 chip arch:

> $ uname -a
Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

>
$ cat /proc/cpuinfo

> ...

> model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

> ...

> ^^ 6 cores

>

GPU:

> $ sudo nvidia-smi

>
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
| 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
|    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
|    0      2319    G   /usr/bin/gnome-shell                           104MiB |
|    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
+-----------------------------------------------------------------------------+

>



 clinfo output:

> $ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 8.0.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 750 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  367.44
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               5
  Max clock frequency                             1150MHz
  Compute Capability (NV)                         5.0
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              2095841280 (1.952GiB)
  Error Correction support                        No
  Max memory allocation                           523960320 (499.7MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        81920
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
NOTE: your OpenCL library declares to support OpenCL 1.2,
but it seems to support up to OpenCL 2.1 too.

>

 Java major/minor and vendor:

> $ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

>

  vcl version,

> (Ignoring what is in /usr/include/viennacl/version.hpp)

> 1.7.1


gcc version:

> $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

>

On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com> wrote:

>
>
>
> Could everybody please post any errors with stack traces that they are
> getting on master or the vcl pr branch along with full system info?
>
> I.e.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
> major/minor and vendor,  vcl version, gcc version, and anything else that
> may be useful so that we may compare?
>
> Thx
>
> Andy
>
>
> Sent from my Galaxy Tab A
>

Re: Errors when 0.13.0 master or vcl

Posted by Andrew Musselman <an...@gmail.com>.
>
> My Desktop

>
Branch: master

> Build Command: mvn clean install -Phadoop2

> Built and tested successfully

>
Branch: andrewpalumbo/mahout/viennacl-opmmul-a

> Build Command: mvn clean install -Pviennacl -Phadoop2

Builds successfully-

> Tests still riddled with

>
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver

> [WARN] Unable to create class GPUMMul: attempting OpenMP version

> [INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver

> [INFO] Unable to create class OMPMMul: falling back to java version

>
Also OOM
ViennaCLSuiteOMP:
- row-major viennacl::matrix
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
- mmul microbenchmark
  + Mahout multiplication time: 8875 ms.
  + ViennaCL/cpu/OpenMP multiplication time: 599 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
*** RUN ABORTED ***
  java.lang.OutOfMemoryError: Java heap space
  at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
  at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
  at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
  at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
  at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
  at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
  at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
  at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
  at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
  at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)

>

Machine Info:

> ================================================================================

>
 os major/minor:

> $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

>

 chip arch:

> $ uname -a
Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

>
$ cat /proc/cpuinfo

> ...

> model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

> ...

> ^^ 6 cores

>

GPU:

> $ sudo nvidia-smi

>
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
| 29%   25C    P8     1W /  38W |    307MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1537    G   ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2    87MiB |
|    0      1990    G   /usr/lib/xorg/Xorg                             101MiB |
|    0      2319    G   /usr/bin/gnome-shell                           104MiB |
|    0      4360    G   /usr/lib/xorg/Xorg                              12MiB |
+-----------------------------------------------------------------------------+

>



 clinfo output:

> $ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 8.0.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 750 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  367.44
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               5
  Max clock frequency                             1150MHz
  Compute Capability (NV)                         5.0
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              2095841280 (1.952GiB)
  Error Correction support                        No
  Max memory allocation                           523960320 (499.7MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        81920
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
NOTE: your OpenCL library declares to support OpenCL 1.2,
but it seems to support up to OpenCL 2.1 too.

>

 Java major/minor and vendor:

> $ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

>

  vcl version,

> (Ignoring what is in /usr/include/viennacl/version.hpp)

> 1.7.1


gcc version:

> $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

>

On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com> wrote:

>
>
>
> Could everybody please post any errors with stack traces that they are
> getting on master or the vcl pr branch along with full system info?
>
> I.e.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
> major/minor and vendor,  vcl version, gcc version, and anything else that
> may be useful so that we may compare?
>
> Thx
>
> Andy
>
>
> Sent from my Galaxy Tab A
>

Re: Errors when 0.13.0 master or vcl

Posted by Trevor Grant <tr...@gmail.com>.
My Desktop

Branch: master
Build Command: mvn clean install -Phadoop2
Built and tested successfully

Branch: andrewpalumbo/mahout/viennacl-opmmul-a
Build Command: mvn clean install -Pviennacl -Phadoop2 -DskipTests && mvn test

Builds successfully-
Tests still riddled with

[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version

Would be fixed with pom hack I presented earlier (developed hack on this
box)
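
To compare how often each native solver fails to load on different machines, the fallback messages above can be tallied from a saved test log. This is only a rough sketch: it assumes the `mvn test` output was redirected to a file, and both the function name `summarize_fallbacks` and the file name `test.log` are made up for illustration (they are not part of Mahout).

```shell
# Tally "Unable to create class <Name>" messages per solver class.
# Usage: summarize_fallbacks <logfile>
summarize_fallbacks() {
  awk '/Unable to create class/ {
         name = $0
         # Strip everything up to and including the fixed message prefix.
         sub(/.*Unable to create class /, "", name)
         # Strip the trailing ": attempting ..." / ": falling back ..." text.
         sub(/:.*/, "", name)
         count[name]++
       }
       END { for (n in count) printf "%s %d\n", n, count[n] }' "$1" | sort
}
```

For example, `mvn test > test.log 2>&1; summarize_fallbacks test.log` prints one line per class (such as GPUMMul or OMPMMul) with the number of failed attempts.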

Machine Info:
================================================================================

 os major/minor:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty


 chip arch:
$ uname -a
Linux tower1 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8 10:21:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo
...
model name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
...
^^ 8 cores


GPU:
$ sudo nvidia-smi
Sun Oct 16 22:14:49 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 740      Off  | 0000:02:00.0     N/A |                  N/A |
| 33%   34C    P8    N/A /  N/A |    330MiB /  1021MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+




 clinfo output:
 $ clinfo
 Number of platforms: 1
   Platform Profile: FULL_PROFILE
   Platform Version: OpenCL 1.2 CUDA 7.5.23
   Platform Name: NVIDIA CUDA
   Platform Vendor: NVIDIA Corporation
   Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd
cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_nv_copy_opts


   Platform Name: NVIDIA CUDA
 Number of devices: 1
   Device Type: CL_DEVICE_TYPE_GPU
   Device ID: 4318
   Max compute units: 2
   Max work items dimensions: 3
     Max work items[0]: 1024
     Max work items[1]: 1024
     Max work items[2]: 64
   Max work group size: 1024
   Preferred vector width char: 1
   Preferred vector width short: 1
   Preferred vector width int: 1
   Preferred vector width long: 1
   Preferred vector width float: 1
   Preferred vector width double: 1
   Native vector width char: 1
   Native vector width short: 1
   Native vector width int: 1
   Native vector width long: 1
   Native vector width float: 1
   Native vector width double: 1
   Max clock frequency: 1071Mhz
   Address bits: 64
   Max memory allocation: 267873536
   Image support: Yes
   Max number of images read arguments: 256
   Max number of images write arguments: 16
   Max image 2D width: 16384
   Max image 2D height: 16384
   Max image 3D width: 4096
   Max image 3D height: 4096
   Max image 3D depth: 4096
   Max samplers within kernel: 32
   Max size of kernel argument: 4352
   Alignment (bits) of base address: 4096
   Minimum alignment (bytes) for any datatype: 128
   Single precision floating point capability
     Denorms: Yes
     Quiet NaNs: Yes
     Round to nearest even: Yes
     Round to zero: Yes
     Round to +ve and infinity: Yes
     IEEE754-2008 fused multiply-add: Yes
   Cache type: Read/Write
   Cache line size: 128
   Cache size: 32768
   Global memory size: 1071494144
   Constant buffer size: 65536
   Max number of constant args: 9
   Local memory type: Local
   Local memory size: 49152
   Error correction support: 0
   Unified memory for Host and Device: 0
   Profiling timer resolution: 1000
   Device endianess: Little
   Available: Yes
   Compiler available: Yes
   Execution capabilities:
     Execute OpenCL kernels: Yes
     Execute native function: No
   Queue properties:
     Out-of-Order: Yes
     Profiling : Yes
   Platform ID: 0xab2160
   Name: GeForce GT 740
   Vendor: NVIDIA Corporation
   Device OpenCL C version: OpenCL C 1.2
   Driver version: 352.63
   Profile: FULL_PROFILE
   Version: OpenCL 1.2 CUDA
   Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts  cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64



 Java major/minor and vendor:
 $ java -version
java version "1.7.0_85"
OpenJDK Runtime Environment (IcedTea 2.6.1) (7u85-2.6.1-5ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)


  vcl version,
From /usr/include/viennacl/version.hpp
1.7.0

gcc version:
$ gcc --version
gcc (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Oct 26, 2016 at 11:31 PM, Andrew Palumbo <ap...@outlook.com> wrote:

>
>
>
> Could everybody please post any errors with stack traces that they are
> getting on master or the vcl pr branch along with full system info?
>
> I.e.  Branch, os major/minor, chip arch, GPU, clinfo output, Java
> major/minor and vendor,  vcl version, gcc version, and anything else that
> may be useful so that we may compare?
>
> Thx
>
> Andy
>
>
> Sent from my Galaxy Tab A
>
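
The details requested above can be gathered in one pass. The following is a rough sketch, assuming a Linux box like the ones in this thread and a ViennaCL header at /usr/include/viennacl/version.hpp (the path quoted in the replies). The helper names `section` and `collect_info` are invented for illustration, and any tool that is not installed is reported as missing rather than causing a failure.

```shell
#!/bin/sh
# Print one labelled section: run the command if it exists, else say so.
section() {
  label=$1
  shift
  printf '\n%s:\n' "$label"
  if command -v "$1" >/dev/null 2>&1; then
    # Merge stderr into stdout (java -version writes to stderr).
    "$@" 2>&1 || printf '(command failed: %s)\n' "$*"
  else
    printf '(not installed: %s)\n' "$1"
  fi
}

# Collect the items asked for in the thread, in the same order.
collect_info() {
  section "Branch" git rev-parse --abbrev-ref HEAD
  section "os major/minor" lsb_release -a
  section "chip arch" uname -a
  section "cpu model" grep -m1 "model name" /proc/cpuinfo
  section "GPU" nvidia-smi
  section "clinfo output" clinfo
  section "Java major/minor and vendor" java -version
  section "vcl version" grep VERSION /usr/include/viennacl/version.hpp
  section "gcc version" gcc --version
}
```

Running `collect_info > sysinfo.txt` from inside the Mahout checkout should give a file that can be pasted into a reply; the branch line only makes sense when run inside the git working tree.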