Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/09/13 02:06:10 UTC

[GitHub] [incubator-mxnet] MoisesHer opened a new pull request #19131: Add GPU-optimization for split op

MoisesHer opened a new pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131


   ## Description ##
   Optimization of split operator on GPU
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] Code is well-documented
   
   ### Changes ###
   - [x] Added a specific split operator for GPU
   - [x] The implementation includes an optimized CUDA kernel that detects whether the split runs along the last axis or an earlier axis, and takes a different code path in each case (see the sketch below).
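   
   As a rough illustration of why the two paths differ (a NumPy sketch of the memory layout only; this is not the PR's CUDA code): for a row-major tensor, a first-axis split yields contiguous blocks, while a last-axis split yields strided slices that force the kernel to compute per-element source offsets.
   ```python
   import numpy as np
   
   x = np.arange(24).reshape(2, 3, 4)      # contiguous, row-major
   
   # First-axis split: each output is a contiguous slab of the original
   # buffer, so one memcpy-style copy per output suffices.
   a, b = np.split(x, 2, axis=0)
   print(a.flags['C_CONTIGUOUS'])          # True
   
   # Last-axis split: each output gathers a short run from every row, so it
   # is strided and the copy needs a per-element source offset.
   c, d = np.split(x, 2, axis=-1)
   print(c.flags['C_CONTIGUOUS'])          # False
   ```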
   
   TODO:
   - Study performance on some more use cases.
     For BERT-base inference, split already improves from 0.56 ms to 0.18 ms, i.e. a ~3x speedup is achieved (see the rough timing sketch below).
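   
   For reference, a minimal sketch of how such a timing could be taken (the shape, iteration count, and split configuration are illustrative assumptions, not the actual BERT-base benchmark; `mx.nd.split_v2` is assumed as the ndarray counterpart of the `mx.sym.split_v2` symbol used in the tests):
   ```python
   import time
   import mxnet as mx
   
   x = mx.nd.random.uniform(shape=(32, 128, 768), ctx=mx.gpu(0))
   mx.nd.waitall()                              # finish setup before timing
   
   n_iter = 1000
   start = time.time()
   for _ in range(n_iter):
       outs = mx.nd.split_v2(x, 12, axis=-1)    # e.g. split the hidden dim into 12 heads
   mx.nd.waitall()                              # MXNet runs async; sync before stopping the clock
   print("avg split_v2 time: %.3f ms" % ((time.time() - start) / n_iter * 1e3))
   ```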
     
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] sxjscience commented on a change in pull request #19131: Add GPU-optimization for split op

Posted by GitBox <gi...@apache.org>.
sxjscience commented on a change in pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131#discussion_r502968320



##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,21 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial

Review comment:
       I noticed that `mark.serial` is applied to all tests in this file, so we may keep `mark.serial`.







[GitHub] [incubator-mxnet] sxjscience merged pull request #19131: Add GPU-optimization for split op

Posted by GitBox <gi...@apache.org>.
sxjscience merged pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131


   





[GitHub] [incubator-mxnet] szha commented on a change in pull request #19131: Add GPU-optimization for split op

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131#discussion_r502997846



##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,21 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial

Review comment:
       We run tests based on the tag, and it has nothing to do with the file. `serial` is only needed when a test invocation is long-running and consumes lots of memory. Since that is no longer the case after parametrizing the input, the `serial` tag is not needed.
   
   Please open a follow-up PR to finish the change.







[GitHub] [incubator-mxnet] szha commented on a change in pull request #19131: Add GPU-optimization for split op

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131#discussion_r487479893



##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,37 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial
+def test_split_v2_fwd():
+    dim = random.randint(2, 6)
+    shape = rand_shape_nd(dim)
+    axis = random.randint(-dim, dim-1)
+    axis_size = shape[axis]
+    samples = random.randint(0, axis_size - 1)
+    indices = sorted(random.sample([i for i in range(1, axis_size)], samples))
+    indices = tuple(indices)
+    dtypes = ["float16", "float32", "float64"]
+    for dtype in dtypes:
+        mx_data = rand_ndarray(shape, dtype=dtype)
+        np_data = mx_data.asnumpy()
+        np_out = np.split(np_data, indices_or_sections=indices, axis=axis)
+        data = mx.sym.Variable("data")
+        sym = mx.sym.split_v2(data, indices_or_sections=indices, axis=axis)
+        check_symbolic_forward(sym, {"data": mx_data}, np_out, rtol=1e-3, atol=1e-5)
+    # test load types with dtype fp16/fp32

Review comment:
       The following block looks more suitable as a separate test.

##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,37 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial
+def test_split_v2_fwd():
+    dim = random.randint(2, 6)
+    shape = rand_shape_nd(dim)
+    axis = random.randint(-dim, dim-1)
+    axis_size = shape[axis]
+    samples = random.randint(0, axis_size - 1)
+    indices = sorted(random.sample([i for i in range(1, axis_size)], samples))
+    indices = tuple(indices)
+    dtypes = ["float16", "float32", "float64"]
+    for dtype in dtypes:

Review comment:
       use `@pytest.mark.parametrize`
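
       A minimal sketch of what the parametrized version might look like (pytest then generates one case per dtype, and the `serial` mark is dropped per the discussion above; this is a sketch, not the final follow-up code):
       ```python
       @with_seed()
       @pytest.mark.parametrize('dtype', ['float16', 'float32', 'float64'])
       def test_split_v2_fwd(dtype):
           dim = random.randint(2, 6)
           shape = rand_shape_nd(dim)
           axis = random.randint(-dim, dim - 1)
           axis_size = shape[axis]
           samples = random.randint(0, axis_size - 1)
           indices = tuple(sorted(random.sample(range(1, axis_size), samples)))
           mx_data = rand_ndarray(shape, dtype=dtype)
           np_out = np.split(mx_data.asnumpy(), indices_or_sections=indices, axis=axis)
           data = mx.sym.Variable("data")
           sym = mx.sym.split_v2(data, indices_or_sections=indices, axis=axis)
           check_symbolic_forward(sym, {"data": mx_data}, np_out, rtol=1e-3, atol=1e-5)
       ```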

##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,37 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial

Review comment:
       once you split the test, it shouldn't be necessary to run it as a serial test

##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,37 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial
+def test_split_v2_fwd():
+    dim = random.randint(2, 6)
+    shape = rand_shape_nd(dim)
+    axis = random.randint(-dim, dim-1)
+    axis_size = shape[axis]
+    samples = random.randint(0, axis_size - 1)
+    indices = sorted(random.sample([i for i in range(1, axis_size)], samples))
+    indices = tuple(indices)
+    dtypes = ["float16", "float32", "float64"]
+    for dtype in dtypes:
+        mx_data = rand_ndarray(shape, dtype=dtype)
+        np_data = mx_data.asnumpy()
+        np_out = np.split(np_data, indices_or_sections=indices, axis=axis)
+        data = mx.sym.Variable("data")
+        sym = mx.sym.split_v2(data, indices_or_sections=indices, axis=axis)
+        check_symbolic_forward(sym, {"data": mx_data}, np_out, rtol=1e-3, atol=1e-5)
+    # test load types with dtype fp16/fp32
+    multiple_fp64 = random.randint(1,8) * 4
+    shape = (multiple_fp64, multiple_fp64, multiple_fp64, multiple_fp64)
+    dtypes = ["float16", "float32"]
+    axes = [-1, -2, -3, -4]
+    n_sections = [1, 2, 4]
+    for dtype in dtypes:
+        for axis in axes:
+            for n_sec in n_sections:

Review comment:
       use `@pytest.mark.parametrize`
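
       Combining the two suggestions, a minimal sketch of the "load types" block hoisted into its own parametrized test (the function name is hypothetical, and since the loop body is cut off in the excerpt above, the body below is modeled on the first half of the test and is an assumption):
       ```python
       @with_seed()
       @pytest.mark.parametrize('dtype', ['float16', 'float32'])
       @pytest.mark.parametrize('axis', [-1, -2, -3, -4])
       @pytest.mark.parametrize('n_sec', [1, 2, 4])
       def test_split_v2_fwd_load_types(dtype, axis, n_sec):
           # dims are multiples of 4, so every n_sec divides the split axis evenly
           multiple_fp64 = random.randint(1, 8) * 4
           shape = (multiple_fp64,) * 4
           mx_data = rand_ndarray(shape, dtype=dtype)
           np_out = np.split(mx_data.asnumpy(), indices_or_sections=n_sec, axis=axis)
           data = mx.sym.Variable("data")
           sym = mx.sym.split_v2(data, indices_or_sections=n_sec, axis=axis)
           check_symbolic_forward(sym, {"data": mx_data}, np_out, rtol=1e-3, atol=1e-5)
       ```
       Stacked `parametrize` decorators give the full cross product, so each (dtype, axis, n_sec) combination becomes a short, independently runnable test case.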







[GitHub] [incubator-mxnet] szha commented on a change in pull request #19131: Add GPU-optimization for split op

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131#discussion_r502859048



##########
File path: tests/python/gpu/test_operator_gpu.py
##########
@@ -2319,3 +2319,21 @@ def test_fp16_spmm():
     out = mxsps.dot(inp, weight)
     out_np = mx.nd.dot(inp, weight)
     assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)
+
+@with_seed()
+@pytest.mark.serial

Review comment:
       no need to mark as serial







[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19131: Add GPU-optimization for split op

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19131:
URL: https://github.com/apache/incubator-mxnet/pull/19131#issuecomment-691592557


   Hey @MoisesHer, thanks for submitting the PR.
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [miscellaneous, edge, windows-cpu, windows-gpu, unix-cpu, website, unix-gpu, clang, sanity, centos-cpu, centos-gpu]
   *** 
   _Note_: 
    Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   

