You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/04/17 16:52:47 UTC

[GitHub] [systemml] juliale-15 opened a new pull request #892: Extend Python API (rand, lm, matrix multiplication)

juliale-15 opened a new pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892
 
 
   Add rand(), lm and matrix multiplication to Python API

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410876877
 
 

 ##########
 File path: src/main/python/tests/test_matrix_rand.py
 ##########
 @@ -0,0 +1,111 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Make the `systemds` package importable
+import os
+import sys
+import unittest
+import numpy as np
+import scipy.stats as st
+import random
+
+path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "../")
+sys.path.insert(0, path)
+from systemds.context import SystemDSContext
+
+
+shape = (random.randrange(0, 50), random.randrange(0, 50))
+min_max = (0, 1)
+sparsity = 0.2
+distributions = ['norm', 'uniform']
+
+sds = SystemDSContext()
+
+
+class TestRand(unittest.TestCase):
+    def test_rand_shape(self):
+        m = sds.rand(rows=shape[0], cols=shape[1]).compute()
+        self.assertTrue(m.shape == shape)
+
+    def test_rand_min_max(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], min=min_max[0], max=min_max[1]).compute()
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+    def test_rand_sparsity(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], sparsity=sparsity).compute()
+
+        m_flat = m.flatten('F')
+        count, bins = np.histogram(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue(sum(count) == (shape[0] * shape[1]) and (non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+
+    def test_rand_uniform_distribution(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],).compute()
+
+        dist = find_best_fit_distribution(m.flatten('F'), distributions)
+        self.assertTrue(dist == 'uniform')
+
+    def test_rand_normal_distribution(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1]).compute()
+
+        dist = find_best_fit_distribution(m.flatten('F'), distributions)
+        self.assertTrue(dist == 'norm')
+
+    def test_rand_invalid_shape(self):
+        try:
+            sds.rand(rows=1, cols=-10).compute()
+            self.assertTrue(False)
+        except Exception as e:
+            print(e)
+
+    def test_rand_invalid_pdf(self):
+        try:
+            sds.rand(rows=1, cols=10, pdf="norm").compute()
+            self.assertTrue(False)
+        except Exception as e:
+            print(e)
+
+
 
 Review comment:
   OCD (double newline)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410384137
 
 

 ##########
 File path: src/main/python/systemds/matrix/operation_node.py
 ##########
 @@ -159,6 +159,10 @@ def __eq__(self, other):
     def __ne__(self, other):
         return OperationNode(self.sds_context, '!=', [self, other])
 
+
 
 Review comment:
   (OCD comment) double newline

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410876929
 
 

 ##########
 File path: src/main/python/tests/test_matrix_rand.py
 ##########
 @@ -0,0 +1,111 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Make the `systemds` package importable
+import os
+import sys
+import unittest
+import numpy as np
+import scipy.stats as st
+import random
+
+path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "../")
+sys.path.insert(0, path)
+from systemds.context import SystemDSContext
+
+
 
 Review comment:
   OCD (double newline)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410371982
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -25,6 +25,8 @@
 import warnings
 import unittest
 import numpy as np
+import scipy.stats as st
 
 Review comment:
   The reason why some tests fail is:
   
   ```
   2020-04-17T16:54:38.5699934Z Traceback (most recent call last):
   2020-04-17T16:54:38.5701814Z   File "tests/test_matrix_aggregations.py", line 28, in <module>
   2020-04-17T16:54:38.5702098Z     import scipy.stats as st
   2020-04-17T16:54:38.5702821Z ModuleNotFoundError: No module named 'scipy'
   2020-04-17T16:54:38.5888836Z ##[error]Process completed with exit code 1.
   ```
   
   We can fix that by either adding SciPy as a dependency or not use it. It could be useful for other tests, so we could add it. Do you have any opinions on this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410373065
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
 
 Review comment:
   I would make a new file, maybe call it test_matrix_rand.py, since rand does not fit into aggregation operations.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410379612
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
 
 Review comment:
   What happens if you give it (-1, 10)?
   Does the response make sense to you?
   What would help a user fix her/his input in such a case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] juliale-15 commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
juliale-15 commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410681349
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -25,6 +25,8 @@
 import warnings
 import unittest
 import numpy as np
+import scipy.stats as st
 
 Review comment:
   Yes. I think it would be good to add it :) it can probably be used for other testcases

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410681369
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
 
 Review comment:
   yes, but while doing it, look at the exception thrown, and the output as an entire thing, and reason with yourself, does this error message help. If yes, GREAT! If not what changes to that error message would be good to help a user, and where in the operations should this information be extracted and returned to the user.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410684799
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
+
+    def test_rand_distribution(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m_flat = m.flatten('F')
+
+        dist = best_distribution(m_flat)
+        self.assertTrue(dist == 'uniform')
+
+        m1 = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m1_flat = m1.flatten('F')
+
+        dist = best_distribution(m1_flat)
+        self.assertTrue(dist == 'norm')
+
+
+def best_distribution(data):
+    distributions = ['norm', 'uniform']
+    result = dict()
+
+    for dist in distributions:
+        param = getattr(st, dist).fit(data)
+
+        D, p_value = st.kstest(data, dist, args=param)
+        result[dist] = p_value
+
+    best_dist = max(result, key=result.get)
+    return best_dist
+
 
 Review comment:
   if it removes the newlines, that would be great :+1: 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410381855
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
 
 Review comment:
   Good practice is to only assert one thing in each test.
   This is sometimes hard, to correlate with not duplicating code or repeating steps, but possible if some thought is put into it.
   (Low Priority)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410385286
 
 

 ##########
 File path: src/main/python/systemds/matrix/matrix.py
 ##########
 @@ -141,3 +141,32 @@ def seq(sds_context: 'SystemDSContext', start: Union[float, int], stop: Union[fl
         start = 0
     unnamed_input_nodes = [start, stop, step]
     return OperationNode(sds_context, 'seq', unnamed_input_nodes)
+
+
 
 Review comment:
   (OCD comment) double newline

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410384979
 
 

 ##########
 File path: src/main/python/systemds/matrix/operation_node.py
 ##########
 @@ -229,3 +233,10 @@ def moment(self, moment, weights: DAGNode = None) -> 'OperationNode':
             unnamed_inputs.append(weights)
         unnamed_inputs.append(moment)
         return OperationNode(self.sds_context, 'moment', unnamed_inputs, output_type=OutputType.DOUBLE)
+
+    def lm(self, y: DAGNode, **kwargs) -> 'OperationNode':
 
 Review comment:
   (Suggestion) Could it make sense to move this lm function to a new file?
   
   I can imagine that if we add all functions to this file it will become harder to look over.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410382142
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
+
+    def test_rand_distribution(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
 
 Review comment:
   Duplicate values from last test.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410380592
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
 
 Review comment:
   Try setting up the tests such that it runs with multiple parameters, without code duplication (preferably such that each parameter set counts as a distinct test case).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410876908
 
 

 ##########
 File path: src/main/python/tests/test_matrix_rand.py
 ##########
 @@ -0,0 +1,111 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Make the `systemds` package importable
+import os
+import sys
+import unittest
+import numpy as np
+import scipy.stats as st
+import random
+
+path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "../")
+sys.path.insert(0, path)
+from systemds.context import SystemDSContext
+
+
+shape = (random.randrange(0, 50), random.randrange(0, 50))
+min_max = (0, 1)
+sparsity = 0.2
+distributions = ['norm', 'uniform']
+
+sds = SystemDSContext()
+
+
 
 Review comment:
   OCD (double newline)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410876849
 
 

 ##########
 File path: src/main/python/tests/test_matrix_rand.py
 ##########
 @@ -0,0 +1,111 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Make the `systemds` package importable
+import os
+import sys
+import unittest
+import numpy as np
+import scipy.stats as st
+import random
+
+path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "../")
+sys.path.insert(0, path)
+from systemds.context import SystemDSContext
+
+
+shape = (random.randrange(0, 50), random.randrange(0, 50))
+min_max = (0, 1)
+sparsity = 0.2
+distributions = ['norm', 'uniform']
+
+sds = SystemDSContext()
+
+
+class TestRand(unittest.TestCase):
+    def test_rand_shape(self):
+        m = sds.rand(rows=shape[0], cols=shape[1]).compute()
+        self.assertTrue(m.shape == shape)
+
+    def test_rand_min_max(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], min=min_max[0], max=min_max[1]).compute()
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+    def test_rand_sparsity(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], sparsity=sparsity).compute()
+
+        m_flat = m.flatten('F')
+        count, bins = np.histogram(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue(sum(count) == (shape[0] * shape[1]) and (non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+
+    def test_rand_uniform_distribution(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],).compute()
+
+        dist = find_best_fit_distribution(m.flatten('F'), distributions)
+        self.assertTrue(dist == 'uniform')
+
+    def test_rand_normal_distribution(self):
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1]).compute()
+
+        dist = find_best_fit_distribution(m.flatten('F'), distributions)
+        self.assertTrue(dist == 'norm')
+
+    def test_rand_invalid_shape(self):
+        try:
+            sds.rand(rows=1, cols=-10).compute()
+            self.assertTrue(False)
+        except Exception as e:
+            print(e)
+
+    def test_rand_invalid_pdf(self):
+        try:
+            sds.rand(rows=1, cols=10, pdf="norm").compute()
+            self.assertTrue(False)
+        except Exception as e:
+            print(e)
+
+
+def find_best_fit_distribution(data, distribution_lst):
+    """
+    Finds and returns the distribution of the distributions list that fits the data the best.
+    :param data: flat numpy array
+    :param distribution_lst: distributions to check
+    :return: best distribution that fits the data
+    """
+    result = dict()
+
+    for dist in distribution_lst:
+        param = getattr(st, dist).fit(data)
+
+        D, p_value = st.kstest(data, dist, args=param)
+        result[dist] = p_value
+
+    best_dist = max(result, key=result.get)
+    return best_dist
+
+
 
 Review comment:
   OCD (double newline)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] juliale-15 commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
juliale-15 commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410684515
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
+
+    def test_rand_distribution(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m_flat = m.flatten('F')
+
+        dist = best_distribution(m_flat)
+        self.assertTrue(dist == 'uniform')
+
+        m1 = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m1_flat = m1.flatten('F')
+
+        dist = best_distribution(m1_flat)
+        self.assertTrue(dist == 'norm')
+
+
+def best_distribution(data):
+    distributions = ['norm', 'uniform']
+    result = dict()
+
+    for dist in distributions:
+        param = getattr(st, dist).fit(data)
+
+        D, p_value = st.kstest(data, dist, args=param)
+        result[dist] = p_value
+
+    best_dist = max(result, key=result.get)
+    return best_dist
+
 
 Review comment:
   Hi, regarding the code formatter,  I'm using PyCharms integrated Pep8 checks. Would that be enough? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410384018
 
 

 ##########
 File path: src/main/python/systemds/matrix/operation_node.py
 ##########
 @@ -229,3 +233,10 @@ def moment(self, moment, weights: DAGNode = None) -> 'OperationNode':
             unnamed_inputs.append(weights)
         unnamed_inputs.append(moment)
         return OperationNode(self.sds_context, 'moment', unnamed_inputs, output_type=OutputType.DOUBLE)
+
+    def lm(self, y: DAGNode, **kwargs) -> 'OperationNode':
 
 Review comment:
   Missing tests (You said this, but just a reminder.)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410382542
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
+
+    def test_rand_distribution(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m_flat = m.flatten('F')
+
+        dist = best_distribution(m_flat)
+        self.assertTrue(dist == 'uniform')
+
+        m1 = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m1_flat = m1.flatten('F')
+
+        dist = best_distribution(m1_flat)
+        self.assertTrue(dist == 'norm')
+
+
 
 Review comment:
   (OCD comment) double newline 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410382683
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
+
+    def test_rand_distribution(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m_flat = m.flatten('F')
+
+        dist = best_distribution(m_flat)
+        self.assertTrue(dist == 'uniform')
+
+        m1 = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m1_flat = m1.flatten('F')
+
+        dist = best_distribution(m1_flat)
+        self.assertTrue(dist == 'norm')
+
+
+def best_distribution(data):
+    distributions = ['norm', 'uniform']
+    result = dict()
+
+    for dist in distributions:
+        param = getattr(st, dist).fit(data)
+
+        D, p_value = st.kstest(data, dist, args=param)
+        result[dist] = p_value
+
+    best_dist = max(result, key=result.get)
+    return best_dist
+
 
 Review comment:
   (OCD comment) double newline

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410383402
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+        sparsity = 0.2
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed, sparsity=sparsity).compute()
+
+        self.assertTrue(m.shape == shape)
+        self.assertTrue((m.min() >= min_max[0]) and (m.max() <= min_max[1]))
+
+        # sparsity
+        m_flat = m.flatten('F')
+        count, bins, patches = plt.hist(m_flat)
+
+        non_zero_value_percent = sum(count[1:]) * 100 / count[0]
+        e = 0.05
+        self.assertTrue((non_zero_value_percent >= (sparsity - e) * 100)
+                        and (non_zero_value_percent <= (sparsity + e) * 100))
+        self.assertTrue(sum(count) == (shape[0] * shape[1]))
+
+    def test_rand_distribution(self):
+        seed = 15
+        shape = (20, 20)
+        min_max = (0, 1)
+
+        m = sds.rand(rows=shape[0], cols=shape[1], pdf="uniform", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m_flat = m.flatten('F')
+
+        dist = best_distribution(m_flat)
+        self.assertTrue(dist == 'uniform')
+
+        m1 = sds.rand(rows=shape[0], cols=shape[1], pdf="normal", min=min_max[0], max=min_max[1],
+                     seed=seed).compute()
+
+        m1_flat = m1.flatten('F')
+
+        dist = best_distribution(m1_flat)
+        self.assertTrue(dist == 'norm')
+
+
+def best_distribution(data):
 
 Review comment:
   Define what you mean with best distribution in this test.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410876115
 
 

 ##########
 File path: src/main/python/systemds/context/systemds_context.py
 ##########
 @@ -164,4 +164,14 @@ def rand(self, rows: int, cols: int, min: Union[float, int] = None,
         :param lambd: lamda value for "poison" distribution
         :return:
         """
+        available_pdfs = ["uniform", "normal", "poisson"]
+        if rows < 0:
+            raise ValueError("In rand statement, can only assign rows a long (integer) value >= 0 "
+                            "-- attempted to assign value: {r}".format(r=rows))
+        if cols < 0:
+            raise ValueError("In rand statement, can only assign cols a long (integer) value >= 0 "
 
 Review comment:
   `if cols <= 0:`
   `In rand statement, can only assign cols a long (integer) value > 0 `
   
   not >=0
   
   This makes me think that you did not verify the edge case of 0. Other than that, the message looks good :smile: 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410377068
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -25,6 +25,8 @@
 import warnings
 import unittest
 import numpy as np
+import scipy.stats as st
+import matplotlib.pyplot as plt
 
 Review comment:
   matplotlib is a great library, but it should not be included in the tests here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [systemml] juliale-15 commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)

Posted by GitBox <gi...@apache.org>.
juliale-15 commented on a change in pull request #892: Extend Python API (rand, lm, matrix multiplication)
URL: https://github.com/apache/systemml/pull/892#discussion_r410681017
 
 

 ##########
 File path: src/main/python/tests/test_matrix_aggregations.py
 ##########
 @@ -84,6 +86,63 @@ def test_var2(self):
     def test_var3(self):
         self.assertTrue(np.allclose(sds.matrix(m1).var(axis=1).compute(), m1.var(axis=1, ddof=1).reshape(dim, 1)))
 
+    def test_rand_basic(self):
+        seed = 15
+        shape = (20, 20)
 
 Review comment:
   I would expect an exception to be thrown, and the user can catch it in a try-except block. 
   In this case I would add a test case to expect an exception in a try except block.
   Is this what you meant?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services