Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/06/04 08:48:49 UTC

[GitHub] [systemml] j143 commented on a change in pull request #943: [DOC] Documentation for builtin OutlierIQR function

j143 commented on a change in pull request #943:
URL: https://github.com/apache/systemml/pull/943#discussion_r435093560



##########
File path: dev/docs/builtins-reference.md
##########
@@ -291,34 +291,95 @@ y = X %*% rand(rows=ncol(X), 1)
 [C, S] = steplm(X = X, y = y, icpt = 1);
 ```
 
-## `slicefinder`-Function
+## `outlier`-Function
 
-The `slicefinder`-function returns top-k worst performing subsets according to a model calculation.
+An outlier in a data set is a value that lies more than 1.5 times the interquartile range (IQR = Q3 − Q1) below the first quartile (Q1) or above the third quartile (Q3).
+Specifically, if a number is less than Q1−1.5×IQR or greater than Q3+1.5×IQR, then it is an outlier.
 
 ### Usage
 ```r
-slicefinder(X,W, y, k, paq, S);
+outlier(X,opposite);
 ```
 
 ### Arguments
 | Name    | Type           | Default  | Description |
 | :------ | :------------- | -------- | :---------- |
 | X       | Matrix[Double] | required | Recoded dataset into Matrix |
-| W       | Matrix[Double] | required | Trained model |
-| y       | Matrix[Double] | required | 1-column matrix of response values. |
-| k       | Integer        | 1        | Number of subsets required |
-| paq     | Integer        | 1        | amount of values wanted for each col, if paq = 1 then its off |
-| S       | Integer        | 2        | amount of subsets to combine (for now supported only 1 and 2) |
+| opposite | Boolean        | required | Used for xor gate evaluation |
 
 ### Returns
 | Type           | Description |
 | :------------- | :---------- |
-| Matrix[Double] | Matrix containing the information of top_K slices (relative error, standart error, value0, value1, col_number(sort), rows, cols,range_row,range_cols, value00, value01,col_number2(sort), rows2, cols2,range_row2,range_cols2) |
+| Matrix[Double] | 1-column matrix of weights. |
 
-### Usage
+### Example
 ```r
 X = rand (rows = 50, cols = 10)
-y = X %*% rand(rows=ncol(X), 1)
-w = lm(X = X, y = y)
-ress = slicefinder(X = X,W = w, Y = y,  k = 5, paq = 1, S = 2);
+opposite = TRUE
+Y = outlier(X = X, opposite = opposite)
 ```
+
+## `outlierByIQR`-Function
+
+Builtin function for detecting and repairing outliers using the interquartile range (IQR).
+A commonly used rule says that a data point is an outlier if it is more than 1.5 IQR
+above the third quartile or below the first quartile.
+The `outlierByIQR` function computes the first and third quartiles (Q1, Q3) of the input and sets a lower bound
+of Q1 − k×IQR and an upper bound of Q3 + k×IQR. Any value below the lower bound or above the upper bound
+is treated as an outlier and repaired as specified by `repairMethod`: the row is deleted, or the value is
+replaced with zero or with a missing value.
+
+
+### Usage
+```r
+outlierByIQR(X, k, repairMethod, max_iterations, verbose)
+```
+
+### Arguments
+| Name           | Type           | Default  | Description |
+| :------------- | :------------- | -------- | :---------- |
+| X              | Matrix[Double] | required | Matrix with outliers |
+| k              | Double         | 1.5      | A constant used to discern outliers: k*IQR |
+| isIterative    | Boolean        | TRUE     | Iterative repair or single repair |
+| repairMethod   | Integer        | 1        | Values: 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers with missing values |
+| max_iterations | Integer        | 0        | Values: 0 = iterate until all outliers are removed, n = any user-defined number of iterations |
+
+### Returns
+| Type           | Description |
+| :------------- | :---------- |
+| Matrix[Double] | matrix without any outlier. |
+
+### Example
+```r
+X = rand(rows = 10, cols = 10)
+Z = outlierByIQR(X = X, k = 1.5, repairMethod = 0, max_iterations = 3, verbose = TRUE)
+print("\n" + toString(Z))
+```
+
+## `outlierBySd`-Function
+
+Builtin function for detecting and repairing outliers using standard deviation.
+

Review comment:
       This PR is about `OutlierIQR`, but it also contains `OutlierBySd` changes. 🍰 
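For reference, here is a minimal Python sketch of the 1.5×IQR rule documented in the diff above. It is not the SystemML DML implementation: `iqr_bounds` and `is_outlier` are hypothetical names, and the quartiles come from Python's `statistics.quantiles(..., method="inclusive")`, an assumed convention that may not match the quantile semantics SystemML uses.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
    # "inclusive" is one of several quartile conventions; others shift the fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, lower, upper):
    """A value is an outlier if it lies strictly below lower or above upper."""
    return x < lower or x > upper


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    lower, upper = iqr_bounds(data)
    print(lower, upper)  # -3.0 13.0
    print([x for x in data if is_outlier(x, lower, upper)])  # [100]
```

Here Q1 = 3 and Q3 = 7, so IQR = 4 and the fences are 3 − 6 = −3 and 7 + 6 = 13; only 100 falls outside them. The `k` parameter plays the same role as the `k` argument in the `outlierByIQR` table.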




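The repair loop described by `repairMethod` and `max_iterations` can likewise be sketched in Python. This is a hypothetical illustration, not the DML code under review: the function names, the column-wise quartiles, the use of NaN for "missing values", and the stop-when-a-pass-changes-nothing check are all assumptions made for the sketch.

```python
import math
import statistics


def column_bounds(rows, k):
    """Per-column (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Assumption: quartiles are computed column-wise, ignoring NaN entries
    left behind by a "missing value" repair. Each column needs at least
    two non-NaN values.
    """
    bounds = []
    for col in zip(*rows):
        vals = [v for v in col if not math.isnan(v)]
        q1, _, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return bounds


def _same(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    return len(a) == len(b) and all(
        x == y or (math.isnan(x) and math.isnan(y))
        for ra, rb in zip(a, b)
        for x, y in zip(ra, rb)
    )


def repair_by_iqr(rows, k=1.5, repair_method=1, max_iterations=0):
    """Hypothetical sketch of iterative IQR-based outlier repair.

    repair_method: 0 = delete rows having outliers,
                   1 = replace outliers with zeros,
                   2 = replace outliers with missing values (NaN here).
    max_iterations: 0 = repeat until a pass changes nothing,
                    n > 0 = at most n passes.
    """
    rows = [list(r) for r in rows]  # copy so the caller's data is untouched
    iteration = 0
    while max_iterations == 0 or iteration < max_iterations:
        bounds = column_bounds(rows, k)
        # Flag every non-NaN value that lies outside its column's fences.
        flagged = [
            [not math.isnan(v) and (v < lo or v > hi) for v, (lo, hi) in zip(r, bounds)]
            for r in rows
        ]
        if repair_method == 0:
            new_rows = [r for r, f in zip(rows, flagged) if not any(f)]
        else:
            fill = 0.0 if repair_method == 1 else math.nan
            new_rows = [
                [fill if out else v for v, out in zip(r, f)]
                for r, f in zip(rows, flagged)
            ]
        # Stop once a pass changes nothing (e.g. a zero fill that is itself
        # outside the fences would otherwise loop forever).
        if _same(new_rows, rows):
            break
        rows = new_rows
        iteration += 1
    return rows


if __name__ == "__main__":
    data = [[1, 10], [2, 11], [3, 12], [4, 13], [5, 14],
            [6, 15], [7, 16], [8, 17], [100, 18]]
    print(repair_by_iqr(data, repair_method=0)[-1])  # [8, 17]
    print(repair_by_iqr(data, repair_method=1, max_iterations=1)[-1])  # [0.0, 18]
```

Recomputing the fences on every pass is what makes the repair iterative: removing or replacing one extreme value shrinks the IQR, which can expose new outliers on the next pass, so `max_iterations` caps how far that cascade may go.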
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org