Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2019/07/22 06:58:07 UTC

[GitHub] [commons-statistics] BBenNguyenn opened a new pull request #21: Milestone-1 with squashed commits, please see description for more details

BBenNguyenn opened a new pull request #21: Milestone-1 with squashed commits, please see description for more details
URL: https://github.com/apache/commons-statistics/pull/21
 
 
   **OLS functionality complete:**
   - converted all math4.linear dependencies to EJML (more complicated and time-consuming than expected)
   -> should be easier now for GLS and logistic regression, since experience was gained while porting OLS
   - ensured full unit test coverage by porting all old OLS tests and adding some new ones
   - created a preliminary RegressionResults interface, which holds the calculated results of a regression (calculated once, accessed multiple times) -> Note: preliminary usage is in testSwissFertilityInterfaceFormat() in OLSRegressionTest; more to be added
   - included new and ported Javadoc comments, ensuring no checkstyle errors
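
   As a rough illustration of the "calculate once, access many times" idea behind the preliminary RegressionResults interface, a results holder could look like the sketch below. The method names (getBeta, getResiduals, getRSquared) and the OlsResults class are hypothetical and are not taken from this PR; plain double[] arrays stand in for EJML matrices.

```java
/** Hypothetical read-only view of a finished regression; names are illustrative only. */
interface RegressionResults {
    double[] getBeta();
    double[] getResiduals();
    double getRSquared();
}

/** Immutable holder: values are calculated once by the caller and only copied out on access. */
final class OlsResults implements RegressionResults {
    private final double[] beta;
    private final double[] residuals;
    private final double rSquared;

    OlsResults(double[] beta, double[] residuals, double rSquared) {
        // Defensive copies so later changes to the caller's arrays cannot alter the results.
        this.beta = beta.clone();
        this.residuals = residuals.clone();
        this.rSquared = rSquared;
    }

    @Override
    public double[] getBeta() {
        return beta.clone();
    }

    @Override
    public double[] getResiduals() {
        return residuals.clone();
    }

    @Override
    public double getRSquared() {
        return rSquared;
    }
}
```

   Returning copies from the getters keeps the stored results stable no matter how often they are read; whether that copying cost is acceptable for large data sets is a separate design question.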
   
   
   **Known code smell:** math4.stat dependency
   Dependency usage: StatUtils, SumOfSquares, Variance, SecondMoment
   **Explanation:**
   This dependency is temporary: it stays only until the Statistics Descriptive module provides array-input methods for the functionality of the classes above, which is said to be coming soon.
   I considered helping Virendra with it so that this milestone could drop all old dependencies, but I don't think I should interfere before my own component is complete. I would also have to learn how to use streams properly, and it sounds like Virendra will be done soon anyway.
   Once Virendra is done, the switch will be swift, since only about three methods in total use that functionality.
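
   To show why the switch should be small, the sketch below gathers the needed functionality behind one helper class with array-input methods. ArrayStats and its method names are hypothetical (they belong neither to this PR nor to either library), and the plain loops are only a stand-in for whatever the Statistics Descriptive module ends up exposing.

```java
/** Hypothetical helper that isolates the array-input statistics used by the regression code. */
final class ArrayStats {
    private ArrayStats() {
        // Static methods only.
    }

    /** Arithmetic mean; NaN for an empty array. */
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Sum of the squared values. */
    static double sumOfSquares(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v * v;
        }
        return sum;
    }

    /** Sum of squared deviations from the mean. */
    static double sumOfSquaredDeviations(double[] x) {
        final double m = mean(x);
        double sum = 0;
        for (double v : x) {
            final double d = v - m;
            sum += d * d;
        }
        return sum;
    }

    /** Bias-corrected sample variance (divides by n - 1); NaN for fewer than two values. */
    static double variance(double[] x) {
        if (x.length < 2) {
            return Double.NaN;
        }
        return sumOfSquaredDeviations(x) / (x.length - 1);
    }
}
```

   With the calls routed through one class like this, replacing the old dependency would only mean changing the bodies of these methods rather than every call site.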
   
   
   **Known code smell:** Data loading is perhaps not ideal
   **Explanation:**
   The current RegressionDataLoader stores the input data within a RegressionRawData object and exposes it through an interface with a getter.
   This should be improved by using one of the strategies suggested on the mailing list.
   I will get to this this week, or possibly after the GLS port.
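
   One possible shape for a factory-based loader is sketched below, purely as a starting point for discussion. RegressionData, RegressionDataFactory and both creation methods are hypothetical names, not code from this PR, and the flat-array layout (each row holding y followed by its regressors) is an assumption made for the example.

```java
/** Hypothetical read-only regression input. */
interface RegressionData {
    double[] getY();
    double[][] getX();
}

/** Hypothetical factory: static creation methods validate input and hide the concrete class. */
final class RegressionDataFactory {
    private RegressionDataFactory() {
        // Static methods only.
    }

    /** Builds the data from a response vector and a design matrix with one row per observation. */
    static RegressionData fromArrays(double[] y, double[][] x) {
        if (y.length != x.length) {
            throw new IllegalArgumentException("y and x must have the same number of rows");
        }
        return new ArrayRegressionData(y.clone(), deepCopy(x));
    }

    /** Builds the data from a flat array laid out as y0, x00, x01, ..., y1, x10, x11, ... */
    static RegressionData fromFlatArray(double[] data, int nobs, int nvars) {
        if (data.length != nobs * (nvars + 1)) {
            throw new IllegalArgumentException("data length does not match nobs * (nvars + 1)");
        }
        final double[] y = new double[nobs];
        final double[][] x = new double[nobs][nvars];
        int p = 0;
        for (int i = 0; i < nobs; i++) {
            y[i] = data[p++];
            for (int j = 0; j < nvars; j++) {
                x[i][j] = data[p++];
            }
        }
        return new ArrayRegressionData(y, x);
    }

    private static double[][] deepCopy(double[][] src) {
        final double[][] out = new double[src.length][];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[i].clone();
        }
        return out;
    }

    /** Concrete holder, invisible to callers, which only see the RegressionData interface. */
    private static final class ArrayRegressionData implements RegressionData {
        private final double[] y;
        private final double[][] x;

        ArrayRegressionData(double[] y, double[][] x) {
            this.y = y;
            this.x = x;
        }

        @Override
        public double[] getY() {
            return y.clone();
        }

        @Override
        public double[][] getX() {
            return deepCopy(x);
        }
    }
}
```

   A regression class would then accept a RegressionData instance, for example one created with RegressionDataFactory.fromFlatArray(data, nobs, nvars), without knowing how the data was loaded.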
   
   
   **Next Objectives:**
   - Improve data loading strategy 
   -> as suggested, a proper Factory pattern model
   - Finalize RegressionResults interface for OLS and other regressions to output 
   -> Summary statistics printout method?
   - Port GLS (expected to not take as long as OLS)
   - Start LogisticRegression implementation design
   
   
   **PLEASE NOTE:** 
   - I've created a UML diagram, "UML_current.png", in the README directory if anyone thinks a visual would be helpful.
   - Full commit history (before complete squashing) is in STATISTICS-8_Regression_Module branch.
   
   
   Thank you for your time and review,
   -Ben Nguyen
