Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2021/08/21 09:58:11 UTC

[GitHub] [systemds] fathollahzadeh opened a new pull request #1369: [SYSTEMDS][IOGEN]Auto Generate Reader

fathollahzadeh opened a new pull request #1369:
URL: https://github.com/apache/systemds/pull/1369


   This PR adds a new reader feature. The primary goal of this work is to extract the format properties of a file from a sample and then generate a reader based on the extracted properties. This is a draft version of the inference section.
   
   Tested file formats:
   1. CSV: full files, files with missing values, and files where missing values are filled with an arbitrary NA string.
   2. LibSVM files with an arbitrary order of stored index and value.
   3. MatrixMarket files in general, symmetric, and skew-symmetric form.
   
   I ran the tests with both unique and duplicate values.
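
To make the CSV cases above concrete, here is a minimal sketch of how a reader might treat empty fields and an arbitrary NA string as missing values. It is not taken from the PR: the class `CsvMissingValueSketch`, the method `parseRow`, and the idea of substituting a caller-supplied fill value are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CsvMissingValueSketch {

    // Hypothetical helper (not part of the PR): parse one CSV line into doubles.
    // Empty fields and fields equal to naString are replaced by fillValue.
    static double[] parseRow(String line, String delim, String naString, double fillValue) {
        // Quote the delimiter so it is matched literally; limit -1 keeps trailing empty fields.
        String[] parts = line.split(Pattern.quote(delim), -1);
        double[] row = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim();
            row[i] = (p.isEmpty() || p.equals(naString)) ? fillValue : Double.parseDouble(p);
        }
        return row;
    }

    public static void main(String[] args) {
        // Full row, row with an empty field, row with an NA string and a trailing empty field.
        System.out.println(Arrays.toString(parseRow("1,2,3", ",", "NA", 0.0)));
        System.out.println(Arrays.toString(parseRow("1,,3", ",", "NA", 0.0)));
        System.out.println(Arrays.toString(parseRow("1,NA,", ",", "NA", 0.0)));
    }
}
```

The NA string and the fill value are passed in as parameters here because the PR text only says the NA string is arbitrary; how the generated reader actually discovers or represents it is not described in this thread.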
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [systemds] asfgit closed pull request #1369: [SYSTEMDS][IOGEN] Auto Generate Reader

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1369:
URL: https://github.com/apache/systemds/pull/1369


   





[GitHub] [systemds] mboehm7 commented on pull request #1369: [SYSTEMDS][IOGEN] Auto Generate Reader

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1369:
URL: https://github.com/apache/systemds/pull/1369#issuecomment-922481472


   LGTM - thanks @fathollahzadeh for finalizing this initial version of the IOGEN framework. So far, this implementation uses generalized readers for types of structured inputs; it's fine to postpone the actual code generation until we encounter structures that require generation for performance.
   
   During the merge I made a number of minor changes:
   * Included the iogen tests into our github test workflow
   * Fixed two failing tests (set to ignore) that were missing an input file. Similarly, other tests throw exceptions due to missing files but still succeed. Please open a separate PR to fix these tests, including better checks for expected results.
   * Fixed a number of warnings of unused variables, and simplified some verbose code snippets
   * Fixed the parsing of ints via the fast string tokenizer (instead of `(int)Double.parseDouble()` we should use `Integer.parseInt()`)
   * In the future please avoid wildcard imports and reformatting files (e.g., `DataConverter` here) where no changes have been made.
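
The int-parsing change in the list above can be illustrated with a small standalone sketch. It does not reproduce the SystemDS fast string tokenizer; the class `IntParsingSketch` and the methods `viaDouble` and `viaInt` are hypothetical names. The review does not spell out the motivation, so the behavioral difference shown here is only one plausible reason for preferring `Integer.parseInt()`.

```java
public class IntParsingSketch {

    // Pattern the review asked to replace: parse as double, then truncate to int.
    static int viaDouble(String token) {
        return (int) Double.parseDouble(token);
    }

    // Pattern the review recommends: parse the token directly as an int.
    static int viaInt(String token) {
        return Integer.parseInt(token);
    }

    public static void main(String[] args) {
        // Both agree on plain integer tokens.
        System.out.println(viaDouble("42") + " " + viaInt("42"));

        // The double route silently truncates non-integer input.
        System.out.println(viaDouble("3.7")); // prints 3

        // The int route rejects the same token instead of truncating it.
        try {
            viaInt("3.7");
        } catch (NumberFormatException e) {
            System.out.println("rejected: 3.7");
        }
    }
}
```

Besides skipping the floating-point parse, the direct route fails loudly on malformed index tokens rather than producing a truncated value.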





[GitHub] [systemds] fathollahzadeh commented on pull request #1369: [SYSTEMDS][IOGEN] Auto Generate Reader

Posted by GitBox <gi...@apache.org>.
fathollahzadeh commented on pull request #1369:
URL: https://github.com/apache/systemds/pull/1369#issuecomment-919347735


   > Thanks @fathollahzadeh - that's a good first step and we can bring it in as experimental after some additional cleanups (dependencies, imports). As next steps we should run some experiments for better understanding the runtime breakdown, and then generalize the hard-coded readers to code-generated reads that utilize sub-templates for key concepts.
   
   Thanks @mboehm7 for the first review and your support. The new updates support auto-generated readers for Matrix and Frame. For matrix readers, we support general, symmetric (upper or lower triangular), and skew-symmetric formats. This PR does not support pattern matching. For Frame readers, we support three value types (STRING, Numeric, and BOOLEAN); handling unknown value types in Frames is future work.
   On the test side, we don't have any DML scripts; the tests call plain Java public methods to generate readers. Adding a DML script is also future work.
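
As an illustration of the matrix cases mentioned above, the following sketch shows how a stored coordinate entry can be expanded for the general, symmetric, and skew-symmetric forms. It follows the usual MatrixMarket convention that only one triangle is stored and the other is implied; the class `MatrixMarketSymmetrySketch`, the enum `Symmetry`, and the method `put` are hypothetical and are not the PR's reader code.

```java
import java.util.Arrays;

public class MatrixMarketSymmetrySketch {

    enum Symmetry { GENERAL, SYMMETRIC, SKEW_SYMMETRIC }

    // Hypothetical helper: write one stored entry (0-based i, j) into a dense matrix
    // and mirror it across the diagonal when the symmetry mode implies a second entry.
    static void put(double[][] m, int i, int j, double v, Symmetry s) {
        m[i][j] = v;
        if (s == Symmetry.GENERAL || i == j) {
            return; // nothing to mirror
        }
        // Symmetric: A[j][i] = A[i][j]; skew-symmetric: A[j][i] = -A[i][j].
        m[j][i] = (s == Symmetry.SYMMETRIC) ? v : -v;
    }

    public static void main(String[] args) {
        double[][] sym = new double[2][2];
        put(sym, 1, 0, 5.0, Symmetry.SYMMETRIC);
        System.out.println(Arrays.deepToString(sym));

        double[][] skew = new double[2][2];
        put(skew, 1, 0, 5.0, Symmetry.SKEW_SYMMETRIC);
        System.out.println(Arrays.deepToString(skew));
    }
}
```

The helper works the same way whether the file stores the upper or the lower triangle, since the mirrored position is derived from whichever entry is present.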
   
   Thanks for the review. 
     
    





[GitHub] [systemds] mboehm7 commented on pull request #1369: [SYSTEMDS][IOGEN] Auto Generate Reader

Posted by GitBox <gi...@apache.org>.
mboehm7 commented on pull request #1369:
URL: https://github.com/apache/systemds/pull/1369#issuecomment-903830596


   Thanks @fathollahzadeh - that's a good first step and we can bring it in as experimental after some additional cleanups (dependencies, imports). As next steps we should run some experiments for better understanding the runtime breakdown, and then generalize the hard-coded readers to code-generated reads that utilize sub-templates for key concepts.

