Posted to issues@nifi.apache.org by "Joe Witt (Jira)" <ji...@apache.org> on 2020/05/04 14:49:00 UTC

[jira] [Commented] (MINIFICPP-1201) Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++ Runtime Python Wrapper) To Do ML Inference on Edge

    [ https://issues.apache.org/jira/browse/MINIFICPP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099004#comment-17099004 ] 

Joe Witt commented on MINIFICPP-1201:
-------------------------------------

Tagging as a blocker, since the commit has been merged and yet licensing remains a concern.

> Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++ Runtime Python Wrapper) To Do ML Inference on Edge
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MINIFICPP-1201
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-1201
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: New Feature
>    Affects Versions: master
>         Environment: Ubuntu 18.04 in AWS EC2
> MiNiFi C++ 0.7.0
>            Reporter: James Medel
>            Priority: Blocker
>             Fix For: master
>
>
> *MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processors:
> Integrates MiNiFi C++ with H2O Driverless AI by using Driverless AI's MOJO Scoring Pipeline (in the C++ Runtime Python Wrapper) and MiNiFi's Custom Python Processor. A Python Processor executes the MOJO Scoring Pipeline to do batch scoring or real-time scoring for one or more predicted labels on tabular test data in the incoming flow file content. If the tabular data has one row, the MOJO does real-time scoring. If it has multiple rows, the MOJO does batch scoring. I would like to contribute my processors to MiNiFi C++ as a new feature.
> *1 custom python processor* created for MiNiFi:
> *H2oMojoPwScoring* - Executes H2O Driverless AI's MOJO Scoring Pipeline in the C++ Runtime Python Wrapper to do batch scoring or real-time scoring on a frame of data within each incoming flow file. The user must set the "MOJO Pipeline Filepath" property to the *pipeline.mojo* filepath. The onTrigger(context, session) function reads this property and *passes it into* the *daimojo.model(pipeline_mojo_filepath)* function to instantiate the *mojo_scorer*. The MOJO creation time and uuid are added as individual flow file attributes.
> The *flow file content* is then *loaded into a Datatable frame* to hold the test data. A Python lambda function called compare checks whether the datatable frame's header column names equal the header column names expected by the mojo scorer. This check is needed because the incoming data may have no header row; if the frame's header does not equal the expected header, the frame's header is replaced with the mojo scorer's expected header. The correct header matters because the *mojo scorer's* *predict(datatable_frame)* function needs it to do the prediction, which returns a predictions datatable frame. The predict function is *capable of doing real-time scoring or batch scoring*; which one it does depends only on the number of rows in the tabular data.
> The predictions datatable frame is then converted to a pandas dataframe, so that the dataframe's to_string(index=False) function can render it as a string without the index. *The prediction string is then written to the flow file content*. A flow file attribute is added for the number of rows scored, and one or more further attributes are added for each predicted label name and its associated score. Finally, the flow file is transferred to the success relationship.
>  
> *Hydraulic System Condition Monitoring* Data used in MiNiFi Flow:
> The sensor test data I used in this integration comes from Kaggle: Condition Monitoring of Hydraulic Systems. I was able to predict hydraulic system cooling efficiency through the MiNiFi and H2O integration described above. The use case is predictive maintenance of hydraulic systems.
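
The header check and row-count behaviour described for H2oMojoPwScoring can be illustrated with a minimal sketch. It is not the contributed processor: it uses only the Python standard library, and StubScorer, its feature_names and output_names fields, the column names, and the num_rows_scored attribute name are all hypothetical stand-ins for the daimojo scorer, the datatable frame, and the real flow file attributes.

```python
import csv
import io


class StubScorer:
    """Hypothetical stand-in for the MOJO scorer; this is NOT the daimojo API."""

    feature_names = ["psa_bar", "psb_bar", "tsa_temp"]  # assumed column names
    output_names = ["cool_cond_y"]                      # assumed label name

    def predict(self, rows):
        # Fake "model": the score is the mean of the row's numeric features.
        return [[sum(float(v) for v in row) / len(row)] for row in rows]


def score_flow_file(content, scorer):
    """Parse CSV text, repair a missing header, and score every row.

    Returns (prediction_text, attributes, mode).
    """
    rows = [r for r in csv.reader(io.StringIO(content)) if r]
    expected = list(scorer.feature_names)

    # Header check: if the first row is not the expected header, assume the
    # header is missing and treat every row as data under the expected header.
    if rows and rows[0] == expected:
        data = rows[1:]
    else:
        data = rows

    predictions = scorer.predict(data)

    # One row is scored in "real-time" fashion, several rows as a "batch";
    # the scoring call itself is the same in both cases.
    mode = "real-time" if len(data) == 1 else "batch"

    lines = [",".join(scorer.output_names)]
    lines += [",".join(str(v) for v in p) for p in predictions]
    attributes = {"num_rows_scored": str(len(data))}  # hypothetical name
    return "\n".join(lines), attributes, mode
```

Calling score_flow_file("1,2,3\n", StubScorer()) exercises the missing-header path with a single row, while passing content that begins with the expected header line and has several rows exercises the batch path.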



--
This message was sent by Atlassian Jira
(v8.3.4#803005)