You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by "James Medel (Jira)" <ji...@apache.org> on 2020/04/30 07:52:00 UTC

[jira] [Created] (NIFI-7411) Integrates NiFi with H2O Driverless AI MOJO Scoring Pipeline (Java Runtime) To Do ML Inference

James Medel created NIFI-7411:
---------------------------------

             Summary: Integrates NiFi with H2O Driverless AI MOJO Scoring Pipeline (Java Runtime) To Do ML Inference
                 Key: NIFI-7411
                 URL: https://issues.apache.org/jira/browse/NIFI-7411
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
    Affects Versions: 1.12.0
         Environment: Mac OS X Mojave 10.14.6
            Reporter: James Medel


*NiFi and H2O Driverless AI Integration* via Custom NiFi Processor:

Integrates NiFi with H2O Driverless AI by using Driverless AI's MOJO Scoring Pipeline (in Java Runtime) and NiFi's Custom Processor. This processor executes the MOJO Scoring Pipeline to do batch scoring or real-time scoring for one or more predicted labels on tabular data in the incoming flow file content. If the tabular data is one row, then the MOJO does real-time scoring. If the tabular data is multiple rows, then the MOJO does batch scoring. I would like to contribute my processor to NiFi as a new feature.

*1 Custom Processor* created for NiFi:

*ExecuteMojoScoringRecord* - Executes H2O Driverless AI's MOJO Scoring Pipeline in Java Runtime to do batch scoring or real-time scoring on a frame of data within each incoming flow file. It requires the user to add *mojo2-runtime.jar* filepath into *MOJO2 Runtime JAR Directory* ** property to dynamically modify the classpath. It also requires the user to add the *pipeline.mojo* filepath into the *Pipeline MOJO Filepath* property. This property is used in the onTrigger() method to get the pipeline.mojo filepath, so we can pass it into the
MojoPipeline.loadFrom(pipelineMojoPath) to instantiate our MojoPipeline model. Then the record read in with Record Reader and the model are passed into a predict() method to make predictions on the test data within the record. Inside the predict() method, I use MojoFrameBuilder and MojoRowBuilder with the recordMap to build an input MojoFrame. Then I use the model's transform(input MojoFrame) method to make the predictions on the input and store them into an output MojoFrame. I iterate through the MojoFrame by row and column to store each key value pair prediction into the predictedRecordMap. I then convert the predictedRecordMap to predictedRecord and return the record back to onTrigger to write the record to the flow file content using RecordSetWriter. We keep writing predicted Records to the flow file content until there are no more records to write. Then we reach near end of onTrigger() and the flow file is either transferred on relationship failure, success or original to the next connection.
 
*Hydraulic System Condition Monitoring* Data used in NiFi Flow:
 
The sensor test data I used in this integration comes from UCI ML Repo: Condition Monitoring for Hydraulic Systems. I was able to predict the hydraulic cooling condition through NiFi and H2O Integration described above. This use case is hydraulic system predictive maintenance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)