
Posted to issues@nifi.apache.org by "Eric Chaves (JIRA)" <ji...@apache.org> on 2017/11/05 22:51:00 UTC

[jira] [Created] (NIFI-4569) Brief summary of a newbie journey into NiFi

Eric Chaves created NIFI-4569:
---------------------------------

Summary: Brief summary of a newbie journey into NiFi
Key: NIFI-4569
URL: https://issues.apache.org/jira/browse/NIFI-4569
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework, Docker
Affects Versions: 1.4.0
Reporter: Eric Chaves
Priority: Minor

Hi folks,

As requested on NiFi's users mailing list, I'm compiling a short summary of my experience as a new user working with NiFi. This report includes feedback gathered while I was learning NiFi by following tutorials and articles found on the internet.

In particular, after searching the internet and reading NiFi's docs, I first tried to write some custom flows using scripted processors and scripted services.

Most of the troubles I faced I attribute to my lack of expertise with Java/Groovy plus my limited knowledge of NiFi's architecture. However, I did face some inconsistencies that could be eased for newcomers like me in future releases.

*Development workflow*
To write my flows I'm using the official docker image (apache/nifi:1.4.0) together with other images like mysql, mongodb and localstack (to simulate AWS services).

It took me a while to properly set up a working folder using docker. When I simply run the container without any volume bound to my local host, NiFi starts properly because all the required files are already present in the conf dir. This setup works fine for trying NiFi out but is not a good starting point for development, because changes may be lost when the container is rebuilt.

To address this I first attempted to bind volumes for the ./conf and ./logs folders, but this didn't work as expected because it produced an empty ./conf folder, which prevents NiFi from starting. Getting this right was a little hard, and I only managed it after looking at some GitHub projects to compare their ./conf dir with NiFi's default configuration inside the docker image.

_- *Suggestion #1*: NiFi should create default configuration files when no file is present in ./conf_
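
Until something like Suggestion #1 exists, the same behavior can be approximated on the host before the container starts. The following is only a minimal Python sketch and not part of NiFi: the function name seed_conf_dir and both directory arguments are hypothetical, and it assumes a copy of the image's default conf files has already been extracted to the host (for example with docker cp).

```python
import shutil
from pathlib import Path


def seed_conf_dir(defaults_dir, conf_dir):
    """Copy default config files into conf_dir, but only if conf_dir is empty.

    Returns the sorted list of file names that were copied
    (an empty list if nothing was done).
    """
    defaults = Path(defaults_dir)
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)

    # Never overwrite an existing configuration.
    if any(conf.iterdir()):
        return []

    copied = []
    for src in sorted(defaults.iterdir()):
        if src.is_file():
            shutil.copy2(src, conf / src.name)
            copied.append(src.name)
    return copied
```

Running this once before binding ./conf as a volume would leave a populated folder for NiFi to start from, while a folder that already holds edited files is left untouched.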

After getting my docker setup right I started creating some flows following some Hortonworks tutorials and other websites (like Matt's blog), and the first thing that confused me was the lack of a simple way to get a new blank canvas. In order to do so I need to stop all components and delete all processors, groups, ports, etc.

_- *Suggestion #2*: Add a "new canvas" button to the operations palette (or somewhere else)._

At this point I started writing some flows that I would like to keep under version control, and there is no simple way to do that. NiFi's canvas is kept in a gzip file, which is not ideal for version control, and the current way of exporting a canvas requires us to save it as a template and then export the template. This process seems OK when we have finished a flow and want to export it for future use as an actual template, but it is not good for small commits while working on the flow (WIP commits). A better approach would be to have the flow file as XML directly under source control.

_- *Suggestion #3*: Enable a configuration mode to keep the current canvas as plain XML instead of a gzipped file._
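
In the meantime, one possible workaround is to decompress the gzipped flow into a plain XML file before each commit. The sketch below is an assumption-laden illustration, not a NiFi feature: export_flow_xml is a hypothetical helper, and the paths are whatever your setup uses (conf/flow.xml.gz is assumed here to be where the canvas is stored).

```python
import gzip
from pathlib import Path


def export_flow_xml(gz_path, xml_path):
    """Decompress a gzipped flow file to plain XML.

    Returns True if xml_path was written, False if it was already up to date.
    """
    with gzip.open(gz_path, "rb") as f:
        data = f.read()

    target = Path(xml_path)
    # Skip the write when nothing changed, so the working tree stays clean.
    if target.exists() and target.read_bytes() == data:
        return False

    target.write_bytes(data)
    return True
```

Calling something like export_flow_xml("conf/flow.xml.gz", "flow.xml") before a WIP commit would give the version control system a text file to diff. It only copies the flow out of NiFi; it does not load an edited XML file back in.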

*Writing Custom Processors*
Not much to be said here. The experience was awesome. The developer's guide is very good and Matt's three-post 'NiFi scripting cookbook' series is priceless. Those cookbooks should be added to the default documentation.

Two things hit me here. First, I assumed that all JARs used by NiFi's default processors were available to a scripted processor (needing only to be imported), but that is not the case, and I needed to re-add some JARs (like javax.mail or the AWS Java SDK) to the script's modules folder. That was not intuitive for me, but maybe that's because I'm not a Java developer. Second, I had trouble with Java loading the proper handlers for MIME types (something related to javax.mail), which blocked me from using Groovy for a custom script that writes multipart MIME messages (I ended up writing it in Python).

- *Suggestion #4*: Make clear which JAR files are available by default inside a scripted processor and which are not, and how to properly configure the system class loader.
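
For readers curious about the Python route mentioned above, a multipart MIME message can be assembled with nothing but Python's standard email package. This is only an illustrative sketch and not the script used in the flow described here: build_multipart_message and its parameters are hypothetical, and a script running inside NiFi would additionally have to read from and write to the flow file, which is omitted.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_multipart_message(subject, body_text, attachment_bytes, attachment_name):
    """Build a multipart/mixed message with a text body and one attachment."""
    msg = MIMEMultipart("mixed")
    msg["Subject"] = subject

    # First part: the human-readable body.
    msg.attach(MIMEText(body_text, "plain", "utf-8"))

    # Second part: binary attachment (defaults to application/octet-stream).
    part = MIMEApplication(attachment_bytes)
    part.add_header("Content-Disposition", "attachment", filename=attachment_name)
    msg.attach(part)
    return msg
```

Calling as_bytes() on the returned message yields the serialized MIME content. Because everything comes from the standard library, no extra modules need to be added for this part.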

*Writing Custom Services*
I'm working on a custom flow that needs to enrich some data records. Since I had successfully written some processor scripts, I tried to script a custom LookupService and, after googling around, I used two sources as reference: 1) a script by Andy Lopresto found in a gist and 2) test_lookup_inline.groovy in NiFi's source code.

I made some mistakes and my scripted lookup was not working, so I decided to log some info in order to troubleshoot my code, but no information was logged. That's when I noticed that my docker container was not producing any logs. I read the administrator guide to see if I was missing something in the bootstrap.conf file, but my file was OK according to the docs.

It was only when I went through the logback.xml file that I noticed a variable "bootstrap.conf.dir" somewhere, and that gave me the hint to try adding a "log.dir=" key to the bootstrap.conf file. At this time I also found that I had the wrong permissions on my local log folder, which was probably the reason why no logs were being written. Since I made both changes at once, I can't say for sure which one fixed the logs.

Only once I got nifi-user.log working properly could I see that an exception was being raised (because there wasn't a log object), and that was the whole error. Once that was fixed, the scripted lookup worked like a charm.
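
To tell those two possible causes apart, each can be checked on its own. The sketch below is a hypothetical diagnostic, not something NiFi ships: read_properties and check_log_dir are made-up helper names, the parser handles only simple key=value lines, and the permission check is only meaningful when run as the same user the NiFi process runs as.

```python
import os


def read_properties(path):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def check_log_dir(log_dir):
    """Return a list of problems with log_dir (an empty list means it looks usable)."""
    problems = []
    if not os.path.isdir(log_dir):
        problems.append("not a directory: " + log_dir)
    elif not os.access(log_dir, os.W_OK | os.X_OK):
        problems.append("not writable by the current user: " + log_dir)
    return problems
```

Looking up a key such as "log.dir" in the result of read_properties shows whether the setting is present at all, while check_log_dir reports permission problems independently of the configuration.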

Matt explained on the users list why controller services have no logging by default, and it makes sense. However, as a user I got a little lost, because when configuring processors I can set the log level very easily, but when it comes to controller services the dialog makes no mention of how or where logging is done.

The same thing happened when I was editing my script code and re-running it. With processors, I point to the script file and, to have it reloaded after changes, I only need to stop and start the processor.

For scripted services I assumed that disabling and re-enabling them would have the same effect, but that was not the case. Once a service was enabled, I could only get it reloaded by stopping my docker container and starting it again. It took me a while to figure out that my new code was not being loaded and that the previous version was still in use (even after disabling/enabling the service).

- *Suggestion #5*: Improve UX consistency between processor configuration and controller service configuration. Allow a service's code to be reloaded with enable/disable and add a link explaining how to properly log service messages.

- *Suggestion #6*: Add a section to the developer guide explaining the default interfaces (like LookupService) and how they should be implemented/extended.

Anyway, I'm really enjoying NiFi, and day after day it's becoming easier to understand its component model.

Great work, guys, and thanks for such great software!!

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)