Posted to users@nifi.apache.org by "Tauzell, Dave" <Da...@surescripts.com> on 2016/02/23 23:01:55 UTC

Script that outputs to a folder

I want to run a flow like this:

Notice a file appear in a directory
Call a script, passing it the path to the file
The script launches a MapReduce job
Take the output of the MapReduce job (files) and move those files to a new HDFS folder
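For illustration, the steps above could be wrapped in a small driver script. This is only a sketch: the jar path, job class, and HDFS destination below are hypothetical placeholders, not details from this thread.

```python
import subprocess
import sys

# Hypothetical locations -- placeholders only, not real paths from this thread.
JOB_JAR = "/opt/jobs/myjob.jar"
JOB_CLASS = "com.example.MyJob"
JOB_OUTPUT_DIR = "/tmp/mr-output"
HDFS_DEST = "/data/processed"

def build_job_cmd(input_path):
    """Command line that launches the MapReduce job on one input file."""
    return ["hadoop", "jar", JOB_JAR, JOB_CLASS, input_path, JOB_OUTPUT_DIR]

def build_move_cmd(output_dir, hdfs_dest):
    """Command line that moves the job's output files into a new HDFS folder."""
    return ["hdfs", "dfs", "-mv", output_dir, hdfs_dest]

def main(input_path):
    # Run the job, then relocate its output; check=True fails fast on errors.
    subprocess.run(build_job_cmd(input_path), check=True)
    subprocess.run(build_move_cmd(JOB_OUTPUT_DIR, HDFS_DEST), check=True)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The command-building is split into small functions so the script can be sanity-checked without a Hadoop cluster on hand.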

I see there is an ExecuteStreamCommand processor which passes a FlowFile's content to stdin and then captures stdout as a new FlowFile.  But in my case, the script reads and writes files based on a path.  I wanted to be able to create these steps and connect them in the UI, but I'm thinking that what I need to do instead is have:
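For comparison, a script that fits the stdin/stdout contract described above would read the FlowFile content from standard input and write its result to standard output. A minimal sketch (the uppercase transform is placeholder logic, just for illustration):

```python
import sys

def transform(data):
    """Placeholder transform: uppercase the content."""
    return data.upper()

def main():
    # NiFi pipes the incoming FlowFile's content to this script's stdin;
    # whatever the script writes to stdout becomes the outgoing FlowFile.
    sys.stdout.write(transform(sys.stdin.read()))

# Only run the filter when stdin is actually a pipe/redirect.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

A script shaped like this needs no filesystem paths at all, which is what lets the processor wire it directly into the flow.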


1. A file watcher on the original directory that then calls the script

2. A file watcher on the script's output directory

Does that make sense?

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com<http://www.surescripts.com/> |   Dave.Tauzell@surescripts.com<ma...@surescripts.com>
Connect with us: Twitter<https://twitter.com/Surescripts> | LinkedIn<https://www.linkedin.com/company/surescripts-llc> | Facebook<https://www.facebook.com/Surescripts> | YouTube<http://www.youtube.com/SurescriptsTV>


This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.

Re: Script that outputs to a folder

Posted by Aldrin Piri <al...@gmail.com>.
Hello Dave,

The approach you describe certainly seems reasonable using the default
components.  The alternative, keeping everything within the flow, would be to
have your script use input and output streams in lieu of files.
Additionally, depending on the scripting language, if you are working with
0.5.0 or later, the introduction of ExecuteScript and
InvokeScriptedProcessor may open up some interesting possibilities as well.

Let us know if you hit any other stumbling points or have some additional
questions on getting these files to where they need to be.




