Posted to dev@nifi.apache.org by mbwagne <mi...@wavestrike.com> on 2015/04/20 22:45:55 UTC

ExecuteStreamCommand with output directory

I have an ExecuteStreamCommand Processor that runs an application that
generates an output directory with several files and subdirectories (no
option to specify output stream). How would I pick up the output from that
command from another processor, ensuring the command has completely finished
(i.e. GetFile would start right away)?



--
View this message in context: http://apache-nifi-incubating-developer-list.39713.n7.nabble.com/ExecuteStreamCommand-with-output-directory-tp1177.html
Sent from the Apache NiFi (incubating) Developer List mailing list archive at Nabble.com.

Re: ExecuteStreamCommand with output directory

Posted by Brandon DeVries <br...@jhu.edu>.

Mike,

    Yes, it's a bit of a compromise.  But honestly, when you use
ExecuteStreamCommand you're stepping a bit outside of NiFi, and it's
probably going to be less than ideal.  As Joe said, writing your own
processor would allow you to handle things a bit more directly and
efficiently.  However, ExecuteStreamCommand allows you to put together a
proof of concept, and from there you can decide if the gains from writing
your own processor are worth it for your case.  Let us know if there's
anything else we can do to help.

Brandon

On Thu, Apr 23, 2015 at 12:59 PM mbwagne <mi...@wavestrike.com> wrote:

> Brandon's approach did work. Thanks Brandon!
>
> My concern is I'm tarring up a directory in a shell script just to unpack
> in a NiFi processor. That seems like a lot of unnecessary IO. I wish I could
> trigger GetFile on the completion of the ExecuteStreamCommand via the
> success route.

Re: ExecuteStreamCommand with output directory

Posted by Mark Payne <ma...@hotmail.com>.

Mike,

Any chance you can control where the data is written to by the script?

If you write it, for instance, to ".myDir", your "wrapper" script could
just rename it from ".myDir" to "myDir".

GetFile by default does not pick up any file that begins with a .
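
The rename idea above could be sketched as a small Python wrapper. This is only an illustration under assumptions: `run_then_publish` and the `command_for` callback are hypothetical names, and it presumes the application can be told where to write. On POSIX systems a rename within one filesystem is atomic, so the directory appears under its final name only once the application has exited.

```python
import os
import subprocess


def run_then_publish(command_for, final_dir):
    """Have the application write to a dot-prefixed directory, then rename it.

    command_for is a hypothetical callback that returns the command line
    (a list of strings) for a given output directory.
    """
    parent, name = os.path.split(os.path.abspath(final_dir))
    hidden_dir = os.path.join(parent, "." + name)

    # check=True raises CalledProcessError on a non-zero exit status,
    # so a failed run is never renamed into view.
    subprocess.run(command_for(hidden_dir), check=True)

    # Rename only after the application has finished writing.
    os.rename(hidden_dir, final_dir)
    return final_dir
```

If the application fails, the dot-prefixed directory is left behind for inspection or cleanup.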

------ Original Message ------
From: "mbwagne" <mi...@wavestrike.com>
To: dev@nifi.incubator.apache.org
Sent: 4/23/2015 12:43:12 PM
Subject: Re: ExecuteStreamCommand with output directory

>Brandon's approach did work. Thanks Brandon!
>
>My concern is I'm tarring up a directory in a shell script just to unpack in
>a NiFi processor. That seems like a lot of unnecessary IO. I wish I could
>trigger GetFile on the completion of the ExecuteStreamCommand via the
>success route.

Re: ExecuteStreamCommand with output directory

Posted by mbwagne <mi...@wavestrike.com>.

Brandon's approach did work. Thanks Brandon!

My concern is I'm tarring up a directory in a shell script just to unpack in
a NiFi processor. That seems like a lot of unnecessary IO. I wish I could
trigger GetFile on the completion of the ExecuteStreamCommand via the
success route.




Re: ExecuteStreamCommand with output directory

Posted by Joe Witt <jo...@gmail.com>.

Mike,

There is presently no support for such a mechanism (wait for '.done').
That is a really specific model.  I think for such a case what Brandon
mentions is the best/safe way.

Or it would be pretty easy to build a custom processor for that too.

Thanks
Joe

On Mon, Apr 20, 2015 at 5:05 PM, mbwagne <mi...@wavestrike.com> wrote:
> Joe,
>
> The Minimum File Age should solve most cases. It's not currently possible to
> wait for a "done" file to know when a directory is complete, is it? Like
> the following example where GetFile has an Input Directory of "output" and
> "1234" and "3456" are complete, but "2345" is not.
>
>
> output/1234/
>     resources/
>         test1.txt
>         te2t2.txt
>     .done
> output/2345/
>     resources/
>         test3.txt
> output/3456/
>     resources/
>         test3.txt
>         test4.txt
>     .done
>
>
> Thanks,
> Mike

Re: ExecuteStreamCommand with output directory

Posted by mbwagne <mi...@wavestrike.com>.

Joe,

The Minimum File Age should solve most cases. It's not currently possible to
wait for a "done" file to know when a directory is complete, is it? Like
the following example where GetFile has an Input Directory of "output" and
"1234" and "3456" are complete, but "2345" is not.


output/1234/
    resources/
        test1.txt
        te2t2.txt
    .done
output/2345/
    resources/
        test3.txt
output/3456/
    resources/
        test3.txt
        test4.txt
    .done
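
To make the layout above concrete, here is a minimal Python sketch of the check being asked for. It is not an existing NiFi feature; `completed_runs` and its `marker` parameter are hypothetical names, and the function only reports which immediate subdirectories contain the marker file.

```python
from pathlib import Path


def completed_runs(output_root, marker=".done"):
    """Return the names of subdirectories that contain the marker file."""
    root = Path(output_root)
    return sorted(
        child.name
        for child in root.iterdir()
        # A run counts as complete only once its marker file exists.
        if child.is_dir() and (child / marker).is_file()
    )
```

For the tree shown, this would report "1234" and "3456" and skip "2345".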


Thanks,
Mike




Re: ExecuteStreamCommand with output directory

Posted by mbwagne <mi...@wavestrike.com>.

Good idea Brandon! I'll look at that approach. I was just hoping to NIFI it
all, but maybe that's not the correct way to think about it.




Re: ExecuteStreamCommand with output directory

Posted by Brandon DeVries <br...@jhu.edu>.

Mike,

You could also create a wrapper script for your application and call that
from ExecuteStreamCommand. The wrapper would call your application, tar up
the output, and stream that out. Follow that with UnpackContent, and
continue from there.
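
A rough Python sketch of such a wrapper follows. The names `run_and_stream` and `sink` are hypothetical, and it assumes the application's own stdout is not needed downstream, so that output is forwarded to stderr to keep the tar stream clean.

```python
import subprocess
import sys
import tarfile
from pathlib import Path


def run_and_stream(command, output_dir, sink):
    """Run command, then write output_dir to sink as an uncompressed tar."""
    # check=True raises on a non-zero exit status, so a partial
    # directory is never archived.
    result = subprocess.run(command, check=True, stdout=subprocess.PIPE)

    # Forward the application's own output to stderr so it cannot
    # end up inside the tar stream.
    sys.stderr.write(result.stdout.decode(errors="replace"))

    # Mode "w|" writes the archive sequentially to a non-seekable sink.
    with tarfile.open(fileobj=sink, mode="w|") as archive:
        archive.add(output_dir, arcname=Path(output_dir).name)
```

When run as a script, the sink would be `sys.stdout.buffer`, so that ExecuteStreamCommand captures the tar bytes as the outgoing content for UnpackContent.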

Brandon

On Mon, Apr 20, 2015, 5:07 PM Joe Witt <jo...@gmail.com> wrote:

> Mike,
>
> Consider pointing GetFile at a parent directory of the
> ExecuteStreamCommand output and telling it to recurse for files.
>
> Keep in mind, though, that no matter what, if the process doesn't write files
> with some sort of flag you have a race condition.  GetFile lets you
> do things to reduce the risk of the inherent race condition though.
> For instance, you can tell it to only pick up data that is a certain
> age as indicated by its last modified date.
>
> Does this sound like it would take care of it?
>
> Thanks
> Joe
>
> On Mon, Apr 20, 2015 at 4:45 PM, mbwagne <mi...@wavestrike.com> wrote:
> > I have an ExecuteStreamCommand Processor that runs an application that
> > generates an output directory with several files and subdirectories (no
> > option to specify output stream). How would I pick up the output from that
> > command from another processor, ensuring the command has completely finished
> > (i.e. GetFile would start right away)?

Re: ExecuteStreamCommand with output directory

Posted by Joe Witt <jo...@gmail.com>.

Mike,

Consider pointing GetFile at a parent directory of the
ExecuteStreamCommand output and telling it to recurse for files.

Keep in mind, though, that no matter what, if the process doesn't write files
with some sort of flag you have a race condition.  GetFile lets you
do things to reduce the risk of the inherent race condition though.
For instance, you can tell it to only pick up data that is a certain
age as indicated by its last modified date.
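
As an illustration of the age check described above, here is a small Python sketch. It is not GetFile's implementation; `files_older_than` and its parameters are hypothetical names. It walks a directory recursively and keeps only files whose last-modified time is at least a given number of seconds in the past.

```python
import time
from pathlib import Path


def files_older_than(root, min_age_seconds, now=None):
    """Return files under root last modified at least min_age_seconds ago."""
    # Allow the caller to pin "now" so the result is reproducible.
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and now - path.stat().st_mtime >= min_age_seconds
    )
```

This only reduces the risk: a writer that pauses for longer than the chosen age can still have a partial file picked up.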

Does this sound like it would take care of it?

Thanks
Joe

On Mon, Apr 20, 2015 at 4:45 PM, mbwagne <mi...@wavestrike.com> wrote:
> I have an ExecuteStreamCommand Processor that runs an application that
> generates an output directory with several files and subdirectories (no
> option to specify output stream). How would I pick up the output from that
> command from another processor, ensuring the command has completely finished
> (i.e. GetFile would start right away)?