Posted to dev@nifi.apache.org by Andre <an...@fucs.org> on 2016/08/09 07:12:58 UTC

Re: NIFI-1170 (ie TailDir )

Mark,

I am preparing to start working on NIFI-1170 again and I was wondering, is
a second level of state space something that can be done?

"TailDir" (similar to Flume's tail processor) should be capable of holding
the state of multiple files. However, I presently need to serialise,
store, read and deserialise the state every time a file's state is updated.

Would it be possible to extend the state beyond a single state per
processor?

Cheers

On Sat, Jul 9, 2016 at 12:01 AM, Bryan Bende <bb...@gmail.com> wrote:

> Andre,
>
> Currently each processor can only persist one state map. The reason for
> this is that behind the scenes it is storing the state under a key like
> "components/<processor-uuid>" to ensure that the state is only for that
> processor, and can't be messed with by other processors.
> I suppose we could still have a way for the state manager API to let a key
> be specified and allow for something like
> "components/<processor-uuid>/state1" and
> "components/<processor-uuid>/state2". Mark Payne would probably need to
> comment more on this idea.
>
> As far as serializing/deserializing though, I think it is only
> deserializing in an @OnScheduled method called recoverState... so while the
> processor is running it is continuously serializing the state so that if it
> ever crashes it can pick back up, but it only ever
> reads that state if the processor restarts (either manual stop/start, or
> crash and restart). Hope that helps.
>
> Also, I'm wondering if TailDir can end up handling both cases of tailing a
> single file, and also tailing everything in a directory. I don't know all
> the ins and outs, but it seems like tailing everything in a directory with
> some kind of filename filter might allow for tailing a single file as well,
> but I'm just thinking out loud here.
>
> -Bryan
>
>
> On Fri, Jul 8, 2016 at 7:49 AM, Andre <an...@fucs.org> wrote:
>
> > all,
> >
> > I ended up forking TailFile and bolting together a frankenprototype of
> > the processor here:
> > (apologies for the spaghettiness of the code but the task was clearly
> > beyond my league... :-D )
> >
> > https://github.com/trixpan/nifi/tree/NIFI-1170
> >
> > I am still going through the basics of it but I would like to reach out for
> > feedback.
> >
> > Presently I am having to serialize and deserialize the state-holding
> > object, which doesn't seem to be the most efficient way. So I was
> > wondering:
> >
> > Can a processor store more than one state per context? If so, how?
> >
> > I thank you in advance
> >
>

Re: NIFI-1170 (ie TailDir )

Posted by Mark Payne <ma...@hotmail.com>.
Andre,

The state provider offers an arbitrary set of key/value pairs via the Map interface.
I'd recommend going with something like "file.0.name" => "/data/myfile.txt", "file.0.timestamp" => "147327302730", etc.

Is there something that I'm missing, so that this won't work?

Thanks
-Mark
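
The key scheme Mark describes can be sketched in plain Java. This is only an illustration, not code from the NIFI-1170 branch: the class MultiFileState, its nested FileState type, the flatten/restore helpers and the second path "/data/other.txt" are all hypothetical names made up for the example. In an actual processor the resulting map would presumably be handed to the state manager (for example StateManager.setState(Map<String, String>, Scope)) and read back on restart, but that wiring is left out here so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flattens per-file tailing state into one String map. */
final class MultiFileState {

    /** Hypothetical per-file state: the path being tailed and a timestamp. */
    static final class FileState {
        final String name;
        final long timestamp;

        FileState(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    /** Writes each file's fields under keys of the form "file.<index>.<field>". */
    static Map<String, String> flatten(List<FileState> files) {
        Map<String, String> state = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            FileState f = files.get(i);
            state.put("file." + i + ".name", f.name);
            state.put("file." + i + ".timestamp", Long.toString(f.timestamp));
        }
        return state;
    }

    /** Rebuilds the list by reading indices 0, 1, 2, ... until a name key is missing. */
    static List<FileState> restore(Map<String, String> state) {
        List<FileState> files = new ArrayList<>();
        for (int i = 0; state.containsKey("file." + i + ".name"); i++) {
            String name = state.get("file." + i + ".name");
            String ts = state.get("file." + i + ".timestamp");
            // A missing timestamp falls back to 0 (an assumption of this sketch).
            files.add(new FileState(name, ts == null ? 0L : Long.parseLong(ts)));
        }
        return files;
    }

    public static void main(String[] args) {
        List<FileState> files = new ArrayList<>();
        files.add(new FileState("/data/myfile.txt", 147327302730L));
        files.add(new FileState("/data/other.txt", 147327302731L)); // made-up second file

        Map<String, String> state = flatten(files);
        System.out.println(state);                 // the flat key/value view
        System.out.println(restore(state).size()); // number of files recovered
    }
}
```

With this layout, updating one file's position only means rewriting that file's keys in the map before storing it again, so no separate object serialization step is needed. Whether that is cheap enough for a directory with many files is something the sketch does not answer.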


