Posted to users@nifi.apache.org by Lars Francke <la...@gmail.com> on 2016/02/17 17:06:38 UTC

Maximum attribute size

Hi and sorry for all these questions.

I know that FlowFile content is persisted to the content_repository and can
handle reasonably large amounts of data. Is the same true for attributes?

I download JSON files (up to 200 KB, I'd say) and I want to insert them
as they are into a PostgreSQL JSONB column. I'd love to use the PutSQL
processor for that, but it requires parameters to be passed in attributes.

I have a feeling that putting large objects in attributes is a bad idea?
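For context, PutSQL binds the sql.args.N.value attributes as PreparedStatement parameters, so what I'm effectively asking it to run is the plain JDBC insert below (a minimal sketch; the table, column, and connection details are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class JsonbInsertSketch {
        public static void main(String[] args) throws Exception {
            // Stand-in for one of the downloaded JSON documents (up to ~200 KB)
            String json = "{\"id\": 1, \"name\": \"example\"}";
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                     // ?::jsonb casts the bound string to JSONB on the PostgreSQL side
                     "INSERT INTO documents (payload) VALUES (?::jsonb)")) {
                ps.setString(1, json);
                ps.executeUpdate();
            }
        }
    }

So the JSON document itself would have to travel as the value of a sql.args.1.value attribute, which is what worries me.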

Re: Maximum attribute size

Posted by Joe Percivall <jo...@yahoo.com>.
Hello Lars,
You are correct that the WAL is different from swapping.

Swapping is used when a single connection queue grows very large: a chunk of the FlowFiles is then swapped out of JVM memory and written to disk, where it stays until it is swapped back in for processing. The WAL, by contrast, is almost solely for persisting information when a NiFi instance is stopped for some reason (i.e., restarts or hardware failures).
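For reference, the point at which swapping kicks in is configurable in nifi.properties; a minimal excerpt (the value shown is the stock default, so treat it as release-dependent):

    # Number of FlowFiles a single connection queue may hold in JVM memory
    # before the excess is swapped out to disk
    nifi.queue.swap.threshold=20000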
I am currently working on finishing up a document that will explain these and many other concepts used by the underlying system, so look out for that in the relatively near future.

Joe
------
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com

Re: Maximum attribute size

Posted by Lars Francke <la...@gmail.com>.
Thanks a lot for confirming my suspicions.

One last clarification: the WAL is different from the swapping concept,
correct? I guess it's way faster to swap in a dedicated "dump" than to
replay a WAL.


Re: Maximum attribute size

Posted by Joe Witt <jo...@gmail.com>.
Lars,

You are right about the thought process. We've never provided solid
guidance here, but we should. It is definitely the case that FlowFile
content is streamed to and from the underlying repository, and the only
way to access it is through that API. Thus well-behaved extensions,
and the framework itself, can handle data essentially as large as the
underlying repository has space for. The FlowFile attributes, though,
are held in memory in a map with each FlowFile object, so it is
important to avoid vast quantities of attributes, or attributes with
really large values (neither limit is formally defined).

There are things we can and should do to make even this relatively
transparent to users, and it is actually why we support swapping
FlowFiles to disk when queues grow large: even those in-memory
attributes can really add up.
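To make the two sides concrete, here is a rough sketch from the
processor API's point of view (the processor itself is hypothetical):
content is only reachable by streaming it through the session, while
attributes are a plain in-memory String map:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Map;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;
    import org.apache.nifi.processor.io.InputStreamCallback;

    public class ContentVsAttributesSketch extends AbstractProcessor {
        @Override
        public void onTrigger(ProcessContext context, ProcessSession session)
                throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }

            // Attributes: a String-to-String map held on the JVM heap alongside
            // every FlowFile object, so large values add up fast.
            Map<String, String> attributes = flowFile.getAttributes();
            getLogger().debug("FlowFile carries " + attributes.size() + " attributes");

            // Content: never handed over as a byte[]; it is streamed through the
            // session, so its size is bounded by content repository disk space,
            // not by the heap.
            session.read(flowFile, new InputStreamCallback() {
                @Override
                public void process(InputStream in) throws IOException {
                    byte[] buffer = new byte[8192];
                    while (in.read(buffer) != -1) {
                        // handle one chunk at a time; the full payload is
                        // never materialized in memory
                    }
                }
            });

            // Sketch only: this returns the FlowFile to its incoming queue;
            // a real processor would route it to a named relationship.
            session.transfer(flowFile);
        }
    }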

Thanks
Joe
