Posted to dev@nifi.apache.org by József Mészáros <jo...@impresstv.com> on 2015/10/01 18:13:02 UTC

Interactive Queue management

Hey NiFi experts :-)

I have started to work on the backend part of interactive queue management,
which has several related issues: NIFI-99 (Review in flight flow file
details) <https://issues.apache.org/jira/browse/NIFI-99>, NIFI-108
<https://issues.apache.org/jira/browse/NIFI-108>, NIFI-730 (Purge queue from
UI) <https://issues.apache.org/jira/browse/NIFI-730>, and NIFI-139
(Distribution of FlowFiles on a connection)
<https://issues.apache.org/jira/browse/NIFI-139>. There is a feature
proposal description [1] that gives a quick overview.

I hope I have taken a first step toward moving this topic forward in a
"standard" and "good" direction. I tried to think generally and to make the
changes with all the requested improvements in mind, from the backend
perspective. The basic idea was to extend the web-api with a new endpoint
for managing a connection queue (used e.g. by the UI). Based on the
mentioned issues, I created the following new "methods":

   - Get the content of the connection queue

*GET*
http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue

Response: List of connection queue items

   - Clear (purge) the connection queue

*DELETE*
http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue

   - Remove a single item from the connection queue

*DELETE*
http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}

   - Get a single item from the connection queue

*GET*
http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}

Response: Single connection queue item
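
The four methods above differ only in the HTTP verb and in whether a
FlowFile UUID is appended to the queue path. A minimal client-side sketch of
that mapping follows; the function names, the operation labels and the
plain-http scheme are assumptions for illustration, not part of the proposed
backend code.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Path template taken from the proposed endpoints; placeholder names are ours.
QUEUE_PATH = "/nifi-api/controller/process-groups/{group}/connections/{conn}/queue"

# Hypothetical operation labels mapped to (HTTP method, needs a FlowFile UUID).
OPERATIONS = {
    "list": ("GET", False),
    "purge": ("DELETE", False),
    "remove_item": ("DELETE", True),
    "get_item": ("GET", True),
}


def queue_url(host: str, group_id: str, connection_id: str,
              flow_file_uuid: Optional[str] = None) -> str:
    """Build the URL of a connection queue or, with a UUID, of one queued item."""
    path = QUEUE_PATH.format(group=quote(group_id, safe=""),
                             conn=quote(connection_id, safe=""))
    if flow_file_uuid is not None:
        path += "/" + quote(flow_file_uuid, safe="")
    return f"http://{host}{path}"


def build_request(operation: str, host: str, group_id: str, connection_id: str,
                  flow_file_uuid: Optional[str] = None) -> Tuple[str, str]:
    """Return the (method, url) pair for one of the four proposed operations."""
    method, needs_uuid = OPERATIONS[operation]
    if needs_uuid and flow_file_uuid is None:
        raise ValueError(f"{operation} requires a FlowFile UUID")
    # Whole-queue operations ignore any UUID that was passed in.
    uuid_part = flow_file_uuid if needs_uuid else None
    return method, queue_url(host, group_id, connection_id, uuid_part)
```

For example, build_request("purge", "your-host", "root", "c1") yields a
DELETE on the queue URL itself, while "remove_item" yields a DELETE on the
same URL with the UUID appended.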

The connection queue item looks like this (JSON):

"connectionQueueItem": {
        "flowFileId": 16,
        "flowFileUuid": "92c74b41-005e-444d-8f9e-f9cbc01af5f2",
        "fileSize": "42 bytes",
        "fileSizeBytes": 42,
        "fileName": "filename.tsv",
        "entryDate": 1443546148763,
        "lineageStartDate": 1443546148763,
        "contentClaimSection": "1",
        "contentClaimContainer": "default",
        "contentClaimIdentifier": "1443546148763-1",
        "contentClaimOffset": 0
    }

If the flow file has a priority attribute, it is also included as a numeric
value.
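
As a rough illustration of how a client could consume this item, the sketch
below parses the sample into a small Python record. Two things are
assumptions: the JSON key "priority" (the message only says a numeric
priority is included, not what the key is called), and the reading of the
date fields as epoch milliseconds.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConnectionQueueItem:
    flow_file_id: int
    flow_file_uuid: str
    file_name: str
    file_size_bytes: int
    entry_date: datetime
    lineage_start_date: datetime
    priority: Optional[int] = None  # assumed key name, absent when not set


def _from_millis(millis: int) -> datetime:
    """Interpret a value as epoch milliseconds (an assumption about the API)."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def parse_item(payload: str) -> ConnectionQueueItem:
    """Parse a JSON object that wraps a "connectionQueueItem" member."""
    item = json.loads(payload)["connectionQueueItem"]
    return ConnectionQueueItem(
        flow_file_id=item["flowFileId"],
        flow_file_uuid=item["flowFileUuid"],
        file_name=item["fileName"],
        file_size_bytes=item["fileSizeBytes"],
        entry_date=_from_millis(item["entryDate"]),
        lineage_start_date=_from_millis(item["lineageStartDate"]),
        priority=item.get("priority"),
    )


SAMPLE = """
{"connectionQueueItem": {
    "flowFileId": 16,
    "flowFileUuid": "92c74b41-005e-444d-8f9e-f9cbc01af5f2",
    "fileSize": "42 bytes",
    "fileSizeBytes": 42,
    "fileName": "filename.tsv",
    "entryDate": 1443546148763,
    "lineageStartDate": 1443546148763,
    "contentClaimSection": "1",
    "contentClaimContainer": "default",
    "contentClaimIdentifier": "1443546148763-1",
    "contentClaimOffset": 0
}}
"""
```

The content claim fields are left out of the record here only to keep the
sketch short; a real client would need them for the content panels.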

It contains enough information for the "View content" and "Download content"
panels, and from the frontend perspective it could be implemented in a
similar way. If you would like to purge the connection queue, or remove a
single item, you just have to make an HTTP DELETE request. And of course you
are able to compute statistics and review the content of the queue with the
first method. Updating an item (e.g. reordering the queue) could become a new
method, or be covered by a delete + put for a single item with a new priority
value.

You can find my commits in the following repo:
https://github.com/ImpressTv/nifi in branch NIFI-108
<https://github.com/ImpressTV/nifi/tree/NIFI-108>. I do not want to make a
pull request until the backend is in a mergeable state.

It is important to mention that the backend code is not complete, and at
some points it may require some reshaping, but you can see the basic
concept and the direction. Before making any new commits, I wanted to
share the current state with you, start a conversation about the topic,
and get feedback on whether it is a good contribution or not.
So guys, the questions are open :-)

Regards,
Joe

[1]
https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management

Re: Interactive Queue management

Posted by Corey Flowers <cf...@onyxpoint.com>.
Can we also get the ability to clear the bulletins from a processor on
restart? Sometimes it would be easier to troubleshoot a processor if
you could clear the bulletins.

> On Oct 9, 2015, at 10:16 AM, Matt Gilman <ma...@gmail.com> wrote:
>
> Joe,
>
> We've been receiving a lot of feedback lately regarding the Purge/Clear
> Queue capability. Because of this we'd like to introduce that feature into
> the 0.4.0 release with the Viewing and Removing of individual FlowFiles
> into a subsequent release. The Purge/Clear Queue capability is a small
> portion of the full Queue Management feature that we can introduce
> independently and quickly.
>
> I looked through your NiFi Fork specifically at the Purge/Clear
> functionality. There are some additional considerations that I mentioned
> before specifically around FlowFile swapping and submitting the Purge/Clear
> request asynchronously that we need to account for. We have created a
> branch (NIFI-730) for doing this work. We'd love for you to work with us on
> this aspect if you are interested.
>
> Thanks!
>
> Matt
>
>> On Fri, Oct 2, 2015 at 3:16 PM, Matt Gilman <ma...@gmail.com> wrote:
>>
>> Joe,
>>
>> Yes, as Mark mentioned, it is definitely awesome that you're interested in
>> digging in here. Most of the discussions regarding this feature have been
>> really high level at this point, so we're happy to work through some of the
>> details as Mark has begun. A couple of points that come to mind right now:
>>
>> - I don't think we want to support manual prioritization. The
>> connections can be configured with prioritizers and we'd like to use those
>> to manage the ordering of the enqueued FlowFiles. However, the listing of
>> FlowFiles will be rendered by their priority by default, though it will
>> likely support sorting by any of the fields.
>>
>> - The current thought process is that we'll want to require source and
>> destination components to be stopped. This is in line with the existing
>> functionality throughout the application.
>>
>> - The number of enqueued FlowFiles is technically unbounded. Because of
>> this, the endpoint may need to support some sort of pagination, since we
>> may not want/be able to return the entire queue in a single response. There
>> is some concern about Java heap since many of the FlowFiles may be swapped
>> out to disk. Additionally, there are some concerns about HTTP response size
>> and the amount of data we store client side.
>>
>> - What we do with the FlowFiles that are swapped out to disk is still
>> undecided. Not sure whether we want to load them from disk in order to
>> include them in the response or if we just show that X number of FlowFiles
>> are currently swapped out.
>>
>> Some of these items will need to be hashed out but we're happy to work
>> through them with you. We should keep the Feature Proposal up to date as
>> well [1].
>>
>> Thanks!
>>
>> Matt
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>>
>>> On Thu, Oct 1, 2015 at 12:34 PM, Mark Payne <ma...@hotmail.com> wrote:
>>>
>>> Joe,
>>>
>>> First of all, it is awesome that you're interested in jumping on this!
>>> And I think you're off to a great start and have a really good
>>> understanding of exactly where we all want to go with this.
>>>
>>> I'm sure there will be a lot of questions that will come up in working
>>> through a lot of the stuff here. Just from reading through the email here
>>> I have a couple of comments/thoughts that may help to shape the way
>>> forward. This is a bit of a stream of consciousness, so I hope all
>>> makes sense :)
>>>
>>> The connectionQueueItem model that you lay out here, I think, is really
>>> just a FlowFile. I think it will make sense to just use the name flowFile.
>>>
>>> When you bring up the contents of a queue in the UI, I would imagine that
>>> it would be shown as something similar to the Data Provenance table. From
>>> there I'd want to click on the FlowFile in the table to see more details.
>>> So I'm envisioning two separate data models really. The first would be
>>> maybe a FlowFileSummary. It would look very similar to what you've laid
>>> out below, but perhaps contain information about how long the FlowFile
>>> has been queued up, perhaps how many times it has been re-queued on this
>>> particular queue (for example, if a FlowFile keeps failing to process, we
>>> could use this information to remove that particular FlowFile from the
>>> queue, etc.)
>>>
>>> When we get more info for the FlowFile, I would expect it to contain all
>>> FlowFile Attributes. This I think is a different data model because if we
>>> pull back all attributes for every FlowFile when we render the table, the
>>> amount of data brought back could be huge.
>>>
>>> Another consideration here is that when a connection has a lot of
>>> FlowFiles on it, the framework may swap those FlowFiles out to disk in
>>> order to remove them from the Java heap. We will want to ensure that we
>>> include info about how much is in the queue (# of FlowFiles and size of
>>> those FlowFiles), how much is swapped out (# of FlowFiles + size), and how
>>> much is currently being processed by Processors (in the FlowFileQueue this
>>> is referenced as Unacknowledged FlowFiles).
>>>
>>> In the UI table, we should also make sure that by default we are showing
>>> the FlowFiles in the order in which they exist in the queue right now.
>>>
>>> From a RESTful perspective, we may want to also consider that in order to
>>> purge a queue, we are not really deleting the queue itself, but rather its
>>> contents. So perhaps we should use a URI like
>>>
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/contents
>>>
>>> but I'll be the first to admit that REST is not really my forte. So if
>>> that doesn't make sense then ignore that.
>>>
>>> Very excited to see you jumping in here!
>>>
>>> Thanks
>>> -Mark
>>
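
The quoted replies raise three related points: the listing may need
pagination, the table rows could use a lighter FlowFileSummary model than the
full detail view, and the response should report how much is queued, swapped
out and unacknowledged. A minimal sketch of a listing that combines them
follows. All field names, the offset/limit parameters and the decision to
only count swapped-out FlowFiles are assumptions; the thread leaves these
open.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FlowFileSummary:
    """Small per-row record for the listing table (no attributes map)."""
    uuid: str
    file_name: str
    file_size_bytes: int


@dataclass(frozen=True)
class QueueListing:
    """One page of summaries plus counts describing the whole queue."""
    offset: int
    items: List[FlowFileSummary]
    active_count: int          # FlowFiles held in memory and listable
    swapped_count: int         # FlowFiles swapped out to disk, only counted
    unacknowledged_count: int  # FlowFiles currently held by Processors


def list_queue(active: Sequence[FlowFileSummary], swapped_count: int,
               unacknowledged_count: int, offset: int = 0,
               limit: int = 100) -> QueueListing:
    """Return at most `limit` summaries starting at `offset`, in queue order."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    page = list(active[offset:offset + limit])
    return QueueListing(offset, page, len(active), swapped_count,
                        unacknowledged_count)
```

Keeping the page size bounded addresses the HTTP response size concern, and
reporting swapped-out FlowFiles as a count avoids loading them back onto the
heap just to render the table.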

Re: Interactive Queue management

Posted by Rick Braddy <rb...@softnas.com>.
+1

> On Oct 9, 2015, at 7:16 AM, Matt Gilman <ma...@gmail.com> wrote:
> 
> Joe,
> 
> We've been receiving a lot of feedback lately regarding the Purge/Clear
> Queue capability. Because of this we'd like to introduce that feature into
> the 0.4.0 release with the Viewing and Removing of individual FlowFiles
> into a subsequent release. The Purge/Clear Queue capability is a small
> portion of the full Queue Management feature that we can introduce
> independently and quickly.
> 
> I looked through your NiFi Fork specifically at the Purge/Clear
> functionality. There are some additional considerations that I mentioned
> before specifically around FlowFile swapping and submitting the Purge/Clear
> request asynchronously that we need to account for. We have created a
> branch (NIFI-730) for doing this work. We'd love for you to work with us on
> this aspect if you are interested.
> 
> Thanks!
> 
> Matt

Re: Interactive Queue management

Posted by Matt Gilman <ma...@gmail.com>.
Joe,

We've been receiving a lot of feedback lately regarding the Purge/Clear
Queue capability. Because of this we'd like to introduce that feature in
the 0.4.0 release, with the viewing and removing of individual FlowFiles
following in a subsequent release. The Purge/Clear Queue capability is a small
portion of the full Queue Management feature that we can introduce
independently and quickly.

I looked through your NiFi fork, specifically at the Purge/Clear
functionality. There are some additional considerations that I mentioned
before, specifically around FlowFile swapping and submitting the Purge/Clear
request asynchronously, that we need to account for. We have created a
branch (NIFI-730) for doing this work. We'd love for you to work with us on
this aspect if you are interested.

Thanks!

Matt
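
One way to read "submitting the Purge/Clear request asynchronously" is a
submit-then-poll pattern: the request returns an identifier immediately and
the client asks for its status later. The sketch below shows only that
pattern with an in-memory queue. The class and method names are
hypothetical, it does not model swapped-out FlowFiles, and it is not the
design on the NIFI-730 branch.

```python
import uuid
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Deque, Dict, Optional


class PurgeService:
    """Runs purge requests in the background and reports their status."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._requests: Dict[str, Future] = {}

    def submit_purge(self, queue: Deque[str]) -> str:
        """Start purging `queue` and return a request id without waiting."""
        request_id = str(uuid.uuid4())
        self._requests[request_id] = self._executor.submit(self._purge, queue)
        return request_id

    @staticmethod
    def _purge(queue: Deque[str]) -> int:
        """Drop items one at a time and return how many were dropped."""
        dropped = 0
        while True:
            try:
                queue.popleft()
            except IndexError:  # queue is empty, purge is complete
                return dropped
            dropped += 1

    def status(self, request_id: str) -> dict:
        """Report whether the purge finished and, if so, the dropped count."""
        future = self._requests[request_id]
        if not future.done():
            return {"finished": False, "dropped": None}
        return {"finished": True, "dropped": future.result()}

    def wait(self, request_id: str, timeout: Optional[float] = None) -> int:
        """Block until the purge completes and return the dropped count."""
        return self._requests[request_id].result(timeout=timeout)
```

A caller would keep the id returned by submit_purge and poll status until
"finished" is true, which keeps a long purge from tying up the HTTP request.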


Re: Interactive Queue management

Posted by Matt Gilman <ma...@gmail.com>.
Joe,

Yes, as Mark mentioned, it is definitely awesome that you're interested in
digging in here. Most of the discussions regarding this feature have been
really high level at this point, so we're happy to work through some of the
details as Mark has begun. A couple of points that come to mind right now:

- I don't think we want to support manual prioritization. The connections
can be configured with prioritizers and we'd like to use those to manage
the ordering of the enqueued FlowFiles. However, the listing of FlowFiles
will be ordered by priority by default, though it will likely support
sorting by any of the fields.

- The current thought process is that we'll want to require source and
destination components to be stopped. This is in line with the existing
functionality throughout the application.

- The number of enqueued FlowFiles is technically unbounded. Because of
this, the endpoint may need to support some sort of pagination, since we
may not want (or be able) to return the entire queue in a single response.
There is some concern about Java heap since many of the FlowFiles may be
swapped out to disk. Additionally, there are some concerns about HTTP
response size and the amount of data we store client side.

- What we do with the FlowFiles that are swapped out to disk is still
undecided. Not sure whether we want to load them from disk in order to
include them in the response or if we just show that X number of FlowFiles
are currently swapped out.
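
To make the pagination and swap points a bit more concrete, here is a
minimal sketch. It is written in Python only for brevity (NiFi itself is
Java), and every name in it (QueuePage, list_queue_page, offset, limit) is
hypothetical rather than an existing NiFi API. It assumes the simpler of
the two swap options above: swapped-out FlowFiles are reported as a count
and are not loaded from disk.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QueuePage:
    """One page of a queue listing (hypothetical response shape)."""
    offset: int
    limit: int
    total_active: int   # FlowFiles currently held in memory
    swapped_out: int    # FlowFiles swapped to disk, reported as a count only
    items: List[str]    # FlowFile UUIDs on this page


def list_queue_page(active: Sequence[str], swapped_out: int,
                    offset: int = 0, limit: int = 100) -> QueuePage:
    """Return at most `limit` in-memory FlowFile UUIDs starting at `offset`."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    # Slicing past the end of the sequence simply yields a shorter page.
    items = list(active[offset:offset + limit])
    return QueuePage(offset, limit, len(active), swapped_out, items)
```

With 250 FlowFiles in memory, asking for offset=200 and limit=100 returns
the last 50 items, and the swapped-out count rides along unchanged, so the
response size stays bounded no matter how large the queue grows.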

Some of these items will need to be hashed out but we're happy to work
through them with you. We should keep the Feature Proposal up to date as
well [1].

Thanks!

Matt

[1]
https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management

On Thu, Oct 1, 2015 at 12:34 PM, Mark Payne <ma...@hotmail.com> wrote:

> Joe,
>
> First of all, it is awesome that you're interested in jumping on this! And
> I think you're off to a great
> start and have a really good understanding of exactly where we all want to
> go with this.
>
> I'm sure there will be a lot of questions that will come up in working
> through a lot of the
> stuff here. Just from reading through the email here i have a couple of
> comments/thoughts that
> may help to shape the way forward. This is a bit of a stream of
> consciousness, so I hope all
> makes sense :)
>
> The connectionQueueItem model that you lay out here, I think is really
> just a FlowFile.
> I think it will make sense to just use the name flowFile.
>
> When you bring up the contents of a queue in the UI, I would imagine that
> it would be shown
> as something similar to the Data Provenance table. From there I'd want to
> click on the FlowFile
> in the table to see more details. So I'm envisioning two separate data
> models really. The first
> would be maybe a FlowFileSummary. It would look very similar to what
> you've laid out below,
> but perhaps contain information about how long the FlowFile has been
> queued up, perhaps
> how many times it has been re-queued on this particular queue (for
> example, if a FlowFile keeps
> failing to process, we could use this information to remove that
> particular FlowFile from the
> queue, etc.)
>
> When we get more info for the FlowFile, I would expect it to contain all
> FlowFile Attributes. This
> I think is a different data model because if we pull back all attributes
> for every FlowFile when
> we render the table, the amount of data brought back could be huge.
>
> Another consideration here, is that when a connection has a lot of
> FlowFiles on it, the framework
> may swap those FlowFiles out to disk in order to remove them from the Java
> heap. We will
> want to ensure that we include info about how much is in the queue (# of
> FlowFiles and size of those
> FlowFiles), how much is swapped out (# of FlowFiles + size), and how much
> is currently being
> processed by Processors (in the FlowFileQueue this is referenced as
> Unacknowledged FlowFiles).
>
> In the UI table, we should also make sure that by default we are showing
> the FlowFiles in the order
> in which they exist in the queue right now.
>
> From a RESTful perspective, we may want to also consider that in order to
> purge a queue, we are not
> really deleting the queue itself, but rather its contents. So perhaps we
> should use a URI like
>
> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/contents
> but I'll be the first to admit that REST is not really my forte. So if
> that doesn't make sense then ignore that.
>
> Very excited to see you jumping in here!
>
> Thanks
> -Mark
>

Re: Interactive Queue management

Posted by Mark Payne <ma...@hotmail.com>.
Joe,

First of all, it is awesome that you're interested in jumping on this! And I think you're off to a great
start and have a really good understanding of exactly where we all want to go with this.

I'm sure there will be a lot of questions that will come up in working through a lot of the
stuff here. Just from reading through the email here I have a couple of comments/thoughts that
may help to shape the way forward. This is a bit of a stream of consciousness, so I hope it all
makes sense :)

The connectionQueueItem model that you lay out here, I think is really just a FlowFile.
I think it will make sense to just use the name flowFile.

When you bring up the contents of a queue in the UI, I would imagine that it would be shown
as something similar to the Data Provenance table. From there I'd want to click on the FlowFile
in the table to see more details. So I'm envisioning two separate data models really. The first
would be maybe a FlowFileSummary. It would look very similar to what you've laid out below,
but perhaps contain information about how long the FlowFile has been queued up, perhaps
how many times it has been re-queued on this particular queue (for example, if a FlowFile keeps
failing to process, we could use this information to remove that particular FlowFile from the
queue, etc.)

When we get more info for the FlowFile, I would expect it to contain all FlowFile Attributes. This
I think is a different data model because if we pull back all attributes for every FlowFile when
we render the table, the amount of data brought back could be huge.
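
As a rough illustration of the two models, here is a minimal sketch
(Python for brevity; NiFi itself is Java). The class and field names are
assumptions made up for this example, loosely based on the JSON quoted
below, and are not an actual NiFi data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFileSummary:
    """Lightweight row for the queue listing table (hypothetical)."""
    uuid: str
    filename: str
    size_bytes: int
    queued_millis: int      # how long the FlowFile has been queued
    requeue_count: int = 0  # times it was re-queued on this queue


@dataclass
class FlowFileDetail(FlowFileSummary):
    """Full view fetched only when a single row is opened (hypothetical)."""
    attributes: Dict[str, str] = field(default_factory=dict)


def summarize(detail: FlowFileDetail) -> FlowFileSummary:
    """Drop the attribute map so that listing responses stay small."""
    return FlowFileSummary(detail.uuid, detail.filename, detail.size_bytes,
                           detail.queued_millis, detail.requeue_count)
```

The listing would return only summaries, and the attribute map would be
sent only for the one FlowFile the user clicks on.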

Another consideration here is that when a connection has a lot of FlowFiles on it, the framework
may swap those FlowFiles out to disk in order to remove them from the Java heap. We will
want to ensure that we include info about how much is in the queue (# of FlowFiles and size of those
FlowFiles), how much is swapped out (# of FlowFiles + size), and how much is currently being
processed by Processors (in the FlowFileQueue this is referenced as Unacknowledged FlowFiles).
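
One possible shape for that breakdown, sketched in Python for brevity.
The names SizeCount and QueueStatus are hypothetical, and the sketch
assumes the three buckets are disjoint so that they can simply be added
up; that assumption would need checking against how the FlowFileQueue
actually counts things.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeCount:
    """A number of FlowFiles together with their combined size."""
    count: int
    size_bytes: int


@dataclass(frozen=True)
class QueueStatus:
    """Queue breakdown (hypothetical; assumes disjoint buckets)."""
    active: SizeCount          # in memory, waiting to be processed
    swapped: SizeCount         # swapped out to disk
    unacknowledged: SizeCount  # currently being processed by Processors

    def total(self) -> SizeCount:
        parts = (self.active, self.swapped, self.unacknowledged)
        return SizeCount(sum(p.count for p in parts),
                         sum(p.size_bytes for p in parts))
```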

In the UI table, we should also make sure that by default we are showing the FlowFiles in the order
in which they exist in the queue right now.

From a RESTful perspective, we may want to also consider that in order to purge a queue, we are not
really deleting the queue itself, but rather its contents. So perhaps we should use a URI like
http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/contents
but I'll be the first to admit that REST is not really my forte. So if that doesn't make sense then ignore that.
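
For what it's worth, a small sketch of how a client could build that URI
(Python for brevity). The helper name queue_contents_uri is made up for
this example, and the path layout is only the one proposed in this thread,
not a released NiFi endpoint.

```python
from urllib.parse import quote


def queue_contents_uri(base_url: str, process_group_id: str,
                       connection_id: str) -> str:
    """Build the proposed queue contents URI with percent-encoded ids."""
    group = quote(process_group_id, safe="")
    connection = quote(connection_id, safe="")
    return (f"{base_url.rstrip('/')}/controller/process-groups/{group}"
            f"/connections/{connection}/queue/contents")
```

A DELETE against this URI would then read as "empty the queue", while the
queue resource itself stays in place.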

Very excited to see you jumping in here!

Thanks
-Mark



> On Oct 1, 2015, at 12:13 PM, József Mészáros <jo...@impresstv.com> wrote:
> 
> Hey NiFi experts :-)
> 
> I have started to work on the backend part of interactive queue management,
> which has several related issues: NIFI-99 (Review in flight flow file
> details) <https://issues.apache.org/jira/browse/NIFI-99>,NIFI-108
> <https://issues.apache.org/jira/browse/NIFI-108>,NIFI-730 (Purge queue from
> UI) <https://issues.apache.org/jira/browse/NIFI-730>,NIFI-139 (Distribution
> of FlowFiles on a connection)
> <https://issues.apache.org/jira/browse/NIFI-139>. There is a feature
> proposal description [1], which helps you to get a quick overview.
> 
> I hope I made the first step to move forward with this topic in a
> "standard" and "good" direction. I tried to think generally, and making the
> changes considering all the requested improvements from backend
> perspective. The basic idea was to extend the web-api with a new endpoint
> for managing a connection queue (used e.g. by the UI). Based on the
> mentioned issues, I created the following new "methods":
> 
>   - Get the content of the connection queue
> 
> *GET*
> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue
> 
> Response: List of connection queue items
> 
>   - Clear (purge) the connection queue
> 
> *DELETE*
> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue
> 
>   - Remove a single item from the connection queue
> 
> *DELETE*
> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}
> 
>   - Get a single item from the connection queue
> 
> *GET*
> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}
> 
> Response: Single connection queue item
> 
> The connection queue item looks like this (JSON):
> 
> "connectionQueueItem": {
>        "flowFileId": 16,
>        "flowFileUuid": "92c74b41-005e-444d-8f9e-f9cbc01af5f2",
>        "fileSize": "42 bytes",
>        "fileSizeBytes": 42,
>        "fileName": "filename.tsv",
>        "entryDate": 1443546148763,
>        "lineageStartDate": 1443546148763,
>        "contentClaimSection": "1",
>        "contentClaimContainer": "default",
>        "contentClaimIdentifier": "1443546148763-1",
>        "contentClaimOffset": 0
>    }
> 
> If the flow file has a priority attribute, it is also included as a numeric
> value.
> 
> It contains enough information for "View content" and "Download content"
> panels and from the frontend perspective, it could be implemented in a
> similar way. If you would like to purge the connection queue, or a single
> item, you just have to make an HTTP DELETE request. And off course your are
> able to make statistics and review the content of the queue with the first
> method. Updating an item (e.g. reorder the queue) can result a new method,
> or covered by a delete + put for a single item with a new priority value.
> 
> You can find my commits in the following repo :
> https://github.com/ImpressTv/nifi in branch NIFI-108
> <https://github.com/ImpressTV/nifi/tree/NIFI-108>. I do not want to make a
> pull request until the backend is not in a mergeable state.
> 
> It is important to mention, that the backend code is not complete, and at
> some points it maybe requires some reshaping, but you can see the basic
> concept, and the direction. Before making any new commits, I wanted to
> share the current state with you, and start/initiate a conversation about
> the topic, and to get feedback, whether is it a good contribution, or not.
> So guys, the questions are open :-)
> 
> Regards,
> Joe
> 
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management