Posted to users@nifi.apache.org by Charlie Frasure <ch...@gmail.com> on 2015/11/20 03:10:01 UTC

queued files

I have a question on troubleshooting a flow.  I've built a flow with no
exception routing, just trying to process the expected values first.  When
a file exposes a problem with the logic in my flow, it queues up ahead of
the processor that is raising the bulletin.

In the bulletin, I can see an id, but can't tell which file it is.  Data
provenance doesn't seem to help, as the file passed through the previous
processor but hasn't been logged (to my knowledge) on the next one.

Is there a way to match the bulletin back to a file without creating a
route for failed files?

Re: queued files

Posted by Joe Percivall <jo...@yahoo.com>.
Not a problem, I'd be interested in any follow-up details.

I agree that it should be a separate processor, since this is an almost atomic unit of work that can be used in many different workflows. I created a JIRA for this new processor: https://issues.apache.org/jira/browse/NIFI-1217


Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com




On Tuesday, November 24, 2015 1:49 PM, Charlie Frasure <ch...@gmail.com> wrote:



Interesting.  Thanks for the update and the template.  I use OS X as a playground, but this will have to be implemented on RHEL.  I'll see about downloading or building this and testing.  Performance will be critical due to the volume of data; I've run into some Python-based detection libraries that slowed the process way down.


A related project, jchardet [1], looks interesting as a possible start for a custom processor.

[1] http://jchardet.sourceforge.net/




On Tue, Nov 24, 2015 at 11:29 AM, Joe Percivall <jo...@yahoo.com> wrote:

Hello Charlie,
>
>I was looking back through and saw this wasn't totally resolved yet.
>
>
>Couple questions. First, what system are you using? There are a couple of options for the stream command depending on what you're using. Also are you able to get new commands (using yum or brew)?
>
>The key thing I want to solve is to find the encoding of a file based only on its contents, without relying on access to the original file. ExecuteStreamCommand should enable this, because you can pass any FlowFile into ExecuteStreamCommand and it can route the FlowFile contents to STDIN for the command to execute on.
>
>Mac's (what I am using) default command for finding file encodings is "file -bi filename.txt" but it doesn't allow you to pass in a file via STDIN. I found a command called "uchardet"[1] which finds file encodings and allows you to pass the file in via STDIN.
>
>I attached a template that takes in a file using GetFile (deletes the original) and routes that FlowFile to ExecuteStreamCommand. ExecuteStreamCommand then runs "uchardet" on the contents of the FlowFile and outputs the encoding to the "encoding" attribute of the original FlowFile.
>
>[1] https://github.com/BYVoid/uchardet
>
>If this doesn't satisfy your needs just let me know!
>Joe
>
>- - - - - -
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joepercivall@yahoo.com
>
>
>
>
>
>On Friday, November 20, 2015 9:53 AM, Charlie Frasure <ch...@gmail.com> wrote:
>
>
>
>I'm definitely game for that.  Let me know what I can do to help.
>
>
>
>On Fri, Nov 20, 2015 at 9:35 AM, Joe Witt <jo...@gmail.com> wrote:
>
>Charlie
>>
>>Got ya.  I missed the 'encoding vs content type' thing.  I agree let's
>>find a way to avoid the extra copy.  We don't expose the storage
>>location of the underlying bytes.  So on the ListFile thing.  What I
>>was thinking was this (and honestly I've not tested this so maybe I'm
>>skipping something important)
>>
>>ListFile to get a listing of names/etc.. of interest
>>
>>Execute the 'file --mime-encoding ${filename}' to get more attributes
>>available to work with
>>
>>RouteOnAttribute to decide what to do with the file next.  You can
>>Fetch/delete what you don't want and Fetch/pass on what you do
>>
>>I was looking for a way to check the mime-encoding while passing the
>>data to detect into an input stream, because that is actually how
>>execute stream command wants to work.
>>
>>This is a use case that should be pretty easy so if you're willing to
>>chat through it with us we'll figure out a path to make it work well.
>>
>>Thanks
>>Joe
>>
>>On Fri, Nov 20, 2015 at 9:17 AM, Charlie Frasure
>>
>><ch...@gmail.com> wrote:
>>> Thanks Joe,
>>>
>>> The use case is that I'm receiving data without knowing which character set
>>> it is coming in.  --mime-encoding is giving its best guess on character set
>>> rather than the content type.
>>>
>>> The ListFile sounds interesting, but I wonder if I really even need that.  I
>>> don't want to leave the files in place, I just want to run an external
>>> command on them as part of the data flow.  Is there a way I can run an
>>> external command against the physical file such as
>>> /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute somewhere?
>>> It just seems wasteful to make an extra copy of the file, in order to run a
>>> read-only command on it, then delete it.  If ListFile is still the right
>>> way to go, please let me know.
>>>
>>>
>>> On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <jo...@gmail.com> wrote:
>>>>
>>>> For identifying the mime type you may have sufficient results with the
>>>> existing processor 'IdentifyMimeType' which you can put into the flow.
>>>>
>>>> For better logic around identifying files to pull but first calling an
>>>> external command to learn more about them the upcoming
>>>> ListFile/FetchFile combo that comes from this JIRA [1] might give you
>>>> better flexibility.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NIFI-631
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure
>>>> <ch...@gmail.com> wrote:
>>>> > Thanks everyone for the help.  The trouble started a few processors
>>>> > earlier
>>>> > in an ExecuteStreamCommand on ${filename} with the result of "file not
>>>> > found".  I had originally set my GetFile processor to not remove files,
>>>> > but
>>>> > recently changed that.  Now it seems that my ExecuteStreamCommand may
>>>> > not be
>>>> > the best way to accomplish this.
>>>> >
>>>> > The command that gets executed is: file -b --mime-encoding ${filename}
>>>> > in the working directory: ${absolute.path}
>>>> >
>>>> > Now that the file is no longer in the source directory when the
>>>> > processor
>>>> > fires, the command is broken.  I could PutFile somewhere temporarily; is
>>>> > there a better way?
>>>> >
>>>> > On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt <jo...@gmail.com> wrote:
>>>> >>
>>>> >> Charlie,
>>>> >>
>>>> >> The fact that this is confusing is something we agree should be more
>>>> >> clear and we will improve.  We're tackling it based on what is
>>>> >> mentioned here [1].
>>>> >>
>>>> >> [1]
>>>> >>
>>>> >> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>>>> >>
>>>> >> Thanks
>>>> >> Joe
>>>> >>
>>>> >> On Thu, Nov 19, 2015 at 10:30 PM, Corey Flowers
>>>> >> <cf...@onyxpoint.com>
>>>> >> wrote:
>>>> >> > These guys are right. The file to look in for the uuid is the
>>>> >> > nifi-app.log.
>>>> >> > Also if you wanted to see what the processor itself was doing, you
>>>> >> > could
>>>> >> > right click on the processor, get its uuid and while it is running,
>>>> >> > run
>>>> >> > (assuming it is on Linux):
>>>> >> >
>>>> >> > tail -F nifi-app.log | grep uuid
>>>> >> >
>>>> >> > This will just scroll the logs for that specific processor and will
>>>> >> > show
>>>> >> > you
>>>> >> > what it is doing. It should also tell you specific file names and
>>>> >> > uuids
>>>> >> > of
>>>> >> > the failing files.
>>>> >> >
>>>> >> > Hope that helps! Have a great night and good luck!
>>>> >> >
>>>> >> > Sent from my iPhone
>>>> >> >
>>>> >> > On Nov 19, 2015, at 9:27 PM, Juan Sequeiros <he...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>> >> > You can also check the NiFi logs for a searchable id or for what the
>>>> >> > previous processor ID produced to help search provenance.
>>>> >> >
>>>> >> > On Nov 19, 2015 21:22, "Bryan Bende" <bb...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> Charlie,
>>>> >> >>
>>>> >> >> The behavior you described usually means that the processor
>>>> >> >> encountered
>>>> >> >> an
>>>> >> >> unexpected error which was thrown back to the framework which rolls
>>>> >> >> back the
>>>> >> >> processing of that flow file and leaves it in the queue, as opposed
>>>> >> >> to
>>>> >> >> an
>>>> >> >> error it expected where it would usually route to a failure
>>>> >> >> relationship.
>>>> >> >>
>>>> >> >> Is the id that you see in the bulletin a uuid?
>>>> >> >>
>>>> >> >> There should still be some provenance events for this FlowFile from
>>>> >> >> the
>>>> >> >> previous points in the flow. If it looks like the uuid of the
>>>> >> >> FlowFile,
>>>> >> >> that
>>>> >> >> should be searchable from provenance using the search button on the
>>>> >> >> right.
>>>> >> >> Let us know if we can help more.
>>>> >> >>
>>>> >> >> -Bryan
>>>> >> >>
>>>> >> >> On Thu, Nov 19, 2015 at 9:10 PM, Charlie Frasure
>>>> >> >> <ch...@gmail.com> wrote:
>>>> >> >>>
>>>> >> >>> I have a question on troubleshooting a flow.  I've built a flow
>>>> >> >>> with
>>>> >> >>> no
>>>> >> >>> exception routing, just trying to process the expected values
>>>> >> >>> first.
>>>> >> >>> When a
>>>> >> >>> file exposes a problem with the logic in my flow, it queues up
>>>> >> >>> prior
>>>> >> >>> to the
>>>> >> >>> flow that is raising the bulletin.
>>>> >> >>>
>>>> >> >>> In the bulletin, I can see an id, but can't tell which file it is.
>>>> >> >>> Data
>>>> >> >>> provenance doesn't seem to help as it passed the flow on the last
>>>> >> >>> processor,
>>>> >> >>> but hasn't been logged (to my knowledge) on the next one.
>>>> >> >>>
>>>> >> >>> Is there a way to match the bulletin back to a file without
>>>> >> >>> creating a
>>>> >> >>> route for failed files?
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>>
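
The log-search advice quoted above (look up the id from the bulletin in nifi-app.log) can be sketched in Python as a one-shot scan. This is only an illustration under stated assumptions: the function names lines_mentioning and search_log are made up for this example, the log path is whatever your installation uses, and unlike `tail -F nifi-app.log | grep uuid` it reads the file once rather than following it.

```python
from typing import Iterable, Iterator, List


def lines_mentioning(log_lines: Iterable[str], identifier: str) -> Iterator[str]:
    """Yield every line that contains identifier as a plain substring."""
    for line in log_lines:
        if identifier in line:
            # Drop the trailing newline so callers get clean strings.
            yield line.rstrip("\n")


def search_log(path: str, identifier: str) -> List[str]:
    """Scan a log file once and return the lines that mention identifier.

    Undecodable bytes are replaced rather than raising, since log files can
    contain fragments of the data being processed.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(lines_mentioning(handle, identifier))
```

For example, search_log("nifi-app.log", "<id from the bulletin>") would return every line that mentions that id. Whether those lines also carry the file name depends on what the processors in the flow log.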

Re: queued files

Posted by Charlie Frasure <ch...@gmail.com>.
Interesting.  Thanks for the update and the template.  I use OS X as a
playground, but this will have to be implemented on RHEL.  I'll see about
downloading or building this and testing.  Performance will be critical due
to the volume of data; I've run into some Python-based detection libraries
that slowed the process way down.

A related project, jchardet [1], looks interesting as a possible start for a
custom processor.

[1] http://jchardet.sourceforge.net/




Re: queued files

Posted by Joe Percivall <jo...@yahoo.com>.
Hello Charlie,

I was looking back through and saw this wasn't totally resolved yet. 


Couple questions. First, what system are you using? There are a couple of options for the stream command depending on what you're using. Also are you able to get new commands (using yum or brew)?

The key thing I want to solve is to find the encoding of a file based only on its contents, without relying on access to the original file. ExecuteStreamCommand should enable this, because you can pass any FlowFile into ExecuteStreamCommand and it can route the FlowFile contents to STDIN for the command to execute on.

Mac's (what I am using) default command for finding file encodings is "file -bi filename.txt" but it doesn't allow you to pass in a file via STDIN. I found a command called "uchardet"[1] which finds file encodings and allows you to pass the file in via STDIN. 

I attached a template that takes in a file using GetFile (deletes the original) and routes that FlowFile to ExecuteStreamCommand. ExecuteStreamCommand then runs "uchardet" on the contents of the FlowFile and outputs the encoding to the "encoding" attribute of the original FlowFile.
 
[1] https://github.com/BYVoid/uchardet

If this doesn't satisfy your needs just let me know!
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com
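
The STDIN hand-off described in the message above can be tried outside NiFi with a few lines of Python. This is a rough sketch, not how ExecuteStreamCommand is implemented: the function name run_on_stdin is made up for this example, and calling it with ["uchardet"] assumes uchardet is installed and on the PATH.

```python
import subprocess
from typing import Sequence


def run_on_stdin(content: bytes, command: Sequence[str], timeout: float = 30.0) -> str:
    """Feed content to the command's standard input and return its trimmed output.

    The command never sees a file name, only the bytes, which is the same
    idea as passing FlowFile content to an external detector.
    """
    completed = subprocess.run(
        list(command),
        input=content,            # written to the child's STDIN, then closed
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.decode("ascii", errors="replace").strip()


# Hypothetical usage, assuming uchardet is available on this machine:
#   with open("sample.txt", "rb") as handle:
#       encoding = run_on_stdin(handle.read(), ["uchardet"])
```

Because the detector only receives bytes, the original file can already be gone from the source directory by the time this runs, which is the situation described earlier in the thread.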





Re: queued files

Posted by Charlie Frasure <ch...@gmail.com>.
I'm definitely game for that.  Let me know what I can do to help.

On Fri, Nov 20, 2015 at 9:35 AM, Joe Witt <jo...@gmail.com> wrote:

> Charlie
>
> Got ya.  I missed the 'encoding vs content type' thing.  I agree let's
> find a way to avoid the extra copy.  We dont expose the storage
> location of the underlying bytes.  So on the ListFile thing.  What I
> was thinking was this (and honestly I've not tested this so maybe i'm
> skipping something important)
>
> ListFile to get a listing of names/etc.. of interest
>
> Execute the 'file --mime-encoding ${filename}' to get more attributes
> available to work with
>
> RouteOnAttribute to decide what to do with the file next.  You can
> Fetch/delete what you don't want you can Fetch/pass on what you do
>
> I was looking for a way to check the mime-encoding while passing the
> data to detect into an input stream.  because that is actually how
> execute stream command wants to work.
>
> This is a use case that should be pretty easy so if you're willing to
> chat through it with us we'll figure out a path to make it work well.
>
> Thanks
> Joe
>
> On Fri, Nov 20, 2015 at 9:17 AM, Charlie Frasure
> <ch...@gmail.com> wrote:
> > Thanks Joe,
> >
> > The use case is that I'm receiving data without knowing what character
> set
> > it is coming in.  --mime-encoding is giving it's best guess on character
> set
> > rather than the content type.
> >
> > The ListFile sounds interesting, but I wonder if I really even need
> that.  I
> > don't want to leave the files in place, I just want to run an external
> > command on them as part of the data flow.  Is there a way I can run an
> > external command against the physical file such as
> > /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute
> somewhere?
> > It just seems wasteful to make an extra copy of the file, in order to
> run a
> > read-only command on it, then delete it.  If ListFiles is still the right
> > way to go, please let me know.
> >
> >
> > On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <jo...@gmail.com> wrote:
> >>
> >> For identifying the mime type you may have sufficient results with the
> >> existing processor 'IdentifyMimeType' which you can put into the flow.
> >>
> >> For better logic around identifying files to pull but first calling an
> >> external command to learn more about them the upcoming
> >> ListFile/FetchFile combo that comes from this JIRA [1] might give you
> >> better flexibility.
> >>
> >> [1] https://issues.apache.org/jira/browse/NIFI-631
> >>
> >> Thanks
> >> Joe
> >>

Re: queued files

Posted by Joe Witt <jo...@gmail.com>.
Charlie

Got ya.  I missed the 'encoding vs content type' thing.  I agree, let's
find a way to avoid the extra copy.  We don't expose the storage
location of the underlying bytes.  As for the ListFile thing, what I
was thinking was this (and honestly I've not tested this, so maybe I'm
skipping something important):

ListFile to get a listing of the names, etc. of the files of interest

Execute 'file --mime-encoding ${filename}' to get more attributes
to work with

RouteOnAttribute to decide what to do with the file next.  You can
fetch and delete what you don't want, and fetch and pass on what you do.

I was looking for a way to check the mime-encoding by passing the
data to be detected in on an input stream, because that is actually how
ExecuteStreamCommand wants to work.
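
A minimal sketch of that idea, assuming the file(1) utility is installed
and accepts "-" to mean standard input.  The function name detect_encoding
is made up for illustration, and none of this has been tested inside NiFi:

```shell
#!/bin/sh
# Hypothetical helper: guess the character encoding of whatever arrives on stdin.
detect_encoding() {
    # -b               omit the filename prefix from the output
    # --mime-encoding  print only the encoding guess (for example us-ascii or utf-8)
    # -                read the data from standard input instead of a named file
    file -b --mime-encoding -
}

# Usage: pipe the bytes in, the way a stream-oriented processor would.
#   printf 'plain ascii text\n' | detect_encoding
```

With a stream-based command the FlowFile content would take the place of
the printf output on stdin, so in principle no ${filename} lookup or
temporary copy would be needed.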

This is a use case that should be pretty easy so if you're willing to
chat through it with us we'll figure out a path to make it work well.

Thanks
Joe

On Fri, Nov 20, 2015 at 9:17 AM, Charlie Frasure
<ch...@gmail.com> wrote:
> Thanks Joe,
>
> The use case is that I'm receiving data without knowing which character set
> it is in.  --mime-encoding gives its best guess at the character set
> rather than the content type.
>
> The ListFile sounds interesting, but I wonder if I really even need that.  I
> don't want to leave the files in place, I just want to run an external
> command on them as part of the data flow.  Is there a way I can run an
> external command against the physical file such as
> /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute somewhere?
> It just seems wasteful to make an extra copy of the file, in order to run a
> read-only command on it, then delete it.  If ListFile is still the right
> way to go, please let me know.
>
>

Re: queued files

Posted by Charlie Frasure <ch...@gmail.com>.
Thanks Joe,

The use case is that I'm receiving data without knowing which character set
it is in.  --mime-encoding gives its best guess at the character set
rather than the content type.

The ListFile sounds interesting, but I wonder if I really even need that.
I don't want to leave the files in place; I just want to run an external
command on them as part of the data flow.  Is there a way I can run an
external command against the physical file, such as
/opt/nifi/somedir/12345.uuid?
Would that info be in an attribute somewhere?  It just seems wasteful to
make an extra copy of the file in order to run a read-only command on it,
then delete it.  If ListFile is still the right way to go, please let me
know.


On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <jo...@gmail.com> wrote:

> To identify the MIME type, the existing 'IdentifyMimeType' processor
> may give you sufficient results; you can put it into the flow.
>
> If you need to call an external command to learn more about files
> before deciding which ones to pull, the upcoming ListFile/FetchFile
> combo that comes from this JIRA [1] might give you better flexibility.
>
> [1] https://issues.apache.org/jira/browse/NIFI-631
>
> Thanks
> Joe
>

Re: queued files

Posted by Joe Witt <jo...@gmail.com>.
To identify the MIME type, the existing 'IdentifyMimeType' processor
may give you sufficient results; you can put it into the flow.

If you need to call an external command to learn more about files
before deciding which ones to pull, the upcoming ListFile/FetchFile
combo that comes from this JIRA [1] might give you better flexibility.

[1] https://issues.apache.org/jira/browse/NIFI-631

Thanks
Joe

On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure
<ch...@gmail.com> wrote:
> Thanks everyone for the help.  The trouble started a few processors earlier
> in an ExecuteStreamCommand on ${filename} with the result of "file not
> found".  I had originally set my GetFile processor to not remove files, but
> recently changed that.  Now it seems that my ExecuteStreamCommand may not be
> the best way to accomplish this.
>
> The command that gets executed is: file -b --mime-encoding ${filename}
> in the working directory: ${absolute.path}
>
> Now that the file is no longer in the source directory when the processor
> fires, the command is broken.  I could PutFile somewhere temporarily; is
> there a better way?
>

Re: queued files

Posted by Charlie Frasure <ch...@gmail.com>.
Thanks everyone for the help.  The trouble started a few processors earlier
in an ExecuteStreamCommand on ${filename} with the result of "file not
found".  I had originally set my GetFile processor to not remove files, but
recently changed that.  Now it seems that my ExecuteStreamCommand may not
be the best way to accomplish this.

The command that gets executed is: file -b --mime-encoding ${filename}
in the working directory: ${absolute.path}

Now that the file is no longer in the source directory when the processor
fires, the command is broken.  I could PutFile somewhere temporarily; is
there a better way?

On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt <jo...@gmail.com> wrote:

> Charlie,
>
> We agree that this is confusing and should be clearer, and we will
> improve it.  We're tackling it based on what is mentioned here [1].
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>
> Thanks
> Joe
>

Re: queued files

Posted by Joe Witt <jo...@gmail.com>.
Charlie,

We agree that this is confusing and should be clearer, and we will
improve it.  We're tackling it based on what is mentioned here [1].

[1] https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management

Thanks
Joe

On Thu, Nov 19, 2015 at 10:30 PM, Corey Flowers <cf...@onyxpoint.com> wrote:
> These guys are right. The file to look in for the uuid is nifi-app.log.
> Also, if you want to see what the processor itself is doing, you can
> right-click on the processor, get its uuid and, while it is running, run
> (assuming it is on Linux, and replacing "uuid" with the processor's uuid):
>
> tail -F nifi-app.log | grep uuid
>
> This will just scroll the logs for that specific processor and will show you
> what it is doing. It should also tell you specific file names and uuids of
> the failing files.
>
> Hope that helps! Have a great night and good luck!
>

Re: queued files

Posted by Corey Flowers <cf...@onyxpoint.com>.
These guys are right. The file to look in for the uuid is nifi-app.log.
Also, if you want to see what the processor itself is doing, you can
right-click on the processor, get its uuid and, while it is running, run
(assuming it is on Linux, and replacing "uuid" with the processor's uuid):

tail -F nifi-app.log | grep uuid

This will just scroll the logs for that specific processor and will show
you what it is doing. It should also tell you specific file names and uuids
of the failing files.
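
A slightly more defensive variant of the same idea, as a sketch.  The
function name grep_id and the logs/nifi-app.log path are assumptions for
illustration; adjust the path to wherever your install writes its logs:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a log file that mention a given id.
# Usage: grep_id <id> <logfile>
grep_id() {
    # -F matches the id as a fixed string rather than a regular expression;
    # "--" stops option parsing in case the id starts with a dash.
    grep -F -- "$1" "$2"
}

# Live variant (runs until interrupted with Ctrl-C):
#   tail -F logs/nifi-app.log | grep --line-buffered -F -- "$PROCESSOR_ID"
```

The fixed-string match avoids surprises if an id ever contains characters
that grep would otherwise treat as part of a pattern, and --line-buffered
keeps matches appearing promptly when grep is reading from a pipe.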

Hope that helps! Have a great night and good luck!

On Nov 19, 2015, at 9:27 PM, Juan Sequeiros <he...@gmail.com> wrote:

You can also check the NiFi logs for a searchable id or for what the
previous processor ID produced to help search provenance.

Re: queued files

Posted by Juan Sequeiros <he...@gmail.com>.
You can also check the NiFi logs for a searchable id or for what the
previous processor ID produced to help search provenance.
On Nov 19, 2015 21:22, "Bryan Bende" <bb...@gmail.com> wrote:

> Charlie,
>
> The behavior you described usually means that the processor encountered an
> unexpected error that was thrown back to the framework, which rolls back
> the processing of that FlowFile and leaves it in the queue.  By contrast,
> with an error it expected, it would usually route to a failure relationship.
>
> Is the id that you see in the bulletin a uuid?
>
> There should still be some provenance events for this FlowFile from the
> previous points in the flow. If it looks like the uuid of the FlowFile,
> that should be searchable from provenance using the search button on the
> right. Let us know if we can help more.
>
> -Bryan
>

Re: queued files

Posted by Bryan Bende <bb...@gmail.com>.
Charlie,

The behavior you described usually means that the processor encountered an
unexpected error that was thrown back to the framework, which rolls back
the processing of that FlowFile and leaves it in the queue.  By contrast,
with an error it expected, it would usually route to a failure relationship.

Is the id that you see in the bulletin a uuid?

There should still be some provenance events for this FlowFile from the
previous points in the flow. If it looks like the uuid of the FlowFile,
that should be searchable from provenance using the search button on the
right. Let us know if we can help more.
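
If the bulletin text is long, the uuid can be pulled out mechanically
before pasting it into the provenance search.  A rough sketch; the function
name extract_uuids and the sample bulletin line in the usage comment are
invented for illustration, not copied from real NiFi output:

```shell
#!/bin/sh
# Hypothetical helper: print every UUID-shaped token found on stdin, one per line.
extract_uuids() {
    # -o prints only the matching part of each line; -E enables extended regex.
    # Pattern: 8 hex digits, three groups of 4, then 12 (the 8-4-4-4-12 layout).
    grep -oE '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}

# Usage (sample text is made up):
#   echo 'Failed to process FlowFile[uuid=01234567-89ab-cdef-0123-456789abcdef]' | extract_uuids
```

Note that a bulletin may mention more than one uuid (for example the
processor's own id as well as the FlowFile's), so the output still needs a
quick look to pick the right one.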

-Bryan

On Thu, Nov 19, 2015 at 9:10 PM, Charlie Frasure <ch...@gmail.com>
wrote:

> I have a question on troubleshooting a flow.  I've built a flow with no
> exception routing, just trying to process the expected values first.  When
> a file exposes a problem with the logic in my flow, it queues up prior to
> the flow that is raising the bulletin.
>
> In the bulletin, I can see an id, but can't tell which file it is.  Data
> provenance doesn't seem to help as it passed the flow on the last
> processor, but hasn't been logged (to my knowledge) on the next one.
>
> Is there a way to match the bulletin back to a file without creating a
> route for failed files?
>