Posted to users@nifi.apache.org by Jeff Lord <je...@gmail.com> on 2016/02/19 02:36:06 UTC

splitText output appears to be getting dropped

I have a pretty simple flow where I query for a list of ids using
executeProcess and then pass that list along to splitText, where I am trying
to split on each line to then dynamically build a url further down the line
using updateAttribute and so on.

executeProcess -> splitText -> putFile

For some reason I am only getting one file written with one line.
I would expect something more like 100 files each with one line.
Using the provenance reporter it appears that some of my items are being
dropped.

Time: 02/18/2016 17:13:46.145 PST
Event Duration: No value set
Lineage Duration: 00:00:12.187
Type: DROP
FlowFile Uuid: 7fa42367-490d-4b54-a32f-d062a885474a
File Size: 14 bytes
Component Id: 3b37a828-ba2c-4047-ba7a-578fd0684ce6
Component Name: PutFile
Component Type: PutFile
Details: Auto-Terminated by failure Relationship

Any ideas on what I need to change here?

Thanks in advance,

Jeff

Re: splitText output appears to be getting dropped

Posted by Conrad Crampton <co...@SecData.com>.
Hi Matt,
Of course, if I had actually stopped and thought about this for a moment, this makes perfect sense, i.e. how would NiFi know which flow file's filename should be chosen?
However, even with the first-file-in-the-bin strategy, the filename chosen is still the original one from before the rename, which still seems a bit odd.
No matter, for full control, I’ll rename the file after the merge processor.
Thanks for pointing this out though Matt.
Conrad

From: Matthew Clarke <ma...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 19 February 2016 at 18:29
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: splitText output appears to be getting dropped

Conrad,
     The MergeContent processor bins files according to how you have configured it.  Since it takes multiple files and creates one output file from them, that output file cannot have multiple filenames.  MergeContent will use the filename of the first file in the bin as the filename of the output file.  As for the rest of the attributes from the numerous source files, the 'Attribute Strategy' property in MergeContent determines how they are applied to the new output file.

Matt

On Fri, Feb 19, 2016 at 11:25 AM, Conrad Crampton <co...@secdata.com> wrote:
Hi,
Perfect!
I tried \n for linefeed – didn’t think of shift+enter!

The reason I was updating the filename early on in my flow was just that I already had an UpdateAttribute there, which was a handy place to do so. I can put it just before the PutFile though, so no major issue; I just wondered why this was happening and whether it was by design (feature) or a bug.

Thanks
Conrad

From: Bryan Bende <bb...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 19 February 2016 at 16:16
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: splitText output appears to be getting dropped

Hello,

MergeContent has properties for header, demarcator, and footer, and also has a strategy property which specifies whether these values come from a file or inline text.

If you do inline text and specify a demarcator of a new line (shift + enter in the demarcator value) then binary concatenation will get you all of the lines merged together with new lines between them.

As far as the file naming, can you just wait until after RouteOnContent to rename them? They just need to be renamed before the PutFile, but it doesn't necessarily have to be before RouteOnContent.

Let us know if that helps.

Thanks,

Bryan


On Fri, Feb 19, 2016 at 11:01 AM, Conrad Crampton <co...@secdata.com> wrote:
Hi,
Sorry to piggyback on this thread, but I have pretty much the same issue. I am splitting log files -> RouteOnContent (various paths), and two of these paths (including unmatched) basically just need to get farmed off into a directory in case they are needed later.
These go into a MergeContent processor where I would like to merge them into one file, with each flowfile's content as a line in the file delimited by a line feed (as in the original file). Whichever way I try this, though, it doesn't quite do what I want. If I try Binary Concatenation the file ends up as one long line; if TAR, each flowfile is a separate file in a TAR (not surprisingly). There doesn't seem to be any way of merging flowfile content into one file (ideally with similar functions to compress, specify the number of files, etc.).

Another question, related to the answer below (which really helped me out with the same issue): if I rename the filename early on in my process flow, it appears to be changed back to its original at MergeContent processor time, so I have to put another UpdateAttribute step in after the merge to rename the file.
The flow is

UpdateAttributes -> RouteOnContent -> UpdateAttribute -> MergeContent -> PutFile
       ^                  ^                  ^                ^
       |                  |                  |                |
Filename changed        same               same           reverted

If I put an extra UpdateAttribute before PutFile then fine. Logging at each of the above points shows the filename updated to ${uuid}-${filename}, but at the point marked 'reverted' it is back to the original filename.

Any suggestions on particularly the first question??

Thanks
Conrad



From: Jeff Lord <je...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 19 February 2016 at 03:22
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: splitText output appears to be getting dropped

Matt,

Thanks a bunch!
That did the trick.
Out of curiosity, is there a better way to handle this than writing each single line out to its own file?
Each file contains a single string that will be used to build a url.

-Jeff

On Thu, Feb 18, 2016 at 6:00 PM, Matthew Clarke <ma...@gmail.com> wrote:

Jeff,
      It appears your files are being dropped because you are auto-terminating the failure relationship on your PutFile processor. When the SplitText processor splits the file by lines, every new file has the same filename as the original it came from. My guess is that the first file is being written to disk and all the others are failing because a file of the same name already exists in the target directory. Try adding an UpdateAttribute processor after SplitText to rename all the files. The easiest way is to append each file's uuid to its filename.  I also do not recommend auto-terminating failure relationships except in rare cases.

Matt

On Feb 18, 2016 8:36 PM, "Jeff Lord" <je...@gmail.com> wrote:
I have a pretty simple flow where I query for a list of ids using executeProcess and then pass that list along to splitText, where I am trying to split on each line to then dynamically build a url further down the line using updateAttribute and so on.

executeProcess -> splitText -> putFile

For some reason I am only getting one file written with one line.
I would expect something more like 100 files each with one line.
Using the provenance reporter it appears that some of my items are being dropped.

Time: 02/18/2016 17:13:46.145 PST
Event Duration: No value set
Lineage Duration: 00:00:12.187
Type: DROP
FlowFile Uuid: 7fa42367-490d-4b54-a32f-d062a885474a
File Size: 14 bytes
Component Id: 3b37a828-ba2c-4047-ba7a-578fd0684ce6
Component Name: PutFile
Component Type: PutFile
Details: Auto-Terminated by failure Relationship

Any ideas on what I need to change here?

Thanks in advance,

Jeff






SecureData, combating cyber threats

________________________________

The information contained in this message or any of its attachments may be privileged and confidential and intended for the exclusive use of the intended recipient. If you are not the intended recipient any disclosure, reproduction, distribution or other dissemination or use of this communications is strictly prohibited. The views expressed in this email are those of the individual and not necessarily of SecureData Europe Ltd. Any prices quoted are only valid if followed up by a formal written quote.

SecureData Europe Limited. Registered in England & Wales 04365896. Registered Address: SecureData House, Hermitage Court, Hermitage Lane, Maidstone, Kent, ME16 9NT



Re: splitText output appears to be getting dropped

Posted by Matthew Clarke <ma...@gmail.com>.
Conrad,
     The MergeContent processor bins files according to how you have
configured it.  Since it takes multiple files and creates one output file
from them, that output file cannot have multiple filenames.
MergeContent will use the filename of the first file in the bin as the
filename of the output file.  As for the rest of the attributes from
the numerous source files, the 'Attribute Strategy' property in
MergeContent determines how they are applied to the new output file.

Matt
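
As a rough illustration of what an attribute strategy decides, the Python sketch below merges the attribute maps of a bin in two ways. It is not MergeContent's implementation: the function names, the sample attributes (route, host) and the filenames are hypothetical, and the two behaviours are modelled on my reading of the 'Keep Only Common Attributes' and 'Keep All Unique Attributes' options, which is an assumption worth checking against the processor documentation. The output filename is chosen separately by the processor and is not modelled here.

```python
def keep_only_common(attribute_maps):
    """Keep attributes that every FlowFile carries with an identical value.

    Expects a non-empty list of dicts.
    """
    first, *rest = attribute_maps
    return {
        key: value
        for key, value in first.items()
        if all(other.get(key) == value for other in rest)
    }


def keep_all_unique(attribute_maps):
    """Keep attributes whose values never conflict; a missing attribute is not a conflict."""
    merged = {}
    conflicting = set()
    for attributes in attribute_maps:
        for key, value in attributes.items():
            if key in conflicting:
                continue
            if key in merged and merged[key] != value:
                # Two FlowFiles disagree on this attribute, so drop it for good.
                del merged[key]
                conflicting.add(key)
            else:
                merged[key] = value
    return merged


# Hypothetical bin of three FlowFiles that were renamed before the merge.
bin_attributes = [
    {"filename": "a-app.log", "route": "unmatched", "host": "web01"},
    {"filename": "b-app.log", "route": "unmatched"},
    {"filename": "c-app.log", "route": "unmatched", "host": "web01"},
]

common = keep_only_common(bin_attributes)
unique = keep_all_unique(bin_attributes)
print(common)
print(unique)
```

In both variants the per-file filename values differ, so neither strategy carries a renamed filename through the merge on its own, which fits the advice in this thread to rename after MergeContent.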


Re: splitText output appears to be getting dropped

Posted by Conrad Crampton <co...@SecData.com>.
Hi,
Perfect!
I tried \n for linefeed – didn’t think of shift+enter!

The reason I was updating the filename early on in my flow was just that I already had an UpdateAttribute there, which was a handy place to do so. I can put it just before the PutFile though, so no major issue; I just wondered why this was happening and whether it was by design (feature) or a bug.

Thanks
Conrad


Re: splitText output appears to be getting dropped

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

MergeContent has properties for header, demarcator, and footer, and also
has a strategy property which specifies whether these values come from a
file or inline text.

If you do inline text and specify a demarcator of a new line (shift + enter
in the demarcator value) then binary concatenation will get you all of the
lines merged together with new lines between them.

As far as the file naming, can you just wait until after RouteOnContent to
rename them? They just need to be renamed before the PutFile, but it doesn't
necessarily have to be before RouteOnContent.

Let us know if that helps.

Thanks,

Bryan
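
As a rough illustration of what the demarcator changes, the Python sketch below concatenates a few byte strings the way a binary-concatenation merge would, with and without a newline between them. The function merge_contents and the sample lines are hypothetical; this is a sketch of the idea, not MergeContent's implementation. The last case assumes a property value that is taken literally, so typing a backslash followed by n yields those two characters rather than a line feed.

```python
def merge_contents(contents, header=b"", demarcator=b"", footer=b""):
    """Join byte strings with the demarcator between entries, wrapped in header and footer."""
    return header + demarcator.join(contents) + footer


# Hypothetical FlowFile contents, one log line each, with no trailing newline.
lines = [b"first log line", b"second log line", b"third log line"]

# No demarcator: everything runs together as one long line.
without_demarcator = merge_contents(lines)

# A real newline character as the demarcator: one line per entry.
with_newline = merge_contents(lines, demarcator=b"\n")

# A literal backslash and 'n' (two characters), assuming no escape processing.
with_literal = merge_contents(lines, demarcator=b"\\n")

print(without_demarcator)
print(with_newline.decode("utf-8"))
print(with_literal)
```

The demarcator only goes between entries, so the merged content gets no trailing newline unless a footer supplies one.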



Re: splitText output appears to be getting dropped

Posted by Conrad Crampton <co...@SecData.com>.
Hi,
Sorry to piggyback on this thread, but I have pretty much the same issue. I am splitting log files -> RouteOnContent (various paths), and two of these paths (including unmatched) basically just need to get farmed off into a directory in case they are needed later.
These go into a MergeContent processor where I would like to merge them into one file, with each flowfile's content as a line in the file delimited by a line feed (as in the original file). Whichever way I try this, though, it doesn't quite do what I want. If I try Binary Concatenation the file ends up as one long line; if TAR, each flowfile is a separate file in a TAR (not surprisingly). There doesn't seem to be any way of merging flowfile content into one file (ideally with similar functions to compress, specify the number of files, etc.).

Another question, related to the answer below (which really helped me out with the same issue): if I rename the filename early on in my process flow, it appears to be changed back to its original at MergeContent processor time, so I have to put another UpdateAttribute step in after the merge to rename the file.
The flow is

UpdateAttributes -> RouteOnContent -> UpdateAttribute -> MergeContent -> PutFile
       ^                  ^                  ^                ^
       |                  |                  |                |
Filename changed        same               same           reverted

If I put an extra UpdateAttribute before PutFile then fine. Logging at each of the above points shows the filename updated to ${uuid}-${filename}, but at the point marked 'reverted' it is back to the original filename.

Any suggestions on particularly the first question??

Thanks
Conrad




Re: splitText output appears to be getting dropped

Posted by Jeff Lord <je...@gmail.com>.
Matt,

Thanks a bunch!
That did the trick.
Out of curiosity, is there a better way to handle this than writing each
single line out to its own file?
Each file contains a single string that will be used to build a url.

-Jeff


Re: splitText output appears to be getting dropped

Posted by Matthew Clarke <ma...@gmail.com>.
Jeff,
      It appears your files are being dropped because you are
auto-terminating the failure relationship on your PutFile processor. When
the SplitText processor splits the file by lines, every new file has the
same filename as the original it came from. My guess is that the first file
is being written to disk and all the others are failing because a file of
the same name already exists in the target directory. Try adding an
UpdateAttribute processor after SplitText to rename all the files. The
easiest way is to append each file's uuid to its filename.  I also do not
recommend auto-terminating failure relationships except in rare cases.

Matt
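
To illustrate the collision described above, here is a minimal Python sketch. It is not NiFi code: put_file is a hypothetical stand-in for a PutFile that fails on name conflicts, and the filename ids.txt and the 100 generated ids are made up for the example. In the actual flow, the equivalent fix would be an UpdateAttribute property named filename with an Expression Language value such as ${filename}-${uuid}.

```python
import tempfile
import uuid
from pathlib import Path
from typing import List


def put_file(directory: Path, filename: str, content: bytes) -> bool:
    """Write content to directory/filename without overwriting.

    Returns True on success and False on a name collision. Hypothetical
    stand-in for a 'fail on conflict' write, not NiFi's PutFile.
    """
    target = directory / filename
    try:
        # Mode "xb" creates the file and raises FileExistsError if it exists.
        with open(target, "xb") as handle:
            handle.write(content)
        return True
    except FileExistsError:
        return False


def write_splits(directory: Path, original_name: str, lines: List[str], rename: bool) -> int:
    """Write one file per line and return how many writes succeeded."""
    written = 0
    for line in lines:
        # With rename=True, append a fresh UUID so every split gets a unique name.
        name = f"{original_name}-{uuid.uuid4()}" if rename else original_name
        if put_file(directory, name, line.encode("utf-8")):
            written += 1
    return written


ids = [f"id-{n}" for n in range(100)]  # hypothetical list of 100 ids

with tempfile.TemporaryDirectory() as same_dir, tempfile.TemporaryDirectory() as unique_dir:
    same_name_count = write_splits(Path(same_dir), "ids.txt", ids, rename=False)
    unique_name_count = write_splits(Path(unique_dir), "ids.txt", ids, rename=True)

print(same_name_count, unique_name_count)
```

With identical names only the first write succeeds, which matches the single output file reported in the original post; with the UUID suffix all 100 writes succeed.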