Posted to users@nifi.apache.org by Charlie Frasure <ch...@gmail.com> on 2015/11/23 16:54:39 UTC

archive files

Use case: Archive and compress files by category and month, store like
files in a common directory.

I'm already processing the files, and have extracted the interesting
attributes from each.  I ran them through MergeContent, but have not been
able to produce a logical directory structure to store the results.  I
would prefer something like archive/categoryA/201511/somefilename.tar.gz
where somefilename is made up of all the categoryA files received in
November 2015.

I switched gears and used PutFile to store the files in the preferred
directory structure, but I am at a loss as to how to archive them within
their folders, given hundreds of dynamic categories and a new date folder
every month.

I'm playing with MergeContent's Correlation Attribute Name, but am also
considering trying the "Defragment" merge strategy by correlating the
files earlier in the process.

Any suggestions would be appreciated.

Re: archive files

Posted by Charlie Frasure <ch...@gmail.com>.
Thanks Mark, that would be especially useful during development of a new
flow, I believe.  I decreased the timeouts and increased the max number of
bins to get some of the files merging that were being binned individually.

On Mon, Nov 30, 2015 at 3:22 PM, Mark Payne <ma...@hotmail.com> wrote:

> Charlie,
>
> As you mentioned, there have been several others asking about how Merge
> Content is making the determination
> that a bin is full. I created a ticket [1] to add this information to the
> Provenance Event generated by Merge Content.
> This way, it should be much more obvious exactly why each bin is being
> merged.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-1232
>
>
>
>

Re: archive files

Posted by Mark Payne <ma...@hotmail.com>.
Charlie,

As you mentioned, several others have asked how MergeContent determines
that a bin is full. I created a ticket [1] to add this information to the
Provenance Event generated by MergeContent. This way, it should be much
more obvious exactly why each bin is being merged.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1232




> On Nov 30, 2015, at 3:11 PM, Mark Payne <ma...@hotmail.com> wrote:
> 
> Charlie,
> 
> One thing that you should note, specifically when using the Correlation Attribute is the <Maximum number of Bins> property. If the value that
> you are using for the Correlation Attribute varies quite a bit, you could quickly fill up the default number of bins (100). In this case, it won't be
> able to add a FlowFile to any of the bins until the timeout occurs and as a result it will immediately evict the oldest bin. 
> 
> Thanks
> -Mark
> 
> 
> 


Re: archive files

Posted by Mark Payne <ma...@hotmail.com>.
Charlie,

One thing that you should note, specifically when using the Correlation
Attribute, is the <Maximum number of Bins> property. If the value that you
are using for the Correlation Attribute varies quite a bit, you could
quickly fill up the default number of bins (100). In that case, a FlowFile
with a new value cannot be added to any of the existing bins, and instead of
waiting for the timeout to occur, MergeContent will immediately evict the
oldest bin to make room.
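
To illustrate the bin limit described above, here is a minimal Python sketch.
It is not MergeContent's actual implementation, only a model of the behavior
under stated assumptions: each FlowFile is a hypothetical (correlation value,
name) pair, bins are keyed by correlation value, and the "oldest" bin is the
one that was created first.

```python
from collections import OrderedDict


def bin_flowfiles(flowfiles, max_bins):
    """Group (correlation_value, name) pairs into at most max_bins bins.

    When a new correlation value arrives and every bin is already in use,
    the oldest bin is evicted (merged early) to make room.
    """
    bins = OrderedDict()  # insertion order doubles as bin creation order
    evicted = []          # (correlation_value, names) in eviction order
    for key, name in flowfiles:
        if key not in bins and len(bins) >= max_bins:
            # No free bin for this new value: push out the oldest bin.
            evicted.append(bins.popitem(last=False))
        bins.setdefault(key, []).append(name)
    return bins, evicted
```

With more distinct correlation values than bins, early bins are released with
only a few entries each, which matches the symptom of files being merged
before they are expected to be.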

Thanks
-Mark



> On Nov 30, 2015, at 3:05 PM, Charlie Frasure <ch...@gmail.com> wrote:
> 
> Joe,
> 
> Thanks for checking in.  I tried it again and noticed that the correlation attribute in MergeContent doesn't accept expressions.  I was attempting to combine multiple attributes to define a bin, so I moved that expression to an earlier UpdateAttribute process which seemed to resolve my issue.
> 
> Now I'm dealing with bins being released before I think they should, but it seems that there's been other people with the same problem that must've been resolved, so I'll poke on that a bit more before posting.
> 
> Thanks,
> Charlie
>  
> 
> 
> 


Re: archive files

Posted by Charlie Frasure <ch...@gmail.com>.
Joe,

Thanks for checking in.  I tried it again and noticed that the Correlation
Attribute Name property in MergeContent doesn't accept expressions.  I was
attempting to combine multiple attributes to define a bin, so I moved that
expression to an earlier UpdateAttribute processor, which seems to have
resolved my issue.
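
As a rough illustration of combining several attributes into one value before
the merge step, the Python sketch below builds a single correlation key from
a category and a month. The attribute names "category" and "received.millis"
are hypothetical (the thread does not name the real ones), and in NiFi the
same idea would be written as an Expression Language expression in
UpdateAttribute rather than in Python.

```python
from datetime import datetime, timezone


def correlation_key(attributes):
    """Return one 'category/yyyymm' key from two hypothetical attributes.

    attributes["category"]        -- category name, e.g. "categoryA"
    attributes["received.millis"] -- receipt time as epoch milliseconds
    """
    received = datetime.fromtimestamp(
        int(attributes["received.millis"]) / 1000, tz=timezone.utc
    )
    # All files of one category received in the same month share a key,
    # so they fall into the same bin.
    return f"{attributes['category']}/{received:%Y%m}"
```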

Now I'm dealing with bins being released before I think they should be, but
it seems that other people have had the same problem and resolved it, so
I'll poke at that a bit more before posting.

Thanks,
Charlie




On Mon, Nov 30, 2015 at 1:21 PM, Joe Percivall <jo...@yahoo.com>
wrote:

> Hello Charlie,
>
> Sorry no one has gotten back to you yet, everyone is busy getting 0.4.0
> finished up and of course Thanksgiving. Have you made any more progress?
>
>
> Since it is a continuous task it is well within NiFi's wheelhouse. In your
> original message you mentioned that you already had them merged in to
> single flowfile but just had trouble creating the path to do a PutFile.
> Have you tried using expression language [1] to create the path? Assuming
> you have attributes for the category and date you should be able to create
> an expression language expression which properly evaluates to what you need.
>
> If you need help with creating the proper expression, just reply with the
> attribute names for the category and dates and I'd be happy to help.
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
>

Re: archive files

Posted by Joe Percivall <jo...@yahoo.com>.
Hello Charlie,

Sorry no one has gotten back to you yet; everyone is busy getting 0.4.0 finished up, and of course there was Thanksgiving. Have you made any more progress?


Since it is a continuous task, it is well within NiFi's wheelhouse. In your original message you mentioned that you already had the files merged into a single FlowFile but had trouble creating the path for PutFile. Have you tried using Expression Language [1] to create the path? Assuming you have attributes for the category and date, you should be able to write an expression that evaluates to the path you need.

If you need help with creating the proper expression, just reply with the attribute names for the category and dates and I'd be happy to help.

[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
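
For readers following along, the path layout requested in the original
message can be sketched in Python as below. This only shows how the pieces
compose. The parameter names are hypothetical, and in NiFi the equivalent
would be an Expression Language expression that references the FlowFile's
own attributes.

```python
from pathlib import PurePosixPath


def archive_path(category, year_month, base_name, root="archive"):
    """Build root/category/yyyymm/base_name.tar.gz as a POSIX-style path."""
    return PurePosixPath(root) / category / year_month / f"{base_name}.tar.gz"
```

Calling archive_path("categoryA", "201511", "somefilename") yields
archive/categoryA/201511/somefilename.tar.gz, the layout asked for at the
start of the thread.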

Joe 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com




On Monday, November 23, 2015 11:37 AM, Charlie Frasure <ch...@gmail.com> wrote:



Joe,

This is a continuous task.  The main intent is to keep a version of the file prior to conversions etc.  Ideally, it would be highly compressed, and easy to locate.  Best case scenario, the archive files are the contents of highly structured nested directories.  File sizes range from a few bytes to < 1GB.  It wouldn't have to run real time (updating archives seems to be a fairly intensive task), but would probably run at least every few days.

Thanks,
Charlie
 





Re: archive files

Posted by Charlie Frasure <ch...@gmail.com>.
Joe,

This is a continuous task.  The main intent is to keep a version of the
file prior to conversions etc.  Ideally, it would be highly compressed, and
easy to locate.  Best case scenario, the archive files are the contents of
highly structured nested directories.  File sizes range from a few bytes to
< 1GB.  It wouldn't have to run real time (updating archives seems to be a
fairly intensive task), but would probably run at least every few days.

Thanks,
Charlie




On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <jo...@gmail.com> wrote:

> Charlie,
>
> Can give some pointers on how to get in the ballpark with this but
> want to make sure we have a good alignment of purpose here.  NiFi has
> from time to time come up as an intuitive way to build an archive
> management tool and it is always "not quite right" because of the
> subtle differences between continuous streams of information and
> ad-hoc sort of one-time tasks.
>
> Would this be a continuous task (always running) even if it is slow
> (every few minutes, hours, days) or would it be a one-time thing to
> move a bunch of data from one place to another?
>
> The difference sounds very minor but it will help me to understand how
> best to respond.
>
> Thanks
> Joe
>

Re: archive files

Posted by Joe Witt <jo...@gmail.com>.
Charlie,

I can give some pointers on how to get in the ballpark with this, but I
want to make sure we are aligned on the purpose here.  NiFi has
from time to time come up as an intuitive way to build an archive
management tool, and it is always "not quite right" because of the
subtle differences between continuous streams of information and
ad-hoc, one-time tasks.

Would this be a continuous task (always running) even if it is slow
(every few minutes, hours, days) or would it be a one-time thing to
move a bunch of data from one place to another?

The difference sounds very minor but it will help me to understand how
best to respond.

Thanks
Joe

On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure
<ch...@gmail.com> wrote:
> Use case: Archive and compress files by category and month, store like files
> in a common directory.
>
> I'm already processing the files, and have extracted the interesting
> attributes from each.  I ran them through MergeContent, but have not been
> able to produce a logical directory structure to store the results.  I would
> prefer something like archive/categoryA/201511/somefilename.tar.gz where
> somefilename is made up of all the categoryA files received in November
> 2015.
>
> I switched gears, and used PutFile to store the files in the preferred
> directory structure, but at a loss of how to archive them within their
> folders given hundreds of dynamic categories, and date additions every
> month.
>
> I'm playing with MergeContent's Correlation Attribute Name, but am also
> considering trying the "Defragment" merge strategy by correlating the files
> earlier in the process.
>
> Any suggestions would be appreciated.