Posted to users@nifi.apache.org by Kiran <ki...@protonmail.com> on 2018/03/01 21:59:58 UTC
List archive files before extraction
Hello,
I've got a NiFi flow which:
1. Ingests archive files (tar.gz, rar and zip)
2. IdentifyMimeType of the archive
3. UnpackContent of the archive
4. Identify which of the files can be processed based on filename
The problem I've got is that a lot of processing time and content repo space is wasted by extracting the archive files, only to realise afterwards that I can't process them based on the filename.
I was wondering if there was any way of getting a list of the filenames within the archive without actually extracting the files? Based on the filenames I can then decide if I should unpack the archive or not.
Kiran
Re: List archive files before extraction
Posted by Ed B <bd...@gmail.com>.
Hey Kiran,
It is possible to do this in NiFi using the ExecuteProcess processor.
I would implement it as follows:
1. Get the file from the file system.
2. Route on the filename extension (tar.gz, zip, rar, etc.). When you create
relationships by adding expressions, you can use, for example:
${filename:matches('.+\.tar\.gz')}
3. Use ExecuteProcess to run a shell command that lists the archived files
(tar -tf file.tar.gz, or unzip -l, or unrar -l).
4. Analyze the content of the modified FlowFile with the logic you have for the
filename, and finally
5. Unarchive and continue your flow, or stop the flow.
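For illustration, the "list first, then decide" idea in steps 3 to 5 can be sketched outside NiFi with Python's standard library. This is only a rough sketch under some assumptions: the function names list_archive_members and should_unpack are hypothetical, the filename rules are expressed as shell-style patterns, and only tar (including tar.gz) and zip are covered, because the standard library has no rar reader. Listing a tar.gz still reads through the compressed stream, but nothing is written to disk.

```python
import fnmatch
import tarfile
import zipfile


def list_archive_members(path):
    """Return the member names of a tar or zip archive without extracting it."""
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
        with tarfile.open(path, "r:*") as tf:
            return tf.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    # Assumption: anything else (e.g. rar) is handled by an external tool.
    raise ValueError(f"unsupported archive type: {path}")


def should_unpack(path, wanted_patterns):
    """Return True if any member name matches any of the wanted patterns."""
    names = list_archive_members(path)
    return any(
        fnmatch.fnmatchcase(name, pattern)
        for name in names
        for pattern in wanted_patterns
    )
```

A call such as should_unpack("incoming.tar.gz", ["*.csv"]) (the path is a made-up example) would return True only when at least one member name ends in .csv, so the archive can be skipped before any content is written out.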
I hope I understood your requirements correctly and that this will work for you.
Regards,
Ed.