Posted to users@nifi.apache.org by Kiran <ki...@protonmail.com> on 2018/03/01 21:59:58 UTC

List archive files before extraction

Hello,

I've got a NiFi flow which:
1. Ingests archive files (tar.gz, rar and zip)
2. Runs IdentifyMimeType on the archive
3. Runs UnpackContent on the archive
4. Identifies which of the files can be processed based on filename

The problem I've got is that a lot of processing time and content repo space is wasted by extracting the archive files, only to realise that I can't process the files based on their filenames.

I was wondering if there was any way of getting a list of the filenames within the archive without actually extracting the files? Based on the filenames I can then decide if I should unpack the archive or not.

Kiran

Re: List archive files before extraction

Posted by Ed B <bd...@gmail.com>.
Hey Kiran,

This is possible in NiFi using the ExecuteProcess processor.
I would implement it as follows:
1. Get the file from the file system.
2. Route on the filename extension (tar.gz, zip, rar, etc.). When you create
relationships by adding expressions, you can use, for example:
${filename:matches('.+\.tar\.gz')}
3. Use ExecuteProcess to run a shell command that lists the archived files
(tar -tf file.tar.gz, unzip -l file.zip, or unrar l file.rar).
4. Analyze the content of the modified FlowFile with the logic you have for the
filename, and finally
5. Unarchive and continue your flow, or stop the flow.
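
The listing step above could also be done by a small script instead of the shell commands. What follows is only a minimal sketch in Python, not something taken from the thread: the function names list_archive_names and worth_unpacking and the glob patterns are hypothetical, and it relies only on the standard-library tarfile and zipfile modules. The standard library has no rar support, so rar archives would still need an external tool such as unrar.

```python
import fnmatch
import tarfile
import zipfile


def list_archive_names(path):
    """Return the member names of a tar (optionally compressed) or zip archive.

    Only archive headers / the zip central directory are read;
    no member is written to disk.
    """
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
        with tarfile.open(path, "r:*") as tf:
            return tf.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    raise ValueError(f"unsupported archive type: {path}")


def worth_unpacking(path, patterns):
    """Return True if any member's base filename matches one of the glob patterns.

    `patterns` is a hypothetical stand-in for whatever filename rule the flow uses.
    """
    for name in list_archive_names(path):
        base = name.rsplit("/", 1)[-1]  # drop any directory prefix
        if any(fnmatch.fnmatch(base, pattern) for pattern in patterns):
            return True
    return False
```

For example, worth_unpacking("incoming.zip", ["*.csv"]) would report whether the archive holds at least one CSV file. Note that for a tar.gz the compressed stream is still read end to end to find every header, so this saves disk writes and content repo space rather than all of the read time.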

I hope I understood your requirements correctly and that will work for you.

Regards,
Ed.
