Posted to dev@nifi.apache.org by lu...@china-inv.cn on 2017/11/20 08:16:45 UTC

How to delete the data in the flowfile?

Hi, All,

We use NiFi to import data from an Oracle database into Hive.

The first step is to extract all the data from the Oracle database and 
persist it into a flowfile, which then 'flows' into other processors for 
further processing.

After persisting the data into Hive, we found that the data persisted 
in the first step was not deleted. This occupies a lot of disk space.

So is there any way to tell NiFi to delete that data after the next 
processor has finished reading it?

Thanks

Boying



 
This email message may contain confidential and/or privileged information. 
If you are not the intended recipient, please do not read, save, forward, 
disclose or copy the contents of this email or open any file attached to 
this email. We will be grateful if you could advise the sender immediately 
by replying this email, and delete this email and any attachment or links 
to this email completely and immediately from your computer system. 




Reply: Re: How to delete the data in the flowfile?

Posted by lu...@china-inv.cn.
I really appreciate your help. It's very helpful. :)



From: Jeff <jt...@gmail.com>
To: dev@nifi.apache.org
Date: 2017/11/20 22:07
Subject: Re: How to delete the data in the flowfile?








 




Re: How to delete the data in the flowfile?

Posted by Jeff <jt...@gmail.com>.
Hello Boying,

Once flowfiles have completed processing, their content may still be
archived within the content repository for a certain period of time before
it ages off. The NiFi Admin Guide has a section on Content Repository
properties [1] that you can set in nifi.properties to tweak how much space
is used for the archive, how long content is archived, or to disable
archiving completely.

Lowering the "nifi.content.repository.archive.max.retention.period" and
"nifi.content.repository.archive.max.usage.percentage" properties can help
limit the amount of disk space the content repository uses for archived
flowfiles.  You can disable content archiving by setting
"nifi.content.repository.archive.enabled" to false if you prefer to have no
archive at all.
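
As a rough illustration of the settings above, the relevant part of
nifi.properties might look like the fragment below. The property names are
the ones mentioned above; the values are assumptions picked only for
illustration, not recommendations, so size them for your own disk budget.

```properties
# Illustrative values only (assumptions, not recommendations).
# Keep archiving on, but cap how long and how much is archived.
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=1 hour
nifi.content.repository.archive.max.usage.percentage=25%

# Alternatively, to have no archive at all:
# nifi.content.repository.archive.enabled=false
```

Changes to nifi.properties are read at startup, so NiFi needs a restart
before they take effect.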

If your flow uses a processor like PutFile to place a flowfile in a
temporary directory to do further processing on it, or to allow "backups"
of the flowfile for various stages of processing, then your flow must be
designed to clean up those files after they are no longer needed.  There
are several ways to do this, one of them being Wait/Notify processors.
There's a blog post that Koji has written [2] with some examples of how to
use the Wait and Notify processors. The concepts it covers should apply to
your case: you could use Wait/Notify to signal that files explicitly
archived or copied by processors like PutFile are no longer needed and can
be removed.

Please let me know if neither of these solutions helps with disk space
issues while using your flow.  If you provide your flow as an example, we
can take a look at other ways to try to minimize disk usage.

[1]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#file-system-content-repository-properties
[2]
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/#alternative-solution-waitnotify
