Posted to users@nifi.apache.org by "Kulkarni, Suyog" <Su...@csx.com> on 2016/09/07 16:24:04 UTC

Appending files in Hadoop with PutHDFS ...

Hi,

I just wanted to find out if PutHDFS now supports appending files in HDFS or not. I noticed there was a Jira with status "Resolved" for this, but I wanted to know which version has this feature or if there is any patch available for this. Also would like to know if anyone has tried it successfully or not. We are currently running version 0.6.

Thanks,
Suyog Kulkarni
suyog_kulkarni@csx.com




This email transmission and any accompanying attachments may contain CSX privileged and confidential information intended only for the use of the intended addressee. Any dissemination, distribution, copying or action taken in reliance on the contents of this email by anyone other than the intended recipient is strictly prohibited. If you have received this email in error please immediately delete it and notify sender at the above CSX email address. Sender and CSX accept no liability for any damage caused directly or indirectly by receipt of this email.

Re: Appending files in Hadoop with PutHDFS ...

Posted by Tijo Thomas <ti...@gmail.com>.
Hi,

I found this Jira still open: https://issues.apache.org/jira/browse/NIFI-1322

Some commercial Hadoop distributions (the Huawei Hadoop distribution, for
example) support small-file features, so NiFi can write a lot of small
files and merge them into a bigger one later. I think a good approach is
to put the data in Kafka/HBase and move it to HDFS based on size.

http://blog.cloudera.com/blog/2009/07/file-appends-in-hdfs/ suggests not
appending. In that case, do we need to keep the Jira open?
I was also under the impression that NiFi is going to implement append
somehow in the future.

Tijo
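
The "move it to HDFS based on size" idea can be sketched in a few lines. This is only an illustration, not NiFi, Kafka, or HBase code: the class name `SizeOrAgeBuffer`, its parameters, and the thresholds are assumptions made up for the example. It buffers messages in memory and hands back one merged batch once a size or age limit is reached, which is the latency-versus-file-count trade-off discussed in this thread.

```python
import time


class SizeOrAgeBuffer:
    """Hypothetical in-memory buffer that releases a merged batch by size or age."""

    def __init__(self, max_bytes, max_age_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._chunks = []
        self._size = 0
        self._opened_at = None

    def add(self, message):
        """Buffer one message; return the merged batch if a limit was hit, else None."""
        if not self._chunks:
            # The age of a batch is measured from its first message.
            self._opened_at = self._clock()
        self._chunks.append(message)
        self._size += len(message)
        too_big = self._size >= self.max_bytes
        too_old = self._clock() - self._opened_at >= self.max_age_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        """Return everything buffered so far as one bytes object and reset."""
        batch = b"".join(self._chunks)
        self._chunks = []
        self._size = 0
        return batch
```

Note that the age limit is only checked when a new message arrives; a real flow would also need a timer so that a quiet period does not hold data back indefinitely.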


On Thu, Sep 8, 2016 at 1:28 AM, Jeff <jt...@gmail.com> wrote:

> Hello,
>
> The JIRA to which you are referring (probably
> https://issues.apache.org/jira/browse/NIFI-958) was closed and marked as a
> duplicate of https://issues.apache.org/jira/browse/NIFI-1321 and
> https://issues.apache.org/jira/browse/NIFI-1322, so that work can be
> tracked separately for PutFile and PutHDFS, respectively.  There is some
> discussion on those JIRAs, but to my knowledge (and JIRA-searching
> skills) there is no capability to append to files on HDFS.

Re: Appending files in Hadoop with PutHDFS ...

Posted by Jeff <jt...@gmail.com>.
Hello,

The JIRA to which you are referring (probably
https://issues.apache.org/jira/browse/NIFI-958) was closed and marked as a
duplicate of https://issues.apache.org/jira/browse/NIFI-1321 and
https://issues.apache.org/jira/browse/NIFI-1322, so that work can be
tracked separately for PutFile and PutHDFS, respectively.  There is some
discussion on those JIRAs, but to my knowledge (and JIRA-searching
skills) there is no capability to append to files on HDFS.



RE: Appending files in Hadoop with PutHDFS ...

Posted by "Kulkarni, Suyog" <Su...@csx.com>.
Thanks for your help Matt and Andre. We will try out your proposed solutions.

Regards,
Suyog Kulkarni
Email: suyog_kulkarni@csx.com

From: Andre [mailto:andre-lists@fucs.org]
Sent: Wednesday, September 07, 2016 8:54 PM
To: users@nifi.apache.org
Subject: Re: Appending files in Hadoop with PutHDFS ...

Suyog,

I suspect you are struggling to find MergeContent settings that achieve an optimal balance between latency and number of files?

If that is the case, there are a few ways of solving this issue. Matt's is great and very popular among Hadoop users, but it is not the only one:

Without going into vendor-specific ways, another possible way of solving this is to use a staging folder in HDFS, and then use NiFi to grab and concatenate the files via

GetHDFS -> MergeContent -> PutHDFS

In summary, you would have a flow like this:

Realtime Pipeline:

Listen* (whatever protocol you are using) -> MergeContent_with_low_latency_settings -> PutHDFS_to_staging_folder

Ideally you would name the staging folder after the hour, minute, or whatever unit you want to concatenate on (e.g. hdfs:/sensor/staging_data/2016/08/31/00).

Your real-time apps would point to the staging folder.

Concatenation Pipeline:

GetHDFS_from_staging_folder -> MergeContent -> PutHDFS_to_warm_store

On your GetHDFS_from_staging_folder you set:

* Directory field to use an Expression Language expression that resolves to something like =>  hdfs:/sensor/staging_data/${now():toNumber():minus(3600000):format('yyyy/MM/dd/HH')}
  (this assumes an hourly concatenation; adjust to the right balance of files / buckets)

* Batch Size => use something larger, so you can fetch a large number of small files per iteration

Your PutHDFS_to_warm_store destination would again be dynamically set based on time.





Hope this helps






Re: Appending files in Hadoop with PutHDFS ...

Posted by Andre <an...@fucs.org>.
Suyog,

I suspect you are struggling to find MergeContent settings that achieve
an optimal balance between latency and number of files?

If that is the case, there are a few ways of solving this issue. Matt's
is great and very popular among Hadoop users, but it is not the only one:

Without going into vendor-specific ways, another possible way of solving
this is to use a staging folder in HDFS, and then use NiFi to grab and
concatenate the files via

GetHDFS -> MergeContent -> PutHDFS

In summary, you would have a flow like this:

Realtime Pipeline:

Listen* (whatever protocol you are using) ->
MergeContent_with_low_latency_settings -> PutHDFS_to_staging_folder

Ideally you would name the staging folder after the hour, minute, or
whatever unit you want to concatenate on (e.g.
hdfs:/sensor/staging_data/2016/08/31/00).

Your real-time apps would point to the staging folder.

Concatenation Pipeline:

GetHDFS_from_staging_folder -> MergeContent -> PutHDFS_to_warm_store

On your GetHDFS_from_staging_folder you set:

* Directory field to use an Expression Language expression that resolves
to something like =>
 hdfs:/sensor/staging_data/${now():toNumber():minus(3600000):format('yyyy/MM/dd/HH')}
  (this assumes an hourly concatenation; adjust to the right balance of
files / buckets)

* Batch Size => use something larger, so you can fetch a large number of
small files per iteration

Your PutHDFS_to_warm_store destination would again be dynamically set
based on time.





Hope this helps
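
To make the Directory expression concrete, here is a small Python sketch of what it is meant to compute: the current time minus 3600000 ms (one hour), formatted as yyyy/MM/dd/HH and appended to the staging root. The function name `previous_hour_staging_dir` and the `base` parameter are made up for the example; NiFi evaluates the expression itself, so nothing like this needs to be deployed.

```python
from datetime import datetime, timedelta


def previous_hour_staging_dir(now, base="hdfs:/sensor/staging_data"):
    """Return the staging folder for the hour before `now`.

    Mirrors now() minus 3600000 ms formatted as 'yyyy/MM/dd/HH'.
    `base` is the example staging root used in this thread.
    """
    one_hour_ago = now - timedelta(milliseconds=3600000)
    return "{}/{}".format(base, one_hour_ago.strftime("%Y/%m/%d/%H"))


if __name__ == "__main__":
    # Folder the concatenation pipeline would read from right now.
    print(previous_hour_staging_dir(datetime.now()))
```

Subtracting a full hour before formatting means the realtime pipeline has finished writing to that folder by the time it is read, and the date parts roll over correctly at midnight.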




On Thu, Sep 8, 2016 at 9:06 AM, Matt Burgess <ma...@gmail.com> wrote:

> Suyog,
>
> If MergeContent is not working out, you could put a Hadoop client on the
> NiFi node, or a NiFi instance on a Hadoop cluster. In the latter case you
> can put a Remote Process Group on the edge node NiFi and an Input Port on
> the Hadoop cluster NiFi, then send the files from the edge to the cluster.
> On the Hadoop NiFi you can use PutHDFS to place the small files, then
> ExecuteStreamCommand to execute a "hadoop fs -cat" command to bring all the
> small files together for more efficient processing. I realize it's not
> ideal but could be a viable workaround until the aforementioned Jiras get
> resolved.
>
> Regards,
> Matt

Re: Appending files in Hadoop with PutHDFS ...

Posted by Matt Burgess <ma...@gmail.com>.
Suyog,

If MergeContent is not working out, you could put a Hadoop client on the NiFi node, or a NiFi instance on a Hadoop cluster. In the latter case you can put a Remote Process Group on the edge node NiFi and an Input Port on the Hadoop cluster NiFi, then send the files from the edge to the cluster. On the Hadoop NiFi you can use PutHDFS to place the small files, then ExecuteStreamCommand to execute a "hadoop fs -cat" command to bring all the small files together for more efficient processing. I realize it's not ideal but could be a viable workaround until the aforementioned Jiras get resolved.

Regards,
Matt
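
As a rough illustration of the "hadoop fs -cat" step, the sketch below pipes the output of `hadoop fs -cat` into `hadoop fs -put -`, which reads from stdin and writes a single HDFS file. It assumes a Hadoop client is installed on the node; the function names and any paths passed in are hypothetical, and in NiFi the same command would be configured in ExecuteStreamCommand rather than run from Python.

```python
import subprocess


def cat_argv(staging_glob):
    """Command that streams every HDFS file matching the glob to stdout."""
    return ["hadoop", "fs", "-cat", staging_glob]


def put_argv(merged_path):
    """Command that writes stdin ("-") to a single HDFS file."""
    return ["hadoop", "fs", "-put", "-", merged_path]


def concat_small_files(staging_glob, merged_path):
    """Pipe `hadoop fs -cat` into `hadoop fs -put -`; raise if either step fails."""
    cat = subprocess.Popen(cat_argv(staging_glob), stdout=subprocess.PIPE)
    put = subprocess.Popen(put_argv(merged_path), stdin=cat.stdout)
    cat.stdout.close()  # so cat sees a broken pipe if put exits early
    put_rc = put.wait()
    cat_rc = cat.wait()
    if cat_rc != 0 or put_rc != 0:
        raise RuntimeError(
            "concatenation failed: cat=%d put=%d" % (cat_rc, put_rc)
        )
```

Because no local shell is involved, the glob is passed to the Hadoop client unchanged and expanded against HDFS. The small source files are not deleted here; removing them after a successful merge would be a separate step.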


> On Sep 7, 2016, at 12:54 PM, Kulkarni, Suyog <Su...@csx.com> wrote:
> 
> Thanks Matt. 
> Any recommendation for a workaround to achieve this? We are currently getting hundreds of sensor messages/minute that we are ingesting into Hadoop (for further analysis) using PutHDFS processor. But instead of creating hundreds of small message files in HDFS, we would like to have them saved as one large daily or weekly file. We successfully tested the MergeContent processor (to merge the message data and periodically write one big file) but the latency it introduces is not acceptable. What are some other options that we can try?
> 
> Suyog Kulkarni
> suyog_kulkarni@csx.com

RE: Appending files in Hadoop with PutHDFS ...

Posted by "Kulkarni, Suyog" <Su...@csx.com>.
Thanks Matt. 
Any recommendation for a workaround to achieve this? We are currently getting hundreds of sensor messages/minute that we are ingesting into Hadoop (for further analysis) using PutHDFS processor. But instead of creating hundreds of small message files in HDFS, we would like to have them saved as one large daily or weekly file. We successfully tested the MergeContent processor (to merge the message data and periodically write one big file) but the latency it introduces is not acceptable. What are some other options that we can try?

Suyog Kulkarni
suyog_kulkarni@csx.com


-----Original Message-----
From: Matt Burgess [mailto:mattyb149@apache.org] 
Sent: Wednesday, September 07, 2016 12:30 PM
To: users@nifi.apache.org
Subject: Re: Appending files in Hadoop with PutHDFS ...

Suyog,

PutHDFS does not support appending files at the moment. I believe the Jira you mentioned is NIFI-958 [1], which is marked Resolved but should be Closed as duplicate. This case was split into two others,
NIFI-1321 for PutFile [2] and NIFI-1322 for PutHDFS [3]. The latter is not resolved or being actively worked on, and the former appears to have been abandoned in favor of an AppendLog processor.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-958
[2] https://issues.apache.org/jira/browse/NIFI-1321
[3] https://issues.apache.org/jira/browse/NIFI-1322


Re: Appending files in Hadoop with PutHDFS ...

Posted by Matt Burgess <ma...@apache.org>.
Suyog,

PutHDFS does not support appending files at the moment. I believe the
Jira you mentioned is NIFI-958 [1], which is marked Resolved but
should be Closed as duplicate. This case was split into two others,
NIFI-1321 for PutFile [2] and NIFI-1322 for PutHDFS [3]. The latter is
not resolved or being actively worked on, and the former appears to
have been abandoned in favor of an AppendLog processor.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-958
[2] https://issues.apache.org/jira/browse/NIFI-1321
[3] https://issues.apache.org/jira/browse/NIFI-1322
