Posted to dev@nifi.apache.org by Matthew Clarke <ma...@gmail.com> on 2016/05/10 22:06:56 UTC

Re: Reg: Get files from ftp

The list-type processors are designed to use NiFi state management to keep
from listing the same files twice. The fetch-type processors retrieve
files based on the FlowFiles they are fed. Typically those FlowFiles come
from the corresponding list processor.
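
The List/Fetch split described above can be sketched in plain Java. This is
only an illustration of the pattern, not NiFi code: the class name
ListFetchSketch, the Listed record type, and the in-memory map standing in
for a remote server are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a map of path -> content stands in for the remote server. */
final class ListFetchSketch {

    /** Lightweight item emitted by the "list" step; it carries no file content. */
    static final class Listed {
        final String path;
        Listed(String path) { this.path = path; }
    }

    /** "List" step: emit one item per remote path without downloading anything. */
    static List<Listed> list(Map<String, String> remote) {
        List<Listed> out = new ArrayList<>();
        for (String path : remote.keySet()) {
            out.add(new Listed(path));
        }
        return out;
    }

    /** "Fetch" step: retrieve the content for exactly the item it is handed. */
    static String fetch(Map<String, String> remote, Listed item) {
        String content = remote.get(item.path);
        if (content == null) {
            throw new IllegalStateException("Not found: " + item.path);
        }
        return content;
    }

    public static void main(String[] args) {
        Map<String, String> remote = new LinkedHashMap<>();
        remote.put("/in/a.txt", "alpha");
        remote.put("/in/b.txt", "beta");
        for (Listed item : list(remote)) {
            System.out.println(item.path + " -> " + fetch(remote, item));
        }
    }
}
```

Keeping the listing step free of content transfer is what lets it run on a
single node while the fetch work is handed to whatever consumes the items.
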
On May 10, 2016 8:56 AM, "Mark Payne" <ma...@hotmail.com> wrote:

> Sourav,
>
> Sure. Within the nifi-standard-processors bundle are a few classes that
> would be important here.
> First is the AbstractListProcessor. You'll want to use this as your base
> class for ListFTP. Also, FetchFileTransfer
> will be the class that you'll extend for the FetchFTP processor.
>
> The ListSFTP and FetchSFTP processors are great examples to look at.
>
> Additionally, GetFTP and GetSFTP are good examples of how the FTP and SFTP
> implementations differ. They basically differ in the Property Descriptors
> provided and the FileTransfer object that is used.
>
> If you have any questions, please feel free to reach out to this mailing
> list. Very happy to help however we can!
>
> Thanks
> -Mark
>
>
> > On May 10, 2016, at 1:30 AM, Sourav Gulati <so...@impetus.co.in> wrote:
> >
> > Sure Mark. I am interested in working on it. Please provide some pointers
> > regarding that.
> >
> > Also, I will check if SFTP can be used. So ListSFTP / FetchSFTP won't
> > pick up files more than once?
> >
> > Regards,
> > Sourav Gulati
> >
> > -----Original Message-----
> > From: Mark Payne [mailto:markap14@hotmail.com]
> > Sent: Monday, May 09, 2016 5:34 PM
> > To: dev@nifi.apache.org
> > Subject: Re: Reg: Get files from ftp
> >
> > Sourav,
> >
> > We have begun transitioning from many of the Get*** Processors to
> > List*** and Fetch*** Processors.
> > There is a ListSFTP / FetchSFTP processor set but not currently a
> > List/Fetch FTP set. Is SFTP a possibility for you? Would you be interested
> > in working on a List/Fetch FTP Processor set?
> >
> > Thanks
> > -Mark
> >
> >> On May 9, 2016, at 5:48 AM, Sourav Gulati <so...@impetus.co.in> wrote:
> >>
> >> Hi Team,
> >>
> >> I need a suggestion.
> >>
> >> I want to get files from an FTP server, for which the GetFTP processor is
> >> available. However, as I cannot delete files from the source, I need to
> >> add a check so that this processor does not pick up a file more than once.
> >> What is the best way to do that?
> >>
> >> Regards,
> >> Sourav Gulati
> >>
> >>
> >> ________________________________
> >>
> >>
> >>
> >>
> >>
> >>
> >> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.

Re: Reg: Get files from ftp

Posted by Mark Payne <ma...@hotmail.com>.
Sourav,

Some of the older List*** Processors have a property for a "Distributed Cache Service." If that is
populated, then the Processor will pull the state from there upon restart. However, newer versions
of NiFi do not store state there; they will only restore from there upon restart. State is stored only
in ZooKeeper (whichever instance is configured in your conf/state-management.xml file). If no
ZooKeeper instance is properly configured there, the Processor will fail to store any state and you'll end
up re-listing those files when NiFi restarts.

Thanks
-Mark


> On May 12, 2016, at 1:13 AM, Sourav Gulati <so...@impetus.co.in> wrote:
> 
> Mark,
> 
> My question is this: I am running NiFi in clustered mode, but the embedded ZooKeeper is not running. However, I can see that it is still saving the state of the files being listed.
> 
> Since ZooKeeper is not running, where is the state being saved by default in clustered mode?
> 
> Regards,
> Sourav Gulati
> 
> -----Original Message-----
> From: Mark Payne [mailto:markap14@hotmail.com]
> Sent: Wednesday, May 11, 2016 7:17 PM
> To: dev@nifi.apache.org
> Subject: Re: Reg: Get files from ftp
> 
> Sourav,
> 
> If you run an embedded zookeeper, then yes, it runs within the NiFi JVM and stores state (by default) in the ./state/zookeeper directory.
> 
> Thanks
> -Mark
> 
> 
>> On May 11, 2016, at 9:14 AM, Sourav Gulati <so...@impetus.co.in> wrote:
>> 
>> Mark,
>> 
>> Does the ZooKeeper process run inside the NiFi JVM? If yes, what is the default path of the ZooKeeper data directory?
>> 
>> Regards,
>> Sourav Gulati
>> 
>> -----Original Message-----
>> From: Mark Payne [mailto:markap14@hotmail.com]
>> Sent: Wednesday, May 11, 2016 6:37 PM
>> To: dev@nifi.apache.org
>> Subject: Re: Reg: Get files from ftp
>> 
>> Sourav,
>> 
>> If your NiFi instance is clustered, it will store the information in
>> ZooKeeper. If not clustered, it will store the state in a local file.
>> This is done because in a cluster, you typically want to run your
>> List*** Processors on the Primary Node only, and this allows another node
>> to pick up where the previous one left off if the Primary Node changes.
>> Of course, storing all of the files that have been listed can become very
>> verbose, so it stores only a small amount of data: the timestamp of the
>> latest file discovered and the timestamp of the latest file
>> processed/listed. It can then use this information to determine whether
>> files are new or modified without storing much info.
>> 
>> Thanks
>> -Mark
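
The timestamp-based approach Mark describes can be illustrated with a small
standalone Java sketch. It is a simplification under stated assumptions, not
the AbstractListProcessor implementation: it keeps a single timestamp rather
than the two mentioned above, the names TimestampListingSketch and Entry are
hypothetical, and it does not handle several files sharing the newest
timestamp, which a real implementation would need to consider.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: remembers one timestamp instead of every listed file. */
final class TimestampListingSketch {

    /** Hypothetical listing entry: a path plus last-modified time in epoch millis. */
    static final class Entry {
        final String path;
        final long modified;
        Entry(String path, long modified) {
            this.path = path;
            this.modified = modified;
        }
    }

    /** The only state kept between runs: the newest timestamp already listed. */
    private long latestListed = -1L;

    /** Returns entries newer than the stored timestamp, then advances that timestamp. */
    List<Entry> listNew(List<Entry> remoteListing) {
        List<Entry> fresh = new ArrayList<>();
        long newest = latestListed;
        for (Entry e : remoteListing) {
            if (e.modified > latestListed) {
                fresh.add(e);
                newest = Math.max(newest, e.modified);
            }
        }
        latestListed = newest;
        return fresh;
    }

    long latestListed() {
        return latestListed;
    }
}
```

Because only one long value has to be persisted, the state stays small no
matter how many files have been listed, and a modified file is picked up
again as soon as its timestamp moves past the stored value.
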
>> 
>>> On May 11, 2016, at 12:39 AM, Sourav Gulati <so...@impetus.co.in> wrote:
>>> 
>>> Thanks Matthew,
>>> A quick question: Where does it store the state of files already listed?
>>> 
>>> 
>>> Regards,
>>> Sourav Gulati


RE: Reg: Get files from ftp

Posted by Sourav Gulati <so...@impetus.co.in>.
Mark,

My question is this: I am running NiFi in clustered mode, but the embedded ZooKeeper is not running. However, I can see that it is still saving the state of the files being listed.

Since ZooKeeper is not running, where is the state being saved by default in clustered mode?

Regards,
Sourav Gulati

Re: Reg: Get files from ftp

Posted by Mark Payne <ma...@hotmail.com>.
Sourav,

If you run an embedded zookeeper, then yes, it runs within the NiFi JVM and stores state
(by default) in the ./state/zookeeper directory.

Thanks
-Mark



RE: Reg: Get files from ftp

Posted by Sourav Gulati <so...@impetus.co.in>.
I am running NiFi in clustered mode. However, the following property is set to false:

# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=false

Also, as per zookeeper.properties, the path of the data dir is
./state/zookeeper

But I cannot see any zookeeper folder inside the ./state folder:
nifi-0.6.1/conf/state$ ls
b106a89f-09d7-4f23-94e2-00f2299393f5.peers
ubuntu@kafka-broker1:~/softwares/nifi-0.6.1/conf/state$ cat b106a89f-09d7-4f23-94e2-00f2299393f5.peers
10.3.8.198:8099:false

Regards,
Sourav Gulati
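
For anyone who wants to check the flag quoted above programmatically, here is
a minimal Java sketch. The property key is the one from the message; the
class name EmbeddedZkCheckSketch is hypothetical, and treating a missing key
as "false" is an assumption of this sketch rather than documented NiFi
behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: reads the embedded-ZooKeeper flag from properties text. */
final class EmbeddedZkCheckSketch {

    static final String KEY = "nifi.state.management.embedded.zookeeper.start";

    /** Parses properties text and reports whether the flag is set to true. */
    static boolean embeddedZooKeeperEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption of this sketch: a missing key is treated as "false".
        return Boolean.parseBoolean(props.getProperty(KEY, "false").trim());
    }

    public static void main(String[] args) {
        String text = "nifi.state.management.embedded.zookeeper.start=false\n";
        System.out.println(embeddedZooKeeperEnabled(text));
    }
}
```

In practice the text would come from the nifi.properties file of the instance
being inspected; the inline string here just keeps the sketch self-contained.
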

RE: Reg: Get files from ftp

Posted by Sourav Gulati <so...@impetus.co.in>.
Mark,

Does the ZooKeeper process run inside the NiFi JVM? If so, what is the default path of the ZooKeeper data directory?

Regards,
Sourav Gulati

-----Original Message-----
From: Mark Payne [mailto:markap14@hotmail.com]
Sent: Wednesday, May 11, 2016 6:37 PM
To: dev@nifi.apache.org
Subject: Re: Reg: Get files from ftp

Sourav,

If your NiFi instance is clustered, it will store the information in ZooKeeper. If not clustered, it will store the state in a local file. This is done because in a cluster, you typically want to run your List*** Processors on Primary Node only, and this allows another node to pick up where the previous one left off if the Primary Node changes. Of course, storing all of the files that have been listed can become very verbose so it stores only a small amount of data -- the timestamp of the latest file discovered and the timestamp of the latest file processed/listed. It can then use this information to determine if files are new or modified without storing much info.

Thanks
-Mark

> On May 11, 2016, at 12:39 AM, Sourav Gulati <so...@impetus.co.in> wrote:
>
> Thanks Matthew,
> A quick question: Where does it store the state of files already listed?
>
>
> Regards,
> Sourav Gulati
>
> -----Original Message-----
> From: Matthew Clarke [mailto:matt.clarke.138@gmail.com]
> Sent: Wednesday, May 11, 2016 3:37 AM
> To: dev@nifi.apache.org
> Subject: Re: Reg: Get files from ftp
>
> The list type processors are designed to use NiFi state management to keep from listing the same files twice. The fetch type processors will retrieve files based on the FlowFiles they are fed. Typically those FlowFiles come from the corresponding list processor.
> On May 10, 2016 8:56 AM, "Mark Payne" <ma...@hotmail.com> wrote:
>
>> Sourav,
>>
>> Sure. Within the nifi-standard-processors bundle are a few classes
>> that would be important here.
>> First is the AbstractListProcessor. You'll want to use this as your
>> base class for ListFTP. Also, FetchFileTransfer will be the class
>> that you'll extend for the FetchFTP processor.
>>
>> The ListSFTP and FetchSFTP processors are great examples to look at.
>>
>> Additionally, the GetFTP and GetSFTP are good examples to look at as
>> to how the FTP & SFTP implementations differ. They basically differ
>> in the Property Descriptors provided and the FileTransfer object that
>> is used.
>>
>> If you have any questions, please feel free to reach out to this
>> mailing list. Very happy to help however we can!
>>
>> Thanks
>> -Mark
>>
>>
>>> On May 10, 2016, at 1:30 AM, Sourav Gulati
>>> <so...@impetus.co.in>
>> wrote:
>>>
>>> Sure Mark. I am interested to work on it. Please provide some
>>> pointers
>> regarding that.
>>>
>>> Also, I will check if Sftp can be used. So ListSFTP / FetchSFTP
>>> won't
>> pick files more than once?
>>>
>>> Regards,
>>> Sourav Gulati
>>>
>>> -----Original Message-----
>>> From: Mark Payne [mailto:markap14@hotmail.com]
>>> Sent: Monday, May 09, 2016 5:34 PM
>>> To: dev@nifi.apache.org
>>> Subject: Re: Reg: Get files from ftp
>>>
>>> Sourav,
>>>
>>> We have begun transitioning from many of the Get*** Processors to
>> List*** and Fetch*** Processors.
>>> There is a ListSFTP / FetchSFTP processor set but not currently a
>> List/Fetch FTP. Is SFTP a possibility for you? Would you be
>> interested in working on a List/Fetch FTP Processor set?
>>>
>>> Thanks
>>> -Mark
>>>
>>>> On May 9, 2016, at 5:48 AM, Sourav Gulati
>>>> <so...@impetus.co.in>
>> wrote:
>>>>
>>>> Hi Team,
>>>>
>>>> I need a suggestion.
>>>>
>>>> I want to get files from ftp server for which GetFtp processor is
>> available. However, as I cannot delete files from source, I need to
>> put a check that this processor does not pick a file more than once.
>> What is the best way to do that?
>>>>
>>>> Regards,
>>>> Sourav Gulati
>>>>

Re: Reg: Get files from ftp

Posted by Mark Payne <ma...@hotmail.com>.
Sourav,

If your NiFi instance is clustered, it will store the information in ZooKeeper. If not clustered, it
will store the state in a local file. This is done because in a cluster, you typically want to run your
List*** Processors on Primary Node only, and this allows another node to pick up where the previous
one left off if the Primary Node changes. Of course, storing all of the files that have been listed can become
very verbose so it stores only a small amount of data -- the timestamp of the latest file discovered
and the timestamp of the latest file processed/listed. It can then use this information to determine if files
are new or modified without storing much info.

Thanks
-Mark
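
A minimal sketch of the timestamp-based approach described in Mark's reply, written in Python for brevity. It is not NiFi code: `RemoteFile`, `ListingState`, and `list_new_files` are hypothetical names, and the sketch keeps a single timestamp where the message above mentions two. It only illustrates how a listing step can avoid emitting the same file twice without remembering every file it has seen.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical listing entry: a path plus its last-modified time."""
    path: str
    modified: int  # epoch milliseconds


@dataclass(frozen=True)
class ListingState:
    """The only data kept between runs, however many files exist."""
    latest_listed: int = -1  # newest modification time already emitted


def list_new_files(
    files: Iterable[RemoteFile], state: ListingState
) -> Tuple[List[RemoteFile], ListingState]:
    """Return the files modified since the last run and the updated state."""
    # Anything newer than the stored timestamp is treated as new or modified.
    fresh = sorted(
        (f for f in files if f.modified > state.latest_listed),
        key=lambda f: (f.modified, f.path),
    )
    # Advance the stored timestamp to the newest file emitted in this run.
    newest = max([state.latest_listed] + [f.modified for f in fresh])
    return fresh, ListingState(latest_listed=newest)
```

Calling `list_new_files` twice over an unchanged directory returns an empty list the second time, and a file whose modification time moves forward is listed again. One limitation of this simplified version: a file that appears later with a modification time at or before the stored timestamp is skipped.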

> On May 11, 2016, at 12:39 AM, Sourav Gulati <so...@impetus.co.in> wrote:
> 
> Thanks Matthew,
> A quick question: Where does it store the state of files already listed?
> 
> 
> Regards,
> Sourav Gulati
> 
> -----Original Message-----
> From: Matthew Clarke [mailto:matt.clarke.138@gmail.com]
> Sent: Wednesday, May 11, 2016 3:37 AM
> To: dev@nifi.apache.org
> Subject: Re: Reg: Get files from ftp
> 
> The list type processors are designed to use NiFi state management to keep from listing the same files twice. The fetch type processors will retrieve files based on the FlowFiles they are fed. Typically those FlowFiles come from the corresponding list processor.
> On May 10, 2016 8:56 AM, "Mark Payne" <ma...@hotmail.com> wrote:
> 
>> Sourav,
>> 
>> Sure. Within the nifi-standard-processors bundle are a few classes
>> that would be important here.
>> First is the AbstractListProcessor. You'll want to use this as your
>> base class for ListFTP. Also, FetchFileTransfer will be the class that
>> you'll extend for the FetchFTP processor.
>> 
>> The ListSFTP and FetchSFTP processors are great examples to look at.
>> 
>> Additionally, the GetFTP and GetSFTP are good examples to look at as
>> to how the FTP & SFTP implementations differ. They basically differ in
>> the Property Descriptors provided and the FileTransfer object that is
>> used.
>> 
>> If you have any questions, please feel free to reach out to this
>> mailing list. Very happy to help however we can!
>> 
>> Thanks
>> -Mark
>> 
>> 
>>> On May 10, 2016, at 1:30 AM, Sourav Gulati
>>> <so...@impetus.co.in>
>> wrote:
>>> 
>>> Sure Mark. I am interested to work on it. Please provide some
>>> pointers
>> regarding that.
>>> 
>>> Also, I will check if Sftp can be used. So ListSFTP / FetchSFTP
>>> won't
>> pick files more than once?
>>> 
>>> Regards,
>>> Sourav Gulati
>>> 
>>> -----Original Message-----
>>> From: Mark Payne [mailto:markap14@hotmail.com]
>>> Sent: Monday, May 09, 2016 5:34 PM
>>> To: dev@nifi.apache.org
>>> Subject: Re: Reg: Get files from ftp
>>> 
>>> Sourav,
>>> 
>>> We have begun transitioning from many of the Get*** Processors to
>> List*** and Fetch*** Processors.
>>> There is a ListSFTP / FetchSFTP processor set but not currently a
>> List/Fetch FTP. Is SFTP a possibility for you? Would you be interested
>> in working on a List/Fetch FTP Processor set?
>>> 
>>> Thanks
>>> -Mark
>>> 
>>>> On May 9, 2016, at 5:48 AM, Sourav Gulati
>>>> <so...@impetus.co.in>
>> wrote:
>>>> 
>>>> Hi Team,
>>>> 
>>>> I need a suggestion.
>>>> 
>>>> I want to get files from ftp server for which GetFtp processor is
>> available. However, as I cannot delete files from source, I need to
>> put a check that this processor does not pick a file more than once.
>> What is the best way to do that?
>>>> 
>>>> Regards,
>>>> Sourav Gulati
>>>> 


RE: Reg: Get files from ftp

Posted by Sourav Gulati <so...@impetus.co.in>.
Thanks Matthew,
A quick question: Where does it store the state of files already listed?


Regards,
Sourav Gulati

-----Original Message-----
From: Matthew Clarke [mailto:matt.clarke.138@gmail.com]
Sent: Wednesday, May 11, 2016 3:37 AM
To: dev@nifi.apache.org
Subject: Re: Reg: Get files from ftp

The list type processors are designed to use NiFi state management to keep from listing the same files twice. The fetch type processors will retrieve files based on the FlowFiles they are fed. Typically those FlowFiles come from the corresponding list processor.
On May 10, 2016 8:56 AM, "Mark Payne" <ma...@hotmail.com> wrote:

> Sourav,
>
> Sure. Within the nifi-standard-processors bundle are a few classes
> that would be important here.
> First is the AbstractListProcessor. You'll want to use this as your
> base class for ListFTP. Also, FetchFileTransfer will be the class that
> you'll extend for the FetchFTP processor.
>
> The ListSFTP and FetchSFTP processors are great examples to look at.
>
> Additionally, the GetFTP and GetSFTP are good examples to look at as
> to how the FTP & SFTP implementations differ. They basically differ in
> the Property Descriptors provided and the FileTransfer object that is
> used.
>
> If you have any questions, please feel free to reach out to this
> mailing list. Very happy to help however we can!
>
> Thanks
> -Mark
>
>
> > On May 10, 2016, at 1:30 AM, Sourav Gulati
> > <so...@impetus.co.in>
> wrote:
> >
> > Sure Mark. I am interested to work on it. Please provide some
> > pointers
> regarding that.
> >
> > Also, I will check if Sftp can be used. So ListSFTP / FetchSFTP
> > won't
> pick files more than once?
> >
> > Regards,
> > Sourav Gulati
> >
> > -----Original Message-----
> > From: Mark Payne [mailto:markap14@hotmail.com]
> > Sent: Monday, May 09, 2016 5:34 PM
> > To: dev@nifi.apache.org
> > Subject: Re: Reg: Get files from ftp
> >
> > Sourav,
> >
> > We have begun transitioning from many of the Get*** Processors to
> List*** and Fetch*** Processors.
> > There is a ListSFTP / FetchSFTP processor set but not currently a
> List/Fetch FTP. Is SFTP a possibility for you? Would you be interested
> in working on a List/Fetch FTP Processor set?
> >
> > Thanks
> > -Mark
> >
> >> On May 9, 2016, at 5:48 AM, Sourav Gulati
> >> <so...@impetus.co.in>
> wrote:
> >>
> >> Hi Team,
> >>
> >> I need a suggestion.
> >>
> >> I want to get files from ftp server for which GetFtp processor is
> available. However, as I cannot delete files from source, I need to
> put a check that this processor does not pick a file more than once.
> What is the best way to do that?
> >>
> >> Regards,
> >> Sourav Gulati
> >>