Posted to users@nifi.apache.org by sudeep mishra <su...@gmail.com> on 2016/01/12 10:36:03 UTC

PutDistributedMapCache

Hi,

I can cache some data to be used in a NiFi flow. I can see the
processor PutDistributedMapCache in the documentation, which saves key-value
pairs in DistributedMapCache for NiFi, but I do not see any processor to read
this data. How can I read data from DistributedMapCache in my data flow?


Thanks & Regards,

Sudeep Shekhar Mishra

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Thanks Joe.

I am able to use the processor and it works fine.

Just wanted to know if there is a way to clear the data in
DistributedMapCache, which you have just answered.

Thanks Again.

On Thu, Jan 14, 2016 at 9:26 PM, Joe Percivall <jo...@yahoo.com>
wrote:

> Hello Sudeep,
>
> Sorry, I was not following your emails. Did you need more help importing the
> processor?
>
> Currently the way you would clear a DistributedMapCache is to just remove
> the DistributedMapCacheServer controller service and make a new one.
>
> Joe
> - - - - - -
> *Joseph Percivall*
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
> On Thursday, January 14, 2016 7:04 AM, sudeep mishra <
> sudeepshekharm@gmail.com> wrote:
>
>
> Thanks Joe. The GetDistributedMapCache seems to be working fine.
>
> Is there a way to clear DistributedMapCache on demand?
>
> Regards,
>
> Sudeep
>
> On Thu, Jan 14, 2016 at 12:42 PM, sudeep mishra <su...@gmail.com>
> wrote:
>
> Building the repository produces separate .nar files, and I can update just
> the ones I need in the lib directory.
> Thanks for your help.
>
> On Thu, Jan 14, 2016 at 9:27 AM, sudeep mishra <su...@gmail.com>
> wrote:
>
> Is it possible to build the code for only a particular processor? Just
> curious if we can build and deploy a particular processor in an existing
> NiFi environment.
>
> On Wed, Jan 13, 2016 at 9:33 PM, sudeep mishra <su...@gmail.com>
> wrote:
>
> Thanks Joe. I will try out the patch.
>
> On Wed, Jan 13, 2016 at 9:31 PM, Joe Percivall <jo...@yahoo.com>
> wrote:
>
> You would need to clone the nifi source from github and then apply the
> patch using git.
>
> Here is how to clone a repo:
> https://help.github.com/articles/cloning-a-repository/
> Along with the nifi repo itself: https://github.com/apache/nifi
>
> and how to apply a patch:
> http://makandracards.com/makandra/2521-git-how-to-create-and-apply-patches
>
> Let me know if you have any other questions,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
> On Wednesday, January 13, 2016 10:56 AM, sudeep mishra <
> sudeepshekharm@gmail.com> wrote:
>
>
>
> Thank you very much Joe.
>
> Can you please let me know how I can use the .patch file? I am using
> NiFi via the binaries. Do I need to set up the source code and build it
> along with the patch?
>
> Thanks & Regards,
>
> Sudeep
>
>
> On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <jo...@yahoo.com>
> wrote:
>
> Hello Sudeep,
> >
> >I put up a patch on the GetDistributedMapCache ticket[1]. Let me know
> what you think.
> >
> >The PutDistributedMapCache and GetDistributedMapCache processors work with
> the data as a byte[], so they should be format agnostic. That being said, it
> will be up to you to know what is in there in order to use it later.
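
To illustrate the point about byte[] values, here is a minimal Java sketch. It is not the NiFi API: the class `ByteCacheSketch` and its `put`/`get` methods are hypothetical stand-ins for a key-value cache. The only thing taken from the thread is that values are stored as raw bytes, so whoever reads a value has to know how it was encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a key-value cache that stores raw bytes.
// This is NOT the NiFi DistributedMapCache API; it only mirrors the byte[] idea.
public class ByteCacheSketch {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    // Store a copy of the bytes so later changes by the caller do not leak in.
    public void put(String key, byte[] value) {
        store.put(key, value.clone());
    }

    // Return a copy of the stored bytes, or null if the key is absent.
    public byte[] get(String key) {
        byte[] value = store.get(key);
        return value == null ? null : value.clone();
    }

    public static void main(String[] args) {
        ByteCacheSketch cache = new ByteCacheSketch();

        // The writer chooses the format (here: JSON text encoded as UTF-8).
        String json = "{\"id\": 42}";
        cache.put("record-42", json.getBytes(StandardCharsets.UTF_8));

        // The reader must already know the format in order to decode the bytes.
        byte[] raw = cache.get("record-42");
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded);
    }
}
```

The cache itself never inspects the bytes, which is why JSON, Avro, or anything else can be stored, and also why decoding is the reader's responsibility.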
> >
> >[1] https://issues.apache.org/jira/browse/NIFI-1382
> >
> >Joe
> >- - - - - -
> >Joseph Percivall
> >linkedin.com/in/Percivall
> >e: joepercivall@yahoo.com
> >
> >
> >
> >
> >On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <
> sudeepshekharm@gmail.com> wrote:
> >
> >
> >
> >Thanks Joe.
> >
> >I do not have a specific configuration as of now as I am still exploring
> NiFi. Though I think it would be helpful to let users store and retrieve the
> cache values in different formats (JSON, Avro, etc.).
> >
> >Thanks & Regards,
> >
> >Sudeep
> >
> >
> >
> >
> >
> >On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <jo...@yahoo.com>
> wrote:
> >
> >Hello Sudeep,
> >>
> >>
> >>We are currently lacking a "GetDistributedMapCache" processor that
> corresponds to the "PutDistributedMapCache". I created a ticket[1] and will
> be working on it today. If you have any comments, configuration
> suggestions, etc. please let me know or comment on the ticket.
> >>
> >>
> >>[1] https://issues.apache.org/jira/browse/NIFI-1382
> >>
> >>Joe
> >>- - - - - -
> >>Joseph Percivall
> >>linkedin.com/in/Percivall
> >>e: joepercivall@yahoo.com
> >>
> >>
> >>
> >>
> >>
> >>On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <
> sudeepshekharm@gmail.com> wrote:
> >>
> >>
> >>
> >>Thanks Matt.
> >>
> >>
> >>In my data flow I am expected to perform certain validations on data. I
> am loading some SQL Server data into HDFS using Sqoop (not part of the NiFi
> flow). For each record in the HDFS file I have to query another database and
> then save the validated record again in HDFS, where it will be processed by
> some Spark jobs.
> >>
> >>
> >>Since I have to run a query for each record, I was planning to cache the
> database records against which I have to validate the HDFS data. That is why
> I was evaluating the DistributedCacheServer, but it looks like its purpose is
> different. Alternatively, can we integrate Redis or another distributed
> cache with NiFi? I do not see any processor for it.
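
The lookup pattern described above can be sketched in plain Java as follows. This is only an illustration under assumptions: `ValidationCacheSketch`, its constructor argument, and `validRecords` are hypothetical names, the reference keys are assumed to fit in memory, and a record is treated as valid when its key is present in the reference data. Nothing here is a NiFi or Redis API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: load the reference keys once, then validate each
// incoming record against the in-memory copy instead of querying per record.
public class ValidationCacheSketch {
    private final Set<String> referenceKeys = new HashSet<>();

    // keysFromDatabase stands in for the result of a single bulk query.
    public ValidationCacheSketch(Iterable<String> keysFromDatabase) {
        for (String key : keysFromDatabase) {
            referenceKeys.add(key);
        }
    }

    // Keep only the records whose key exists in the cached reference data.
    public List<String> validRecords(List<String> recordKeys) {
        List<String> valid = new ArrayList<>();
        for (String key : recordKeys) {
            if (referenceKeys.contains(key)) {
                valid.add(key);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        ValidationCacheSketch cache =
                new ValidationCacheSketch(List.of("cust-1", "cust-2"));
        System.out.println(cache.validRecords(List.of("cust-2", "cust-9", "cust-1")));
    }
}
```

The trade-off is the usual one for a cache: one bulk load replaces many per-record queries, at the cost of the cached data possibly going stale.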
> >>
> >>
> >>Appreciate your help.
> >>
> >>
> >>Thanks & Regards,
> >>
> >>
> >>Sudeep
> >>
> >>
> >>
> >>
> >>On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <
> matt.clarke.138@gmail.com> wrote:
> >>
> >>Sudeep,
> >>>       I was a little off on my second scenario.  The DetectDuplicate
> processor uses the distributed cache service all on its own. Files that are
> routed through it are loaded into the cache if they do not already exist in
> the cache.  If they do already exist, they are routed to duplicate.  The
> PutDistributedMapCache processor was a community contribution, and currently
> no processors make use of the info that it caches.
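
The "load it if absent, otherwise route to duplicate" behaviour described above is essentially a put-if-absent check. A minimal Java sketch, assuming a local map in place of the distributed cache: `DuplicateCheckSketch` and `route` are hypothetical names, and the "non-duplicate" label is an assumption for the other outcome (the thread itself only names "duplicate").

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of duplicate detection via put-if-absent.
// A local ConcurrentHashMap stands in for the distributed cache service.
public class DuplicateCheckSketch {
    private final Map<String, Boolean> seen = new ConcurrentHashMap<>();

    // First sighting of a key stores it and returns "non-duplicate";
    // every later sighting of the same key returns "duplicate".
    public String route(String key) {
        Boolean previous = seen.putIfAbsent(key, Boolean.TRUE);
        return previous == null ? "non-duplicate" : "duplicate";
    }

    public static void main(String[] args) {
        DuplicateCheckSketch detector = new DuplicateCheckSketch();
        System.out.println(detector.route("file-a"));
        System.out.println(detector.route("file-a"));
    }
}
```

Using a single atomic put-if-absent call, rather than a separate "contains" followed by "put", avoids two concurrent callers both deciding the same key is new.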
> >>>
> >>>       We should probably build a processor that would make use of the
> data that can be loaded by the PutDistributedMapCache processor.  Is there a
> particular use case you are trying to solve where this would be applicable?
> >>>
> >>>
> >>>Thanks,
> >>>Matt
> >>>
> >>>
> >>>On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <
> matt.clarke.138@gmail.com> wrote:
> >>>
> >>>Sudeep,
> >>>>    The DistributedMapCache is typically used to prevent the
> consumption of duplicate data by some of the ingest-type processors
> (GetHBase, ListHDFS, and ListSFTP).  NiFi uses the service to keep a
> listing of what has been consumed so the same files are not consumed
> multiple times. The service can also be used to detect if duplicate data
> already exists within a NiFi instance or cluster. This would be the
> scenario where some source is pushing data to your NiFi and perhaps they
> push the same data more than once. You want to catch these duplicates so
> you can perhaps kick them out of your flow. For this you would use the
> PutDistributedMapCache processor to cache all incoming data and then use the
> DetectDuplicate processor to find those duplicates.
> >>>>
> >>>>    Was there a different use case you were looking to solve using the
> Distributed cache service?
> >>>>
> >>>>
> >>>>Thanks,
> >>>>Matt
> >>>>
> >>>>
> >>>>On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <
> sudeepshekharm@gmail.com> wrote:
> >>>>
> >>>>Hi,
> >>>>>
> >>>>>
> >>>>>I can cache some data to be used in a NiFi flow. I can see the
> processor PutDistributedMapCache in the documentation, which saves key-value
> pairs in DistributedMapCache for NiFi, but I do not see any processor to read
> this data. How can I read data from DistributedMapCache in my data flow?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>Thanks & Regards,
> >>>>>
> >>>>>
> >>>>>Sudeep Shekhar Mishra
> >>>>>
> >>>>>


-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by Joe Percivall <jo...@yahoo.com>.
Hello Sudeep,
Sorry, I was not following your emails. Did you need more help importing the processor?
Currently the way you would clear a DistributedMapCache is to just remove the DistributedMapCacheServer controller service and make a new one.
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com
 


Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Thanks Joe. The GetDistributedMapCache seems to be working fine.

Is there a way to clear DistributedMapCache on demand?

Regards,

Sudeep




-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Building the repository produces separate .nar files, and I can update just
the ones I need in the lib directory.
Thanks for your help.




-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Is it possible to build the code for only a particular processor? Just
curious if we can build and deploy a particular processor in an existing
NiFi environment.

On Wed, Jan 13, 2016 at 9:33 PM, sudeep mishra <su...@gmail.com>
wrote:

> Thanks Joe. I will try out the patch.



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Thanks Joe. I will try out the patch.

On Wed, Jan 13, 2016 at 9:31 PM, Joe Percivall <jo...@yahoo.com>
wrote:

> You would need to clone the NiFi source from GitHub and then apply the
> patch using git.
>
> Here is how to clone a repo:
> https://help.github.com/articles/cloning-a-repository/
> Along with the NiFi repo itself: https://github.com/apache/nifi
>
> and how to apply a patch:
> http://makandracards.com/makandra/2521-git-how-to-create-and-apply-patches
>
> Let me know if you have any other questions,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by Joe Percivall <jo...@yahoo.com>.
You would need to clone the NiFi source from GitHub and then apply the patch using git.

Here is how to clone a repo: https://help.github.com/articles/cloning-a-repository/
Along with the NiFi repo itself: https://github.com/apache/nifi

and how to apply a patch: http://makandracards.com/makandra/2521-git-how-to-create-and-apply-patches
 
Let me know if you have any other questions,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



On Wednesday, January 13, 2016 10:56 AM, sudeep mishra <su...@gmail.com> wrote:



Thank you very much Joe.

Can you please let me know how I can use the .patch file? I am using NiFi via the binaries. Do I need to set up the source code and build it along with the patch?

Thanks & Regards,

Sudeep




-- 

Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Thank you very much Joe.

Can you please let me know how I can use the .patch file? I am using NiFi
via the binaries. Do I need to set up the source code and build it along
with the patch?

Thanks & Regards,

Sudeep

On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <jo...@yahoo.com>
wrote:

> Hello Sudeep,
>
> I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what
> you think.
>
> The PutDistributedMapCache processor and GetDistributedMapCache work with
> the data as a byte[] so it should be format agnostic. That being said it
> will be up to you to know what is in there in order to use it later.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by Joe Percivall <jo...@yahoo.com>.
Hello Sudeep, 

I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what you think.

The PutDistributedMapCache processor and GetDistributedMapCache work with the data as a byte[] so it should be format agnostic. That being said it will be up to you to know what is in there in order to use it later.
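
As a rough illustration of what "format agnostic" means here, the following is a minimal sketch in which a plain in-memory map stands in for the cache. The class ByteCacheSketch and its put/get methods are hypothetical names made up for this example and are not the NiFi API. The cache only ever sees bytes; the writer and the reader have to agree on the encoding (UTF-8 JSON text in this case).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a map cache that stores opaque byte arrays.
// This is NOT the NiFi cache client; it only illustrates the byte[] contract.
public class ByteCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>();

    // Store a defensive copy so later changes to the caller's array do not leak in.
    public void put(String key, byte[] value) {
        cache.put(key, value.clone());
    }

    // Return a copy of the stored bytes, or null if the key is absent.
    public byte[] get(String key) {
        byte[] value = cache.get(key);
        return value == null ? null : value.clone();
    }

    public static void main(String[] args) {
        ByteCacheSketch cache = new ByteCacheSketch();

        // Writer side: serialize to bytes using an agreed encoding (UTF-8 JSON text).
        String json = "{\"id\": 42, \"status\": \"valid\"}";
        cache.put("record-42", json.getBytes(StandardCharsets.UTF_8));

        // Reader side: the cache cannot say what the bytes mean,
        // so the reader must decode with the same convention.
        byte[] raw = cache.get("record-42");
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded);
    }
}
```

The same put and get calls would work unchanged for Avro or any other binary payload; only the serialization and deserialization steps around them would differ.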

[1] https://issues.apache.org/jira/browse/NIFI-1382
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <su...@gmail.com> wrote:



Thanks Joe.

I do not have any specific configuration in mind as of now, as I am still exploring NiFi. However, I think it would be helpful to let users store and retrieve cache values in different formats (JSON, Avro, etc.).

Thanks & Regards,

Sudeep







-- 

Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Thanks Joe.

I do not have any specific configuration in mind as of now, as I am still
exploring NiFi. However, I think it would be helpful to let users store and
retrieve cache values in different formats (JSON, Avro, etc.).

Thanks & Regards,

Sudeep



On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <jo...@yahoo.com>
wrote:

> Hello Sudeep,
>
> We are currently lacking a "GetDistributedMapCache" processor that
> corresponds to the "PutDistributedMapCache". I created a ticket[1] and will
> be working on it today. If you have any comments, configuration
> suggestions, etc. please let me know or comment on the ticket.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
> - - - - - -
> *Joseph Percivall*
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
> On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <
> sudeepshekharm@gmail.com> wrote:
>
>
> Thanks Matt.
>
> In my data flow I am expected to perform certain validations on data. I am
> loading some SQLServer data into HDFSusing Sqoop (not part of NiFi flow).
> For each record in HDFS file I have to query another database and then save
> the validated record again in HDFS which will be processed bysome Spark
> jobs.
>
> Since I have to query for each record thus I was planning to cache the
> database records against which I have to validate the HDFS. Thus I was
> evaluating the DistributedCacheServer. But looks like its purpose is
> different. Alternatively can we integrate Redis or another distributed
> cache with NiFi as I do not see any processor for it.
>
> Appreciate your help.
>
> Thanks & Regards,
>
> Sudeep
>
>
> On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <matt.clarke.138@gmail.com
> > wrote:
>
> Sudeep,
>        I was a little off on my second scenario.  The detectduplicate
> processor uses the distributedcache service all on its own.. Files that are
> route through it are loaded into the cache if they do not already exist in
> the cache.  if they do already exist they are routed to duplicate.  The
> putDistributedCache processor was a community contribution to which there
> are no processor that make use of the info that it caches.
>
>        We should probably build a processor that would make use of the
> data that can be loaded by the putDistributeCache processor.  Is there a
> particular use case you are trying to solve where this would be applicable?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <matt.clarke.138@gmail.com
> > wrote:
>
> Sudeep,
>     The DistributedMapCache is typically used to prevent the consumption
> of duplicate data by some of the ingest type processors (GetHBASE,
> ListHDFS, and ListSFTP).  NiFi uses the service to keep a listing of what
> has been consumed so the same files are not consumed multiple times. The
> Service can also be used to detect if duplicate data already exists within
> a NiFi Instance or cluster. This would be the scenario where some source is
> pushing data to your NiFi and perhaps they push the same data more than
> once. You want to catch these duplicates so you can perhaps kick them out
> of your flow. For this you would use the PutDistributedMapCache processor to
> cache all incoming data and then use the DetectDuplicate processor to find
> those duplicates.
>
>     Was there a different use case you were looking to solve using the
> Distributed cache service?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <su...@gmail.com>
> wrote:
>
> Hi,
>
> I can cache some data to be used in NiFi flow. I can see the
> processor PutDistributedMapCache in the documentation which saves key-value
> pairs in DistributedMapCache for NiFi but I do not see any processor to read
> this data. How can I read data from DistributedMapCache in my data flow?
>
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
>
>
>
>
>
> --
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> sudeepshekharm@gmail.com
>
>
>


-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by Joe Percivall <jo...@yahoo.com>.
Hello Sudeep,
We are currently lacking a "GetDistributedMapCache" processor that corresponds to the "PutDistributedMapCache". I created a ticket[1] and will be working on it today. If you have any comments, configuration suggestions, etc. please let me know or comment on the ticket.
[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com
 

    On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <su...@gmail.com> wrote:
 

 Thanks Matt.
In my data flow I am expected to perform certain validations on data. I am loading some SQLServer data into HDFS using Sqoop (not part of NiFi flow). For each record in HDFS file I have to query another database and then save the validated record again in HDFS which will be processed by some Spark jobs.
Since I have to query for each record, I was planning to cache the database records against which I have to validate the HDFS records. So I was evaluating the DistributedMapCacheServer, but it looks like its purpose is different. Alternatively, can we integrate Redis or another distributed cache with NiFi? I do not see any processor for it.
Appreciate your help.
Thanks & Regards,
Sudeep

On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <ma...@gmail.com> wrote:

Sudeep,
I was a little off on my second scenario.  The DetectDuplicate processor uses the distributed cache service all on its own. Files that are routed through it are loaded into the cache if they do not already exist in the cache.  If they do already exist, they are routed to duplicate.  The PutDistributedMapCache processor was a community contribution, and there is currently no processor that makes use of the info that it caches.

We should probably build a processor that would make use of the data that can be loaded by the PutDistributedMapCache processor.  Is there a particular use case you are trying to solve where this would be applicable?
Thanks,
Matt
On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <ma...@gmail.com> wrote:

Sudeep,
The DistributedMapCache is typically used to prevent the consumption of duplicate data by some of the ingest-type processors (GetHBase, ListHDFS, and ListSFTP).  NiFi uses the service to keep a listing of what has been consumed so the same files are not consumed multiple times. The service can also be used to detect whether duplicate data already exists within a NiFi instance or cluster. This would be the scenario where some source is pushing data to your NiFi and perhaps pushes the same data more than once. You want to catch these duplicates so you can perhaps kick them out of your flow. For this you would use the PutDistributedMapCache processor to cache all incoming data and then use the DetectDuplicate processor to find those duplicates.

Was there a different use case you were looking to solve using the distributed cache service?
Thanks,
Matt
On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <su...@gmail.com> wrote:

Hi,
I can cache some data to be used in NiFi flow. I can see the processor PutDistributedMapCache in the documentation which saves key-value pairs in DistributedMapCache for NiFi but I do not see any processor to read this data. How can I read data from DistributedMapCache in my data flow?


Thanks & Regards,
Sudeep Shekhar Mishra








-- 
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029
sudeepshekharm@gmail.com

  

Re: PutDistributedMapCache

Posted by sudeep mishra <su...@gmail.com>.
Thanks Matt.

In my data flow I am expected to perform certain validations on data. I am
loading some SQLServer data into HDFS using Sqoop (not part of NiFi flow).
For each record in HDFS file I have to query another database and then save
the validated record again in HDFS which will be processed by some Spark
jobs.

Since I have to query for each record, I was planning to cache the
database records against which I have to validate the HDFS records. So I was
evaluating the DistributedMapCacheServer, but it looks like its purpose is
different. Alternatively, can we integrate Redis or another distributed
cache with NiFi? I do not see any processor for it.
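
To make the idea concrete, here is a minimal Python sketch of the lookup-cache pattern described above: load the reference records once, then validate each HDFS record against the cache instead of querying the database per record. Everything in it is an assumption for illustration (the `ReferenceCache` class, the `customer_id` and `country` fields, the validation rule); it is not NiFi's API, and a plain dict stands in for a distributed cache.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


class ReferenceCache:
    """Hypothetical in-memory stand-in for a distributed key-value cache."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key was never cached.
        return self._store.get(key)


def load_reference(cache: ReferenceCache, rows: Iterable[Tuple[str, str]]) -> None:
    """Populate the cache once from the reference database rows."""
    for key, value in rows:
        cache.put(key, value)


def validate(records: Iterable[Record], cache: ReferenceCache) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, invalid) using only cache lookups."""
    valid: List[Record] = []
    invalid: List[Record] = []
    for rec in records:
        expected = cache.get(rec["customer_id"])
        # Assumed rule: a record is valid if its country matches the cached one.
        if expected is not None and expected == rec["country"]:
            valid.append(rec)
        else:
            invalid.append(rec)
    return valid, invalid
```

The point of the design is that the database is read once by `load_reference`, and each record afterwards costs only a key lookup.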

Appreciate your help.

Thanks & Regards,

Sudeep


On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <ma...@gmail.com>
wrote:

> Sudeep,
>        I was a little off on my second scenario.  The DetectDuplicate
> processor uses the distributed cache service all on its own. Files that are
> routed through it are loaded into the cache if they do not already exist in
> the cache.  If they do already exist, they are routed to duplicate.  The
> PutDistributedMapCache processor was a community contribution, and there
> is currently no processor that makes use of the info that it caches.
>
>        We should probably build a processor that would make use of the
> data that can be loaded by the PutDistributedMapCache processor.  Is there a
> particular use case you are trying to solve where this would be applicable?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <matt.clarke.138@gmail.com
> > wrote:
>
>> Sudeep,
>>     The DistributedMapCache is typically used to prevent the consumption
>> of duplicate data by some of the ingest-type processors (GetHBase,
>> ListHDFS, and ListSFTP).  NiFi uses the service to keep a listing of what
>> has been consumed so the same files are not consumed multiple times. The
>> Service can also be used to detect if duplicate data already exists within
>> a NiFi Instance or cluster. This would be the scenario where some source is
>> pushing data to your NiFi and perhaps they push the same data more than
>> once. You want to catch these duplicates so you can perhaps kick them out
>> of your flow. For this you would use the PutDistributedMapCache processor to
>> cache all incoming data and then use the DetectDuplicate processor to find
>> those duplicates.
>>
>>     Was there a different use case you were looking to solve using the
>> Distributed cache service?
>>
>> Thanks,
>> Matt
>>
>> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <su...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I can cache some data to be used in NiFi flow. I can see the
>>> processor PutDistributedMapCache in the documentation which saves key-value
>>> pairs in DistributedMapCache for NiFi but I do not see any processor to read
>>> this data. How can I read data from DistributedMapCache in my data flow?
>>>
>>>
>>> Thanks & Regards,
>>>
>>> Sudeep Shekhar Mishra
>>>
>>>
>>
>


-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: PutDistributedMapCache

Posted by Matthew Clarke <ma...@gmail.com>.
Sudeep,
       I was a little off on my second scenario.  The DetectDuplicate
processor uses the distributed cache service all on its own. Files that are
routed through it are loaded into the cache if they do not already exist in
the cache.  If they do already exist, they are routed to duplicate.  The
PutDistributedMapCache processor was a community contribution, and there
is currently no processor that makes use of the info that it caches.
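
As a rough illustration of the behavior described above (add the entry if it is absent, otherwise route to duplicate), here is a small Python sketch. It is not NiFi code: the `DuplicateDetector` class is hypothetical, a local set stands in for the shared cache, and keying on a SHA-256 of the content is an assumption made for the example.

```python
import hashlib
from typing import Set


class DuplicateDetector:
    """Hypothetical sketch of put-if-absent duplicate detection."""

    def __init__(self) -> None:
        # A local set stands in for the shared cache service.
        self._seen: Set[str] = set()

    def route(self, content: bytes) -> str:
        """Return the name of the route this content would take."""
        # Assumed cache key: a hash of the content.
        key = hashlib.sha256(content).hexdigest()
        if key in self._seen:
            return "duplicate"
        # First sighting: remember the key so later copies are caught.
        self._seen.add(key)
        return "non-duplicate"


if __name__ == "__main__":
    detector = DuplicateDetector()
    for payload in (b"file-a", b"file-b", b"file-a"):
        print(payload, "->", detector.route(payload))
```

Note that the check and the insert happen in the same step, which is why no separate "put" processor is needed in front of it.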

       We should probably build a processor that would make use of the data
that can be loaded by the PutDistributedMapCache processor.  Is there a
particular use case you are trying to solve where this would be applicable?

Thanks,
Matt

On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <ma...@gmail.com>
wrote:

> Sudeep,
>     The DistributedMapCache is typically used to prevent the consumption
> of duplicate data by some of the ingest-type processors (GetHBase,
> ListHDFS, and ListSFTP).  NiFi uses the service to keep a listing of what
> has been consumed so the same files are not consumed multiple times. The
> Service can also be used to detect if duplicate data already exists within
> a NiFi Instance or cluster. This would be the scenario where some source is
> pushing data to your NiFi and perhaps they push the same data more than
> once. You want to catch these duplicates so you can perhaps kick them out
> of your flow. For this you would use the PutDistributedMapCache processor to
> cache all incoming data and then use the DetectDuplicate processor to find
> those duplicates.
>
>     Was there a different use case you were looking to solve using the
> Distributed cache service?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <su...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I can cache some data to be used in NiFi flow. I can see the
>> processor PutDistributedMapCache in the documentation which saves key-value
>> pairs in DistributedMapCache for NiFi but I do not see any processor to read
>> this data. How can I read data from DistributedMapCache in my data flow?
>>
>>
>> Thanks & Regards,
>>
>> Sudeep Shekhar Mishra
>>
>>
>

Re: PutDistributedMapCache

Posted by Matthew Clarke <ma...@gmail.com>.
Sudeep,
    The DistributedMapCache is typically used to prevent the consumption of
duplicate data by some of the ingest-type processors (GetHBase, ListHDFS,
and ListSFTP).  NiFi uses the service to keep a listing of what has been
consumed so the same files are not consumed multiple times. The service can
also be used to detect whether duplicate data already exists within a NiFi
instance or cluster. This would be the scenario where some source is
pushing data to your NiFi and perhaps pushes the same data more than
once. You want to catch these duplicates so you can perhaps kick them out
of your flow. For this you would use the PutDistributedMapCache processor to
cache all incoming data and then use the DetectDuplicate processor to find
those duplicates.

    Was there a different use case you were looking to solve using the
distributed cache service?

Thanks,
Matt

On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <su...@gmail.com>
wrote:

> Hi,
>
> I can cache some data to be used in NiFi flow. I can see the
> processor PutDistributedMapCache in the documentation which saves key-value
> pairs in DistributedMapCache for NiFi but I do not see any processor to read
> this data. How can I read data from DistributedMapCache in my data flow?
>
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
>