Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/02 13:08:41 UTC
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952671#comment-15952671 ]
ASF GitHub Bot commented on NIFI-3644:
--------------------------------------
GitHub user baolsen opened a pull request:
https://github.com/apache/nifi/pull/1645
NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService
Added HBase_1_1_2_ClientMapCacheService, which implements DistributedMapCacheClient.
The DetectDuplicate processor can now use HBase_1_1_2_ClientMapCacheService to store its duplicate cache in HBase.
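The idea behind backing a DistributedMapCacheClient with an HBase table can be sketched as follows. Each serialized cache key becomes a row key, and the "insert if absent, otherwise return the existing value" step must be atomic, assuming (as in NiFi's cache client API) that DetectDuplicate's check goes through `getAndPutIfAbsent`. This is a self-contained stand-in, not the code in the pull request. The `Serializer`/`Deserializer` interfaces and the `HBaseBackedCache` class are hypothetical, and a `ConcurrentHashMap` plays the role of the HBase table. In a real HBase-backed service, the atomic step would presumably map onto a conditional write such as HBase's check-and-put with a null expected value, which succeeds only when the cell does not yet exist.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical serializer interfaces, loosely modelled on a map cache client API.
interface Serializer<T> {
    byte[] serialize(T value);
}

interface Deserializer<T> {
    T deserialize(byte[] bytes);
}

// Hypothetical HBase-style cache: row key = serialized cache key, one cell = serialized value.
// A ConcurrentHashMap stands in for the HBase table; ByteBuffer gives byte[] content equality.
class HBaseBackedCache {
    private final ConcurrentMap<ByteBuffer, byte[]> table = new ConcurrentHashMap<>();

    private static <K> ByteBuffer rowKey(K key, Serializer<K> keySerializer) {
        return ByteBuffer.wrap(keySerializer.serialize(key));
    }

    // Atomic "insert only if the row is absent" (a conditional write in HBase terms).
    public <K, V> boolean putIfAbsent(K key, V value, Serializer<K> ks, Serializer<V> vs) {
        return table.putIfAbsent(rowKey(key, ks), vs.serialize(value)) == null;
    }

    // Returns the existing value if present; otherwise stores the new value and returns null.
    public <K, V> V getAndPutIfAbsent(K key, V value, Serializer<K> ks, Serializer<V> vs,
                                      Deserializer<V> vd) {
        byte[] existing = table.putIfAbsent(rowKey(key, ks), vs.serialize(value));
        return existing == null ? null : vd.deserialize(existing);
    }

    public <K> boolean containsKey(K key, Serializer<K> ks) {
        return table.containsKey(rowKey(key, ks));
    }

    public <K, V> V get(K key, Serializer<K> ks, Deserializer<V> vd) {
        byte[] bytes = table.get(rowKey(key, ks));
        return bytes == null ? null : vd.deserialize(bytes);
    }

    // Equivalent of deleting the row for this key.
    public <K> boolean remove(K key, Serializer<K> ks) {
        return table.remove(rowKey(key, ks)) != null;
    }
}

class CacheDemo {
    static final Serializer<String> UTF8 = s -> s.getBytes(StandardCharsets.UTF_8);
    static final Deserializer<String> FROM_UTF8 = b -> new String(b, StandardCharsets.UTF_8);

    public static void main(String[] args) {
        HBaseBackedCache cache = new HBaseBackedCache();
        // First sighting: nothing stored yet, so null comes back.
        String first = cache.getAndPutIfAbsent("sha256:abc", "file-1.csv", UTF8, UTF8, FROM_UTF8);
        System.out.println(first); // prints null
        // Second sighting: the originally stored value comes back, i.e. a duplicate.
        String second = cache.getAndPutIfAbsent("sha256:abc", "file-2.csv", UTF8, UTF8, FROM_UTF8);
        System.out.println(second); // prints file-1.csv
    }
}
```

Returning the previously stored value, rather than a bare boolean, lets a caller report which earlier entry a duplicate collided with.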
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/baolsen/nifi DistributedMapCacheHBaseClientService
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/1645.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1645
----
commit 8c0285b5efb6afd1607bb050650b758fed7d06e3
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T12:35:43Z
Update HBaseClientService.java
Added "get" function call for doing single row lookup on HBase (HBase get)
commit 03d1b36376c6954d8bdcf4056314fced0cf0d1fc
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T13:20:41Z
Update HBase_1_1_2_ClientService.java
Implemented "get" function for retrieval of single HBase rows.
commit 6dbca10e82b3b6b8ac94f8f0152b8fff85008082
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T13:33:15Z
Update HBase_1_1_2_ClientService.java
commit df30a22a3ba71fedfe1dffedefcc0eb64c3670b0
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T13:40:08Z
Update HBase_1_1_2_ClientService.java
commit 6d8036cc03ef49e41b92dbb5fa7e0de41cc15c3d
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T13:44:12Z
Update MockHBaseClientService.java
Implemented "get" function with UnsupportedException
commit 4bcb26fd6a99a23852097f4f3db02cbeb6b8a3b5
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T13:46:23Z
Update HBase_1_1_2_ClientService.java
commit 4b266d9d1d112e2bf8aa198f87253d17c055dbbc
Author: baolsen <bj...@gmail.com>
Date: 2017-03-23T13:50:09Z
Update MockHBaseClientService.java
commit 2ef850bc7c2bce5f9dd35fc9ce5cf08c7ecf07c4
Author: baolsen <bj...@gmail.com>
Date: 2017-03-29T08:51:11Z
Test
commit e802f147bcd19664b9053e240ec1476ff7a61e7b
Author: baolsen <bj...@gmail.com>
Date: 2017-03-29T08:52:35Z
Test
commit 4cabff26658090c08d813e74d27894a9fd684c57
Author: baolsen <bj...@gmail.com>
Date: 2017-03-31T07:59:50Z
Completed initial development of HBase_1_1_2_ClientMapCacheService.java which is compatible with DetectDuplicate (and other processors)
Still need to implement value deletion
commit 7790d3f5a8d56f0801d40ad2c836a8db7c123e1b
Author: baolsen <bj...@gmail.com>
Date: 2017-03-31T08:31:06Z
Undid changes to files for an earlier attempt at this
commit 594dc059cdbe708f10849c794b826d24e83e787d
Author: baolsen <bj...@gmail.com>
Date: 2017-03-31T08:33:47Z
Undid changes to files for an earlier attempt at this
commit fbd3034e736ecdd1d721cc788e5c984eee6560c7
Author: baolsen <bj...@gmail.com>
Date: 2017-04-02T13:01:21Z
Added remove() for cache and Documentation
----
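The commits above add a single-row `get` to the HBase client service, stub it out in the mock service, and later add removal for the cache. A minimal, self-contained sketch of that split follows, assuming that "UnsupportedException" in the commit message refers to Java's standard `UnsupportedOperationException`. The `RowStore` interface, its method names, and the in-memory implementation are hypothetical stand-ins. A real implementation would presumably issue an HBase `Get` for one row key and a `Delete` for removal.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical single-row API: row key -> (column "family:qualifier" -> value).
interface RowStore {
    Optional<Map<String, String>> get(String rowKey);
    boolean delete(String rowKey);
}

// In-memory stand-in for an HBase table; rows are kept sorted by key, as in HBase.
class InMemoryRowStore implements RowStore {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    @Override
    public Optional<Map<String, String>> get(String rowKey) {
        Map<String, String> row = rows.get(rowKey);
        // Return a read-only copy so callers cannot mutate the stored row.
        return row == null ? Optional.empty()
                           : Optional.of(Collections.unmodifiableMap(new TreeMap<>(row)));
    }

    @Override
    public boolean delete(String rowKey) {
        return rows.remove(rowKey) != null;
    }
}

// Mock that does not support single-row lookups, mirroring the commit message.
class MockRowStore implements RowStore {
    @Override
    public Optional<Map<String, String>> get(String rowKey) {
        throw new UnsupportedOperationException("get is not supported by the mock");
    }

    // Assumption: deletion is likewise unsupported in the mock.
    @Override
    public boolean delete(String rowKey) {
        throw new UnsupportedOperationException("delete is not supported by the mock");
    }
}

class RowStoreDemo {
    public static void main(String[] args) {
        InMemoryRowStore store = new InMemoryRowStore();
        store.put("sha256:abc", "f:filename", "file-1.csv");
        System.out.println(store.get("sha256:abc"));    // Optional[{f:filename=file-1.csv}]
        System.out.println(store.delete("sha256:abc")); // true
        System.out.println(store.get("sha256:abc"));    // Optional.empty
    }
}
```

Throwing from the mock keeps existing tests compiling against the widened interface while making any accidental use of the new call fail loudly.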
> Add DetectDuplicateUsingHBase processor
> ---------------------------------------
>
> Key: NIFI-3644
> URL: https://issues.apache.org/jira/browse/NIFI-3644
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Bjorn Olsen
> Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table, which would allow a very large volume of file identifiers and auditing information to be stored reliably. The downside of this approach, of course, is that HBase is required.
> Storing the unique file identifiers in a reliable, queryable manner, along with some audit information, benefits several use cases.
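To make the proposal concrete, the duplicate check amounts to "store this identifier if it is new, otherwise report the earlier sighting", and an HBase table can keep audit columns next to each identifier. The sketch below is a hypothetical illustration, not the processor's code. The row layout (row key = SHA-256 of the content, audit fields for the first filename and first-seen time), the `DuplicateLedger` class, and the `Route` labels are all assumptions, and an in-memory map replaces the HBase table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical audit record stored with each identifier (one row per content hash).
final class AuditRecord {
    final String firstFilename;
    final Instant firstSeen;

    AuditRecord(String firstFilename, Instant firstSeen) {
        this.firstFilename = firstFilename;
        this.firstSeen = firstSeen;
    }
}

enum Route { DUPLICATE, NON_DUPLICATE }

// In-memory stand-in for an HBase table keyed by content hash.
class DuplicateLedger {
    private final ConcurrentMap<String, AuditRecord> rows = new ConcurrentHashMap<>();

    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    // Records the file if its hash is new; otherwise keeps the original audit record untouched.
    Route check(String filename, byte[] content, Instant now) {
        AuditRecord previous = rows.putIfAbsent(sha256Hex(content), new AuditRecord(filename, now));
        return previous == null ? Route.NON_DUPLICATE : Route.DUPLICATE;
    }

    // Audit query: which file first carried this content, and when (null if never seen).
    AuditRecord lookup(byte[] content) {
        return rows.get(sha256Hex(content));
    }
}

class LedgerDemo {
    public static void main(String[] args) {
        DuplicateLedger ledger = new DuplicateLedger();
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(ledger.check("a.txt", body, Instant.parse("2017-04-02T13:00:00Z"))); // NON_DUPLICATE
        System.out.println(ledger.check("b.txt", body, Instant.parse("2017-04-02T13:05:00Z"))); // DUPLICATE
        System.out.println(ledger.lookup(body).firstFilename); // a.txt
    }
}
```

Because the first writer wins and later sightings never overwrite the row, the stored audit fields always describe the original file, which is what makes the table useful for later queries.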
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)