Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/02 13:08:41 UTC

[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor

    [ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952671#comment-15952671 ] 

ASF GitHub Bot commented on NIFI-3644:
--------------------------------------

GitHub user baolsen opened a pull request:

    https://github.com/apache/nifi/pull/1645

    NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

    Added HBase_1_1_2_ClientMapCacheService which implements DistributedMapCacheClient.
    The DetectDuplicate processor can now make use of HBase_1_1_2_ClientMapCacheService for storing the duplicate cache on HBase.
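To illustrate the idea of a map cache whose entries live in an HBase table, here is a minimal sketch. It does not reproduce the PR's HBase_1_1_2_ClientMapCacheService. The class name `HBaseBackedMapCacheSketch` is hypothetical, and a `ConcurrentHashMap` stands in for the HBase table (row key -> single cell value). The key operation for DetectDuplicate is "insert if absent, otherwise return what is already there":

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch: a map cache whose entries live in one HBase-like table.
 * The ConcurrentMap stands in for the table (row key -> single cell value).
 */
class HBaseBackedMapCacheSketch {
    private final ConcurrentMap<String, byte[]> table = new ConcurrentHashMap<>();

    /**
     * Stores value only if key is absent. Returns the existing value when the
     * key was already present, or null when this call inserted it.
     */
    byte[] getAndPutIfAbsent(String key, byte[] value) {
        // putIfAbsent is atomic here; against HBase a conditional write such as
        // checkAndPut with a null expected value could provide the insert-if-absent part.
        return table.putIfAbsent(key, value);
    }

    /** True if a row exists for the key. */
    boolean containsKey(String key) {
        return table.containsKey(key);
    }
}
```

A null return means "first time seen", which is the signal a duplicate detector needs. Against a real HBase table, getting back the existing value would likely need a separate read after a failed conditional write.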

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/baolsen/nifi DistributedMapCacheHBaseClientService

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1645.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1645
    
----
commit 8c0285b5efb6afd1607bb050650b758fed7d06e3
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T12:35:43Z

    Update HBaseClientService.java
    
    Added "get" function call for doing single row lookup on HBase (HBase get)

commit 03d1b36376c6954d8bdcf4056314fced0cf0d1fc
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:20:41Z

    Update HBase_1_1_2_ClientService.java
    
    Implemented "get" function for retrieval of single HBase rows.
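The two commits above add a single-row lookup to the client service interface and implement it. The sketch below shows that shape under stated assumptions. The interface name `RowLookupClient`, its `get(String, String)` signature, and the in-memory implementation are hypothetical and are not the actual HBaseClientService API. Against HBase 1.1.2 the lookup would typically be a `Table.get(new Get(rowId))` call, which is a keyed fetch rather than a scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified client interface with a single-row lookup. */
interface RowLookupClient {
    // Returns the columns of one row, or empty if the table or row does not exist.
    Optional<Map<String, byte[]>> get(String tableName, String rowId);
}

/** In-memory stand-in: table name -> row id -> (column -> value). */
class InMemoryRowLookupClient implements RowLookupClient {
    private final Map<String, Map<String, Map<String, byte[]>>> tables = new HashMap<>();

    /** Test helper for populating a cell. */
    void put(String tableName, String rowId, String column, byte[] value) {
        tables.computeIfAbsent(tableName, t -> new HashMap<>())
              .computeIfAbsent(rowId, r -> new HashMap<>())
              .put(column, value);
    }

    @Override
    public Optional<Map<String, byte[]>> get(String tableName, String rowId) {
        Map<String, Map<String, byte[]>> rows = tables.get(tableName);
        if (rows == null) {
            return Optional.empty();
        }
        // One keyed lookup, analogous to an HBase Get rather than a Scan.
        return Optional.ofNullable(rows.get(rowId));
    }
}
```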

commit 6dbca10e82b3b6b8ac94f8f0152b8fff85008082
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:33:15Z

    Update HBase_1_1_2_ClientService.java

commit df30a22a3ba71fedfe1dffedefcc0eb64c3670b0
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:40:08Z

    Update HBase_1_1_2_ClientService.java

commit 6d8036cc03ef49e41b92dbb5fa7e0de41cc15c3d
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:44:12Z

    Update MockHBaseClientService.java
    
    Implemented "get" function with UnsupportedException
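Java's standard library has no `UnsupportedException`. The closest standard type is `java.lang.UnsupportedOperationException`, which this sketch uses as an assumption, because the commit message does not name the exact class. The sketch shows a mock that satisfies a widened interface without supporting the new method. `LookupService` and `MockLookupService` are hypothetical names.

```java
import java.util.Optional;

/** Hypothetical service interface; the real HBaseClientService signature may differ. */
interface LookupService {
    Optional<byte[]> get(String tableName, byte[] rowId);
}

/** Test double that only needs to compile against the new interface method. */
class MockLookupService implements LookupService {
    @Override
    public Optional<byte[]> get(String tableName, byte[] rowId) {
        // Fail loudly if a test unexpectedly exercises the new lookup path.
        throw new UnsupportedOperationException("get is not supported by this mock");
    }
}
```

Throwing rather than returning an empty result keeps existing tests honest: any test that starts depending on `get` fails immediately instead of passing on fake data.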

commit 4bcb26fd6a99a23852097f4f3db02cbeb6b8a3b5
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:46:23Z

    Update HBase_1_1_2_ClientService.java

commit 4b266d9d1d112e2bf8aa198f87253d17c055dbbc
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:50:09Z

    Update MockHBaseClientService.java

commit 2ef850bc7c2bce5f9dd35fc9ce5cf08c7ecf07c4
Author: baolsen <bj...@gmail.com>
Date:   2017-03-29T08:51:11Z

    Test

commit e802f147bcd19664b9053e240ec1476ff7a61e7b
Author: baolsen <bj...@gmail.com>
Date:   2017-03-29T08:52:35Z

    Test

commit 4cabff26658090c08d813e74d27894a9fd684c57
Author: baolsen <bj...@gmail.com>
Date:   2017-03-31T07:59:50Z

    Completed initial development of HBase_1_1_2_ClientMapCacheService.java which is compatible with DetectDuplicate (and other processors)
    Still need to implement value deletion

commit 7790d3f5a8d56f0801d40ad2c836a8db7c123e1b
Author: baolsen <bj...@gmail.com>
Date:   2017-03-31T08:31:06Z

    Undid changes to files for an earlier attempt at this

commit 594dc059cdbe708f10849c794b826d24e83e787d
Author: baolsen <bj...@gmail.com>
Date:   2017-03-31T08:33:47Z

    Undid changes to files for an earlier attempt at this

commit fbd3034e736ecdd1d721cc788e5c984eee6560c7
Author: baolsen <bj...@gmail.com>
Date:   2017-04-02T13:01:21Z

    Added remove() for cache and Documentation
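The earlier commit noted that value deletion was still missing, and this commit adds `remove()`. The sketch below shows one plausible removal contract under stated assumptions: the class `RemovableMapCache` is hypothetical, and a `ConcurrentHashMap` stands in for the HBase table. In the HBase client, `Table.delete(Delete)` returns `void`. Reporting whether a row actually existed would therefore need a prior read or a conditional delete such as `checkAndDelete`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical cache with removal; the map stands in for the HBase table. */
class RemovableMapCache {
    private final ConcurrentMap<String, byte[]> table = new ConcurrentHashMap<>();

    void put(String key, byte[] value) {
        table.put(key, value);
    }

    boolean containsKey(String key) {
        return table.containsKey(key);
    }

    /** Deletes the entry for key; returns true only if an entry was actually removed. */
    boolean remove(String key) {
        // ConcurrentMap.remove reports the old value in one atomic step; an HBase
        // delete does not, so the boolean would need extra work there.
        return table.remove(key) != null;
    }
}
```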

----


> Add DetectDuplicateUsingHBase processor
> ---------------------------------------
>
>                 Key: NIFI-3644
>                 URL: https://issues.apache.org/jira/browse/NIFI-3644
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Bjorn Olsen
>            Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table, which then allows for reliably storing a huge volume of file identifiers and auditing information. The downside of this approach is of course that HBase is required.
> Storing the unique file identifiers in a reliable, queryable manner, along with some audit information, benefits several use cases.
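The issue describes duplicate detection as "keep a set of unique file identifiers (such as hashes) in a distributed map cache". A minimal sketch of that flow follows, with several assumptions. The class and method names are hypothetical, SHA-256 is chosen as the identifier, and a `ConcurrentHashMap` stands in for the HBase-backed cache, mapping each hash to the audit information recorded with the first occurrence.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of the DetectDuplicate idea: hash content, record each hash once. */
class DuplicateDetectorSketch {
    // Stand-in for the distributed cache (an HBase table in NIFI-3644): hash -> audit info.
    private final ConcurrentMap<String, String> seen = new ConcurrentHashMap<>();

    /** Lowercase hex SHA-256 of the content, used as the unique file identifier. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if content with the same hash was recorded before; otherwise records it. */
    boolean isDuplicate(byte[] content, String auditInfo) {
        return seen.putIfAbsent(sha256Hex(content), auditInfo) != null;
    }

    /** Audit information stored with the first occurrence, or null if the content is unseen. */
    String firstSeen(byte[] content) {
        return seen.get(sha256Hex(content));
    }
}
```

Keeping the audit information as the cached value, instead of a bare marker, is what makes the store useful for the auditing use case. A lookup by hash answers both "have we seen this?" and "when or where did we first see it?".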



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)