Posted to issues@nifi.apache.org by baolsen <gi...@git.apache.org> on 2017/04/02 13:08:03 UTC

[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

GitHub user baolsen opened a pull request:

    https://github.com/apache/nifi/pull/1645

    NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

    Added HBase_1_1_2_ClientMapCacheService, which implements DistributedMapCacheClient.
    The DetectDuplicate processor can now use HBase_1_1_2_ClientMapCacheService to store its duplicate cache in HBase.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/baolsen/nifi DistributedMapCacheHBaseClientService

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1645.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1645
    
----
commit 8c0285b5efb6afd1607bb050650b758fed7d06e3
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T12:35:43Z

    Update HBaseClientService.java
    
    Added "get" function call for doing single row lookup on HBase (HBase get)

commit 03d1b36376c6954d8bdcf4056314fced0cf0d1fc
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:20:41Z

    Update HBase_1_1_2_ClientService.java
    
    Implemented "get" function for retrieval of single HBase rows.

commit 6dbca10e82b3b6b8ac94f8f0152b8fff85008082
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:33:15Z

    Update HBase_1_1_2_ClientService.java

commit df30a22a3ba71fedfe1dffedefcc0eb64c3670b0
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:40:08Z

    Update HBase_1_1_2_ClientService.java

commit 6d8036cc03ef49e41b92dbb5fa7e0de41cc15c3d
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:44:12Z

    Update MockHBaseClientService.java
    
    Implemented "get" function with UnsupportedException

commit 4bcb26fd6a99a23852097f4f3db02cbeb6b8a3b5
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:46:23Z

    Update HBase_1_1_2_ClientService.java

commit 4b266d9d1d112e2bf8aa198f87253d17c055dbbc
Author: baolsen <bj...@gmail.com>
Date:   2017-03-23T13:50:09Z

    Update MockHBaseClientService.java

commit 2ef850bc7c2bce5f9dd35fc9ce5cf08c7ecf07c4
Author: baolsen <bj...@gmail.com>
Date:   2017-03-29T08:51:11Z

    Test

commit e802f147bcd19664b9053e240ec1476ff7a61e7b
Author: baolsen <bj...@gmail.com>
Date:   2017-03-29T08:52:35Z

    Test

commit 4cabff26658090c08d813e74d27894a9fd684c57
Author: baolsen <bj...@gmail.com>
Date:   2017-03-31T07:59:50Z

    Completed initial development of HBase_1_1_2_ClientMapCacheService.java which is compatible with DetectDuplicate (and other processors)
    Still need to implement value deletion

commit 7790d3f5a8d56f0801d40ad2c836a8db7c123e1b
Author: baolsen <bj...@gmail.com>
Date:   2017-03-31T08:31:06Z

    Undid changes to files for an earlier attempt at this

commit 594dc059cdbe708f10849c794b826d24e83e787d
Author: baolsen <bj...@gmail.com>
Date:   2017-03-31T08:33:47Z

    Undid changes to files for an earlier attempt at this

commit fbd3034e736ecdd1d721cc788e5c984eee6560c7
Author: baolsen <bj...@gmail.com>
Date:   2017-04-02T13:01:21Z

    Added remove() for cache and Documentation

----



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r109728428
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.hbase;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnEnabled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.controller.AbstractControllerService;
    +import org.apache.nifi.controller.ConfigurationContext;
    +
    +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient;
    +import org.apache.nifi.distributed.cache.client.Serializer;
    +import org.apache.nifi.distributed.cache.client.Deserializer;
    +import java.io.ByteArrayOutputStream;
    +import org.apache.nifi.reporting.InitializationException;
    +
    +import java.nio.charset.StandardCharsets;
    +import org.apache.nifi.hbase.scan.ResultCell;
    +import org.apache.nifi.hbase.scan.ResultHandler;
    +import org.apache.nifi.hbase.scan.Column;
    +import org.apache.nifi.hbase.put.PutColumn;
    +
    +
    +import org.apache.nifi.processor.util.StandardValidators;
    +
    +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"})
    +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"})
    +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache."
    +    + " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.")
    +
    +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient {
    +
    +    static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder()
    +        .name("HBase Client Service")
    +        .description("Specifies the HBase Client Controller Service to use for accessing HBase.")
    +        .required(true)
    +        .identifiesControllerService(HBaseClientService.class)
    +        .build();
    +
    +    public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder()
    +        .name("HBase Cache Table Name")
    +        .description("Name of the table on HBase to use for the cache.")
    +        .required(true)
    +        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +        .build();
    +
    +    public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder()
    +        .name("HBase Column Family")
    +        .description("Name of the column family on HBase to use for the cache.")
    +        .required(true)
    +        .defaultValue("f")
    +        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +        .build();
    +
    +    public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder()
    +        .name("HBase Column Qualifier")
    +        .description("Name of the column qualifier on HBase to use for the cache")
    +        .defaultValue("q")
    +        .required(true)
    +        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +        .build();
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        final List<PropertyDescriptor> descriptors = new ArrayList<>();
    +        descriptors.add(HBASE_CACHE_TABLE_NAME);
    +        descriptors.add(HBASE_CLIENT_SERVICE);
    +        descriptors.add(HBASE_COLUMN_FAMILY);
    +        descriptors.add(HBASE_COLUMN_QUALIFIER);
    +        return descriptors;
    +    }
    +
    +    private String hBaseCacheTableName;
    +    private HBaseClientService hBaseClientService;
    +
    +    private String hBaseColumnFamily;
    +    private byte[] hBaseColumnFamilyBytes;
    +
    +    private String hBaseColumnQualifier;
    +    private byte[] hBaseColumnQualifierBytes;
    +
    +    @OnEnabled
    +    public void onConfigured(final ConfigurationContext context) throws InitializationException{
    +       hBaseCacheTableName  = context.getProperty(HBASE_CACHE_TABLE_NAME).getValue();
    +       hBaseClientService   = context.getProperty(HBASE_CLIENT_SERVICE).asControllerService(HBaseClientService.class);
    +       hBaseColumnFamily    = context.getProperty(HBASE_COLUMN_FAMILY).getValue();
    +       hBaseColumnQualifier = context.getProperty(HBASE_COLUMN_QUALIFIER).getValue();
    +
    +       hBaseColumnFamilyBytes    = hBaseColumnFamily.getBytes(StandardCharsets.UTF_8);
    +       hBaseColumnQualifierBytes = hBaseColumnQualifier.getBytes(StandardCharsets.UTF_8);
    +    }
    +
    +    private <T> byte[] serialize(final T value, final Serializer<T> serializer) throws IOException {
    +        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    +        serializer.serialize(value, baos);
    +        return baos.toByteArray();
    +    }
    +    private <T> T deserialize(final byte[] value, final Deserializer<T> deserializer) throws IOException {
    +        return deserializer.deserialize(value);
    +    }
    +
    +
    +    @Override
    +    public <K, V> boolean putIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer) throws IOException {
    +      if (containsKey(key, keySerializer)) {
    --- End diff --
    
    Should this be if (!containsKey(...)) ?
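    
    i.e., presumably something like (the same method from the diff above, just with the check negated):
    
        @Override
        public <K, V> boolean putIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer) throws IOException {
            if (!containsKey(key, keySerializer)) {
                put(key, value, keySerializer, valueSerializer);
                return true;
            }
            return false;
        }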



[GitHub] nifi issue #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/1645
  
    Sorry for taking so long to get back to this...
    
    I tested this using PutDistributedMapCache and FetchDistributedMapCache, and noticed the value coming back from fetch wasn't exactly what I had stored. 
    
    In HBaseRowHandler we had:
    `lastResultBytes = resultCell.getValueArray()`
    
    And we need:
    `lastResultBytes = Arrays.copyOfRange(resultCell.getValueArray(), resultCell.getValueOffset(), resultCell.getValueLength() + resultCell.getValueOffset());`
    
    (getValueArray() returns the cell's entire backing array, so the value has to be sliced out using its offset and length.)
    
    I made a commit here that includes the change:
    https://github.com/bbende/nifi/commit/dc8f14d95d6cdbab2aa6e815269fe0d98faa2fe6
    
    I also moved MockHBaseClientService into its own class so it can be used by both tests without duplicating that code.
    
    Everything else looks good so I will go ahead and merge these changes together (your commit then mine). 
    
    Thanks again for contributing, and sorry for the delay!




[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by baolsen <gi...@git.apache.org>.
Github user baolsen commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r110219758
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +    @Override
    +    public <K, V> boolean putIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer) throws IOException {
    +      if (containsKey(key, keySerializer)) {
    +        put(key, value, keySerializer, valueSerializer);
    +        return true;
    +      } else return false;
    +    }
    +
    +    @Override
    +    public <K, V> void put(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer) throws IOException {
    +
    +        List<PutColumn> putColumns = new ArrayList<PutColumn>(1);
    +        final byte[] rowIdBytes = serialize(key, keySerializer);
    +        final byte[] valueBytes = serialize(value, valueSerializer);
    +
    +        final PutColumn putColumn = new PutColumn(hBaseColumnFamilyBytes, hBaseColumnQualifierBytes, valueBytes);
    +        putColumns.add(putColumn);
    +
    +        hBaseClientService.put(hBaseCacheTableName, rowIdBytes, putColumns);
    +    }
    +
    +    @Override
    +    public <K> boolean containsKey(final K key, final Serializer<K> keySerializer) throws IOException {
    +      final byte[] rowIdBytes = serialize(key, keySerializer);
    +      final HBaseRowHandler handler = new HBaseRowHandler();
    +
    +      final List<Column> columnsList = new ArrayList<Column>(0);
    +
    +      hBaseClientService.scan(hBaseCacheTableName, rowIdBytes, rowIdBytes, columnsList, handler);
    +      return (handler.numRows() > 0);
    +    }
    +
    +    @Override
    +    public <K, V> V getAndPutIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer, final Deserializer<V> valueDeserializer) throws IOException {
    --- End diff --
    
    @bbende I'm having some trouble with getAndPutIfAbsent.
    
    In my local copy of the code I've managed to add a putIfAbsent function which uses HBase's checkAndPut to ensure it is atomic; however, there doesn't seem to be a way to do getAndPutIfAbsent atomically.
    
    The best I have come up with (pseudocode) is:
    
        g = get
        wasAbsent = putIfAbsent
        if (!wasAbsent) return g
        else return null
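    
    Or, roughly, as a Java sketch in terms of this service's own get/putIfAbsent (not what's committed yet):
    
        @Override
        public <K, V> V getAndPutIfAbsent(final K key, final V value, final Serializer<K> keySerializer,
                final Serializer<V> valueSerializer, final Deserializer<V> valueDeserializer) throws IOException {
            // May be null if the key is absent at this point.
            final V got = get(key, keySerializer, valueDeserializer);
            // Atomic on the HBase side via checkAndPut.
            final boolean wasAbsent = putIfAbsent(key, value, keySerializer, valueSerializer);
            // If the put did not happen, a value was already there, so return what we read.
            return wasAbsent ? null : got;
        }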
    
    This handles concurrent deletes but not updates / replacements. 
    
    It looks like this might not be an issue - in DetectDuplicate, updates are not performed and we are only interested in getting the original cache value, so getAndPutIfAbsent seems to be a convenience method rather than one that needs to be strictly atomic. It's not clear, though.
    
    Do you have any suggestions?



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r109727737
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +
    +    private String hBaseCacheTableName;
    --- End diff --
    
    Since all these member variables are set in @OnEnabled, they should all be marked volatile, because different threads can call @OnEnabled vs. the actual methods of the service; volatile forces the variable to be read fresh and ensures that the other thread sees the correct value.
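    
    i.e., the same fields, just marked volatile:
    
        private volatile String hBaseCacheTableName;
        private volatile HBaseClientService hBaseClientService;
    
        private volatile String hBaseColumnFamily;
        private volatile byte[] hBaseColumnFamilyBytes;
    
        private volatile String hBaseColumnQualifier;
        private volatile byte[] hBaseColumnQualifierBytes;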



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r109728311
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +
    +    public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder()
    --- End diff --
    
    For the table, col fam, and col qual, you may want to support expression language. There are obviously no flow files in this case, but if you have expressionLanguageSupported(true) on the property descriptors and then call .evaluateAttributeExpressions() when you get the values, this would let someone reference an environment variable if they want to specify a different table across environments.
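    
    Roughly, as a sketch against the descriptor above (untested):
    
        public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder()
            .name("HBase Cache Table Name")
            .description("Name of the table on HBase to use for the cache.")
            .required(true)
            .expressionLanguageSupported(true)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();
    
    and then in @OnEnabled:
    
        hBaseCacheTableName = context.getProperty(HBASE_CACHE_TABLE_NAME)
            .evaluateAttributeExpressions().getValue();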



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r110264255
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +    @Override
    +    public <K, V> V getAndPutIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer, final Deserializer<V> valueDeserializer) throws IOException {
    --- End diff --
    
    @baolsen thanks for looking into those atomic methods... I had a similar realization yesterday about getAndPutIfAbsent: this one is really challenging to make 100% atomic unless the backing data store provides the same capability. I think it is fine to take the approach you outlined in the pseudocode and just add a Javadoc comment on that method that explains how it works.



[GitHub] nifi issue #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by baolsen <gi...@git.apache.org>.
Github user baolsen commented on the issue:

    https://github.com/apache/nifi/pull/1645
  
    Hi @bbende, please take a look at this PR.
    
    I've added HBase_1_1_2_ClientMapCacheService as a controller service which uses the HBase_1_1_2_ClientService to store a cache of values on HBase. 
    
    It can be used in the DetectDuplicate processor (and other processors as well) in place of a DistributedMapCache.
    
    The Travis build is passing 4/5; I'm not sure why one of the language builds would fail.
    The AppVeyor build is failing on a specific test, "TestListFile.testAttributesSet", which I don't think is related to my changes.
    
    Let me know what you think.
    Thanks!



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/1645



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r109752585
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +    @Override
    +    public <K, V> V getAndPutIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer, final Deserializer<V> valueDeserializer) throws IOException {
    --- End diff --
    
    For putIfAbsent and getAndPutIfAbsent, they currently require two client calls (a get + conditional write). This could be problematic in that you can't ensure that, after doing the get, something else hasn't modified the value before you do the write; this could happen, for example, in a cluster where two nodes do a putIfAbsent at the same time with the same key.
    
    It looks like HBase has some atomic checkAndMutate operations:
    https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html
    
    I think we should see if we can use these in place of the two calls, although admittedly I have not used them myself.
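    
    For reference, a minimal sketch of that API on HBase 1.1.x (not code in this PR; the table name is made up):
    
        import java.io.IOException;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.Table;
    
        public class CheckAndPutSketch {
            /** Returns true only if the cell was absent and the Put was applied atomically. */
            public static boolean putIfAbsent(final Connection connection, final byte[] row,
                    final byte[] family, final byte[] qualifier, final byte[] value) throws IOException {
                try (final Table table = connection.getTable(TableName.valueOf("nifi-cache"))) {
                    final Put put = new Put(row);
                    put.addColumn(family, qualifier, value);
                    // A null expected value means "only apply the Put if this column does not exist yet".
                    return table.checkAndPut(row, family, qualifier, null, put);
                }
            }
        }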



[GitHub] nifi pull request #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by baolsen <gi...@git.apache.org>.
Github user baolsen commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r110191840
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +    @Override
    +    public <K, V> V getAndPutIfAbsent(final K key, final V value, final Serializer<K> keySerializer, final Serializer<V> valueSerializer, final Deserializer<V> valueDeserializer) throws IOException {
    --- End diff --
    
    Thanks for all the feedback.
    I'm going to work through your comments and update the code accordingly. 
    I'm not sure how to do the tests, but I'll take a look.
    
    I was researching some way to do atomic operations on HBase but had no luck finding one. checkAndMutate (and checkAndPut) will work really well for this purpose - thanks for that. I'll update the HBase ClientService with them, since I'm sure they'll be useful to other processors. E.g. I need another processor that generates a unique incrementing ID, which I can accomplish with an HBase get/scan followed by a checkAndPut.
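    
    For the ID case, what I have in mind is roughly this (just a sketch, table name made up, not part of this PR):
    
        import java.io.IOException;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.util.Bytes;
    
        public class NextIdSketch {
            /** Reads the current counter, then checkAndPuts the incremented value; retries if another client won the race. */
            public static long nextId(final Connection connection, final byte[] row,
                    final byte[] family, final byte[] qualifier) throws IOException {
                try (final Table table = connection.getTable(TableName.valueOf("nifi-ids"))) {
                    while (true) {
                        final Result result = table.get(new Get(row));
                        final byte[] current = result.getValue(family, qualifier);
                        final long next = (current == null ? 0L : Bytes.toLong(current)) + 1;
                        final Put put = new Put(row);
                        put.addColumn(family, qualifier, Bytes.toBytes(next));
                        // Applied only if the cell still holds exactly what we read (or is still absent).
                        if (table.checkAndPut(row, family, qualifier, current, put)) {
                            return next;
                        }
                    }
                }
            }
        }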



[GitHub] nifi issue #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by baolsen <gi...@git.apache.org>.
Github user baolsen commented on the issue:

    https://github.com/apache/nifi/pull/1645
  
    @bbende 
    Ready for another review! I've updated per your comments and added some unit tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1645: NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1645
  
    This looks like it could be pretty helpful! I wonder if, in light of the recent LookupService work, we should consider exposing/using this via that interface instead of, or in addition to, the distributed cache one. Thoughts?

