Posted to dev@nifi.apache.org by Shawn Weeks <sw...@weeksconsulting.us> on 2019/04/25 01:27:25 UTC

Adding HBase Support for AtomicDistributedMapCacheClient

Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.

Thanks
Shawn

RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
Another item that seems to fail is the call to DiskUtils.deleteRecursively(rootFile); in TestFileSystemRepository. Is there a reason why rootFile.delete() isn't sufficient? It deletes directories as well.
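For context on the question above: java.io.File.delete() removes a directory only when it is empty, which is the usual reason test cleanup code walks the tree instead. The following is a minimal sketch of that idea using only the JDK. It is not the DiskUtils implementation, and the class and method names (RecursiveDeleteSketch, deleteTree, demo) are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RecursiveDeleteSketch {

    // Deletes a file or a whole directory tree, children before parents.
    static void deleteTree(Path root) throws IOException {
        if (Files.notExists(root)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order lists the deepest entries first, so every
            // directory is already empty when its turn comes.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    // Returns true when File.delete() refuses a non-empty directory
    // and deleteTree then removes it.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("repo-test");
            Files.createFile(root.resolve("claim.bin"));
            File rootFile = root.toFile();
            boolean plainDeleteWorked = rootFile.delete(); // directory is not empty
            deleteTree(root);
            return !plainDeleteWorked && !rootFile.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("recursive delete was needed: " + demo());
    }
}
```

If the test directory is always empty at teardown, rootFile.delete() would be enough; the sketch only covers the case where files are left behind.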

Thanks
Shawn Weeks

-----Original Message-----
From: Shawn Weeks <sw...@weeksconsulting.us> 
Sent: Saturday, May 4, 2019 10:02 AM
To: dev@nifi.apache.org
Subject: RE: Adding HBase Support for AtomicDistributedMapCacheClient

I discovered what appears to be a bug while compiling on Windows.

Line 46 of NiFiGroovyTest.groovy should be

private static final String TEST_RES_PATH = Paths.get(NiFiGroovyTest.getClassLoader().getResource(".").toURI()).toString()

Instead of

private static final String TEST_RES_PATH = NiFiGroovyTest.getClassLoader().getResource(".").toURI().getPath()

See posts like https://stackoverflow.com/questions/43972777/exception-in-thread-main-java-nio-file-invalidpathexception-illegal-char for an explanation.

Thanks
Shawn Weeks

-----Original Message-----
From: Shawn Weeks <sw...@weeksconsulting.us>
Sent: Saturday, May 4, 2019 9:07 AM
To: dev@nifi.apache.org
Subject: RE: Adding HBase Support for AtomicDistributedMapCacheClient

I've created Pull Request https://github.com/apache/nifi/pull/3462 for this change. I'm still doing some testing and it might not actually work right, but I wanted some other folks to be able to see it. If anyone knows how to include a timestamp in a checkAndPut for HBase 1.x, let me know and I'll implement it.

Thanks
Shawn Weeks

-----Original Message-----
From: Bryan Bende <bb...@gmail.com>
Sent: Thursday, April 25, 2019 7:05 PM
To: dev@nifi.apache.org
Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient

It should be available through the existing scan methods. They take a ResultHandler, which gets passed an array of ResultCells, and each one has the timestamp.

> On Apr 25, 2019, at 7:52 PM, Shawn Weeks <sw...@weeksconsulting.us> wrote:
> 
> I haven't looked at the other side of the equation yet, which is how to get the timestamp on fetch. That will probably require a change or a new scan method.
> 
> Thanks
> Shawn
> 
> -----Original Message-----
> From: Bryan Bende <bb...@gmail.com>
> Sent: Thursday, April 25, 2019 4:29 PM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> 
> Also just realized that we do have two versions of the HBase DMC client service, so they could each do different things.
> 
> The HBase_1_1_2_ClientMapCacheService could call the original checkAndPut, and the HBase_2_x_ClientMapCacheService could call the new method.
> 
> In this approach the 1_1_2 client service could throw unsupported for the new method since it would never be used.
> 
> On Thu, Apr 25, 2019 at 5:25 PM Bryan Bende <bb...@gmail.com> wrote:
>> 
>> Thanks, I'm following now...
>> 
>> I think adding the new method to the interface and throwing 
>> UnsupportedOperationException for 1_1_2, or using the original 
>> checkAndPut and implementing it in both services, would both be fine 
>> solutions.
>> 
>> I guess another variation might be to introduce the new method in the 
>> interface, but in the 1_1_2 implementation just delegate back to the 
>> original checkAndPut and ignore the timestamp, and document that it 
>> isn't used in that implementation. I don't love this, but it does 
>> allow both services to implement the functionality and still leverage 
>> the better solution for 2_x.
>> 
>> 
>> On Thu, Apr 25, 2019 at 3:54 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>>> 
>>> Here is what I think the new checkAndPut or checkAndMutate method would look like. This also shows what the new mutate API looks like.
>>> 
>>>    @Override
>>>    public boolean checkAndPut(String tableName, byte[] rowId, byte[] family, byte[] qualifier, byte[] value, long timestamp, PutColumn column) throws IOException {
>>>        try (final Table table = connection.getTable(TableName.valueOf(tableName))) {
>>>            Put put = new Put(rowId);
>>>            put.addColumn(
>>>                    column.getColumnFamily(),
>>>                    column.getColumnQualifier(),
>>>                    column.getBuffer());
>>>            return table.checkAndMutate(rowId, family).qualifier(qualifier).ifEquals(value).timeRange(TimeRange.at(timestamp)).thenPut(put);
>>>        }
>>>    }
>>> 
>>> If the atomic guarantee for the original checkAndPut is good enough then there is no reason I can't implement the atomic map cache for both versions of HBase.
>>> 
>>> Thanks
>>> Shawn
>>> 
>>> -----Original Message-----
>>> From: Bryan Bende <bb...@gmail.com>
>>> Sent: Thursday, April 25, 2019 12:39 PM
>>> To: dev@nifi.apache.org
>>> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>>> 
>>> I'm not totally sure it would matter if there were changes in between. As long as the current value is what we thought it was, the changes we are sending back should be accurate as a replacement. As a simplified scenario: if the current value is 1 and thread-A retrieves that value, thread-B then changes it to 2 and back to 1 before thread-A can do anything, and thread-A then sends in 2 with a previous value of 1, that is still the correct replacement.
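The scenario in this paragraph can be sketched in plain Java to show where a value-only check and a value-plus-revision check part ways. This is an in-memory illustration only: VersionedCell and its methods are hypothetical, and the version counter merely stands in for the HBase cell timestamp discussed in the thread.

```java
final class RevisionCheckSketch {

    // One cached cell: a value plus a version that grows on every write.
    static final class VersionedCell {
        private String value;
        private long version;

        VersionedCell(String value) {
            this.value = value;
        }

        synchronized String value() {
            return value;
        }

        synchronized long version() {
            return version;
        }

        synchronized void put(String newValue) {
            value = newValue;
            version++;
        }

        // Replace only if the stored value equals the expected one.
        synchronized boolean replaceIfValue(String expected, String newValue) {
            if (!value.equals(expected)) {
                return false;
            }
            put(newValue);
            return true;
        }

        // Replace only if both the value and the version match what was read.
        synchronized boolean replaceIfValueAndVersion(String expected, long expectedVersion, String newValue) {
            if (!value.equals(expected) || version != expectedVersion) {
                return false;
            }
            put(newValue);
            return true;
        }
    }

    // thread-A reads 1, thread-B writes 2 and then 1, thread-A replaces 1 with 2.
    static boolean valueOnly() {
        VersionedCell cell = new VersionedCell("1");
        String seen = cell.value();
        cell.put("2");
        cell.put("1");
        return cell.replaceIfValue(seen, "2");
    }

    // Same interleaving, but thread-A also presents the version it read.
    static boolean valueAndVersion() {
        VersionedCell cell = new VersionedCell("1");
        String seen = cell.value();
        long seenVersion = cell.version();
        cell.put("2");
        cell.put("1");
        return cell.replaceIfValueAndVersion(seen, seenVersion, "2");
    }

    public static void main(String[] args) {
        System.out.println("value-only replace accepted: " + valueOnly());
        System.out.println("value+version replace accepted: " + valueAndVersion());
    }
}
```

The value-only check accepts the replacement and the versioned check rejects it. Which behavior is wanted depends on whether the intermediate writes matter to the caller, which is the point under discussion here.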
>>> 
>>> I can see the argument for using the timestamp though... can you show the method signature of the new checkAndMutate method that would need to be added to the client service, and also which method of the HBase client it needs to call?
>>> 
>>> Just so I can get an idea of the differences between 1.x and 2.x.
>>> 
>>> On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>>>> 
>>>> While checkAndPut is atomic as it's built now, it doesn't support also checking the timestamp range, which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision along with the value, to ensure not only that the value hadn't been changed but also that there hadn't been changes in between that just happened to put the value back.
>>>> 
>>>> As I was looking at everything I had another question: why is the cache currently using a scan instead of a get to fetch values from HBase? A scan seems like it would be much less performant, considering we know the row key we're looking for.
>>>> 
>>>> 
>>>> Thanks
>>>> Shawn
>>>> 
>>>> -----Original Message-----
>>>> From: Bryan Bende <bb...@gmail.com>
>>>> Sent: Thursday, April 25, 2019 11:56 AM
>>>> To: dev@nifi.apache.org
>>>> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>>>> 
>>>> Can it not be done with the existing checkAndPut method? [1]
>>>> 
>>>> I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].
>>>> 
>>>> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
>>>> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
>>>> 
>>>> On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>>>>> 
>>>>> I'll need to add a check-and-mutate method to the HBaseClientService interface. Should I just extend it with an HBase2ClientService, or add checkAndMutate to the existing interface and make it raise an exception if you try to use it against HBase 1? While HBase 1.x supports checkAndMutate, it doesn't provide a way to filter on timestamp, which is part of how I was going to implement the revision requirement for AtomicMapCache.
>>>>> 
>>>>> Thanks
>>>>> Shawn
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Bryan Bende <bb...@gmail.com>
>>>>> Sent: Thursday, April 25, 2019 9:11 AM
>>>>> To: dev@nifi.apache.org
>>>>> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>>>>> 
>>>>> I'm not aware of a JIRA, so I'd say go for it.
>>>>> 
>>>>> On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>>>>>> 
>>>>>> Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
>>>>>> 
>>>>>> Thanks
>>>>>> Shawn


RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
I discovered what appears to be a bug while compiling on Windows.

Line 46 of NiFiGroovyTest.groovy should be

private static final String TEST_RES_PATH = Paths.get(NiFiGroovyTest.getClassLoader().getResource(".").toURI()).toString()

Instead of

private static final String TEST_RES_PATH = NiFiGroovyTest.getClassLoader().getResource(".").toURI().getPath()

See posts like https://stackoverflow.com/questions/43972777/exception-in-thread-main-java-nio-file-invalidpathexception-illegal-char for an explanation.
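As a rough illustration of the difference between the two lines: URI.getPath() returns the URI's path component, which on Windows typically looks like /C:/... and is rejected when later handed to Paths.get(String), while Paths.get(URI) converts the file: URI into a native path. The sketch below uses only JDK calls; the class name ResourcePathSketch and the helper toNativePath are invented for the example.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ResourcePathSketch {

    // Converts a file: URI to a path string in the platform's native form.
    static String toNativePath(URI uri) {
        return Paths.get(uri).toString();
    }

    public static void main(String[] args) {
        // Use the working directory as a stand-in for the test resource root.
        Path dir = Paths.get("").toAbsolutePath();
        URI uri = dir.toUri();
        System.out.println("URI.getPath():  " + uri.getPath());
        System.out.println("Paths.get(uri): " + toNativePath(uri));
    }
}
```

On Linux and macOS the two printed values differ at most by a trailing slash, which is probably why the original line went unnoticed there.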

Thanks
Shawn Weeks


RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
I've created Pull Request https://github.com/apache/nifi/pull/3462 for this change. I'm still doing some testing and it might not actually work right, but I wanted some other folks to be able to see it. If anyone knows how to include a timestamp in a checkAndPut for HBase 1.x, let me know and I'll implement it.

Thanks
Shawn Weeks


Re: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Bryan Bende <bb...@gmail.com>.
It should be available through the existing scan methods. They take a ResultHandler, which gets passed an array of ResultCells, and each one has the timestamp.
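The handler pattern described here might look roughly like the sketch below. All types in it (CellView, CellHandler, RevisionCapture, scanOneRow) are simplified stand-ins written for this example; they are not the NiFi ResultCell and ResultHandler classes, and the fake scan only hands one hard-coded cell to the handler.

```java
import java.nio.charset.StandardCharsets;

final class ScanTimestampSketch {

    // Simplified stand-in for a scanned cell; not the NiFi ResultCell class.
    static final class CellView {
        final byte[] value;
        final long timestamp;

        CellView(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Simplified stand-in for the handler callback passed to a scan.
    interface CellHandler {
        void handle(byte[] row, CellView[] cells);
    }

    // Remembers the value and timestamp of the first cell it is given.
    static final class RevisionCapture implements CellHandler {
        byte[] value;
        long timestamp = -1L;

        @Override
        public void handle(byte[] row, CellView[] cells) {
            if (cells.length > 0) {
                value = cells[0].value;
                timestamp = cells[0].timestamp;
            }
        }
    }

    // Hypothetical scan: hands a single row with a single cell to the handler.
    static void scanOneRow(String rowKey, String value, long timestamp, CellHandler handler) {
        CellView cell = new CellView(value.getBytes(StandardCharsets.UTF_8), timestamp);
        handler.handle(rowKey.getBytes(StandardCharsets.UTF_8), new CellView[] {cell});
    }

    // Runs the fake scan and returns the timestamp the handler captured.
    static long capturedTimestamp(long timestamp) {
        RevisionCapture capture = new RevisionCapture();
        scanOneRow("cache-key", "cache-value", timestamp, capture);
        return capture.timestamp;
    }

    public static void main(String[] args) {
        System.out.println("revision timestamp: " + capturedTimestamp(1556236800000L));
    }
}
```

The captured timestamp is what a fetch would hand back as the revision for a later check-and-put.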


RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
I haven't looked at the other side of the equation yet, which is how to get the timestamp on fetch. That will probably require a change to the existing scan method or a new one.

Thanks
Shawn

-----Original Message-----
From: Bryan Bende <bb...@gmail.com> 
Sent: Thursday, April 25, 2019 4:29 PM
To: dev@nifi.apache.org
Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient

Also just realized that we do have two versions of the HBase DMC client service, so they could each do different things.

The HBase_1_1_2_ClientMapCacheService could call the original checkAndPut, and the HBase_2_x_ClientMapCacheService could call the new method.

In this approach the 1_1_2 client service could throw unsupported for the new method since it would never be used.

On Thu, Apr 25, 2019 at 5:25 PM Bryan Bende <bb...@gmail.com> wrote:
>
> Thanks, I'm following now...
>
> I think adding the new method to the interface and throwing 
> UnsupportedOperationException for 1_1_2, or using the original 
> checkAndPut and implementing it in both services, would both be fine 
> solutions.
>
> I guess another variation might be to introduce the new method in the 
> interface, but in the 1_1_2 implementation just delegate back to the 
> original checkAndPut and ignore the timestamp, and document that it 
> isn't used in that implementation. I don't love this, but it does 
> allow both services to implement the functionality and still leverage 
> the better solution for 2_x.
>
>
> On Thu, Apr 25, 2019 at 3:54 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > Here is what I think the new checkAndPut or checkAndMutate method would look like. This also shows what the new mutate api looks like.
> >
> >     @Override
> >     public boolean checkAndPut(String tableName, byte[] rowId, byte[] family, byte[] qualifier, byte[] value, long timestamp, PutColumn column) throws IOException {
> >         try (final Table table = connection.getTable(TableName.valueOf(tableName))) {
> >             Put put = new Put(rowId);
> >             put.addColumn(
> >                     column.getColumnFamily(),
> >                     column.getColumnQualifier(),
> >                     column.getBuffer());
> >             return table.checkAndMutate(rowId, family).qualifier(qualifier).ifEquals(value).timeRange(TimeRange.at(timestamp)).thenPut(put);
> >         }
> >     }
> >
> > If the atomic guarantee for the original checkAndPut is good enough then there is no reason I can't implement the atomic map cache for both versions of HBase.
> >
> > Thanks
> > Shawn
> >
> > -----Original Message-----
> > From: Bryan Bende <bb...@gmail.com>
> > Sent: Thursday, April 25, 2019 12:39 PM
> > To: dev@nifi.apache.org
> > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> >
> > I'm not totally sure it would matter if there were changes in between; as long as the current value is what we thought it was, the changes we are sending back should be accurate as a replacement. As a simplified scenario, if the current value is 1 and thread-A retrieves that value, and thread-B then changes it to 2 and back to 1 before thread-A can do anything, then when thread-A sends in 2 with a previous value of 1, that is still the correct replacement.
> >
> > I can see the argument for using the timestamp though... can you show the method signature of the new checkAndMutate method that would need to be added to the client service, and also which method of the HBase client it needs to call?
> >
> > Just so I can get an idea of the differences between 1.x and 2.x.
> >
> > On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > >
> > > While checkAndPut is atomic as it's built now it doesn't support also checking the timestamp range which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision along with the value to ensure not only that the value hadn't been changed but that there hadn't been changes in between that just happened to put the value back.
> > >
> > > As I was looking at everything I had another question. Why is the cache currently using a scan instead of a get to fetch values from HBase. It seems like that would be much less performant considering we know the row key we're looking for.
> > >
> > >
> > > Thanks
> > > Shawn
> > >
> > > -----Original Message-----
> > > From: Bryan Bende <bb...@gmail.com>
> > > Sent: Thursday, April 25, 2019 11:56 AM
> > > To: dev@nifi.apache.org
> > > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> > >
> > > Can it not be done with the existing checkAndPut method? [1]
> > >
> > > I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].
> > >
> > > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
> > > [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
> > >
> > > On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > > >
> > > > I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
> > > >
> > > > Thanks
> > > > Shawn
> > > >
> > > > -----Original Message-----
> > > > From: Bryan Bende <bb...@gmail.com>
> > > > Sent: Thursday, April 25, 2019 9:11 AM
> > > > To: dev@nifi.apache.org
> > > > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> > > >
> > > > I'm not aware of a JIRA, so I'd say go for it.
> > > >
> > > > On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > > > >
> > > > > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> > > > >
> > > > > Thanks
> > > > > Shawn

Re: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Bryan Bende <bb...@gmail.com>.
Also just realized that we do have two versions of the HBase DMC
client service, so they could each do different things.

The HBase_1_1_2_ClientMapCacheService could call the original
checkAndPut, and the HBase_2_x_ClientMapCacheService could call the
new method.

In this approach the 1_1_2 client service could throw unsupported for
the new method since it would never be used.
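
A minimal sketch of that split, assuming a simplified, hypothetical interface (TimestampAwareClient) and an in-memory stand-in for the 1_1_2 service (Legacy1xClient). Neither name nor signature comes from NiFi; they only illustrate a service that keeps the value-only check and rejects the timestamp-aware one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified interface; the real HBaseClientService differs.
interface TimestampAwareClient {
    // Original style: compare on the stored value only.
    boolean checkAndPut(String row, String expected, String newValue);

    // New style: compare on the stored value and the cell timestamp.
    boolean checkAndPut(String row, String expected, long expectedTimestamp, String newValue);
}

// Stand-in for the 1_1_2 service: supports the value-only check and rejects the new method.
final class Legacy1xClient implements TimestampAwareClient {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public synchronized boolean checkAndPut(String row, String expected, String newValue) {
        // A null expected value matches a row that does not exist yet.
        if (!Objects.equals(rows.get(row), expected)) {
            return false;
        }
        rows.put(row, newValue);
        return true;
    }

    @Override
    public boolean checkAndPut(String row, String expected, long expectedTimestamp, String newValue) {
        throw new UnsupportedOperationException("timestamp check is not available in this 1.x sketch");
    }

    public static void main(String[] args) {
        Legacy1xClient client = new Legacy1xClient();
        System.out.println(client.checkAndPut("k", null, "1")); // value-only path works
        try {
            client.checkAndPut("k", "1", 5L, "2");
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```

Since the 1_1_2 map cache service would only ever call the value-only overload, the exception should never surface in practice.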

On Thu, Apr 25, 2019 at 5:25 PM Bryan Bende <bb...@gmail.com> wrote:
>
> Thanks, I'm following now...
>
> I think adding the new method to the interface and throwing
> UnsupportedOperationException for 1_1_2, or using the original
> checkAndPut and implementing it in both services, would both be fine
> solutions.
>
> I guess another variation might be to introduce the new method in the
> interface, but in the 1_1_2 implementation just delegate back to the
> original checkAndPut and ignore the timestamp, and document that it
> isn't used in that implementation. I don't love this, but it does
> allow both services to implement the functionality and still leverage
> the better solution for 2_x.
>
>
> On Thu, Apr 25, 2019 at 3:54 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > Here is what I think the new checkAndPut or checkAndMutate method would look like. This also shows what the new mutate api looks like.
> >
> >     @Override
> >     public boolean checkAndPut(String tableName, byte[] rowId, byte[] family, byte[] qualifier, byte[] value, long timestamp, PutColumn column) throws IOException {
> >         try (final Table table = connection.getTable(TableName.valueOf(tableName))) {
> >             Put put = new Put(rowId);
> >             put.addColumn(
> >                     column.getColumnFamily(),
> >                     column.getColumnQualifier(),
> >                     column.getBuffer());
> >             return table.checkAndMutate(rowId, family).qualifier(qualifier).ifEquals(value).timeRange(TimeRange.at(timestamp)).thenPut(put);
> >         }
> >     }
> >
> > If the atomic guarantee for the original checkAndPut is good enough then there is no reason I can't implement the atomic map cache for both versions of HBase.
> >
> > Thanks
> > Shawn
> >
> > -----Original Message-----
> > From: Bryan Bende <bb...@gmail.com>
> > Sent: Thursday, April 25, 2019 12:39 PM
> > To: dev@nifi.apache.org
> > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> >
> > I'm not totally sure it would matter if there were changes in between; as long as the current value is what we thought it was, the changes we are sending back should be accurate as a replacement. As a simplified scenario, if the current value is 1 and thread-A retrieves that value, and thread-B then changes it to 2 and back to 1 before thread-A can do anything, then when thread-A sends in 2 with a previous value of 1, that is still the correct replacement.
> >
> > I can see the argument for using the timestamp though... can you show the method signature of the new checkAndMutate method that would need to be added to the client service, and also which method of the HBase client it needs to call?
> >
> > Just so I can get an idea of the differences between 1.x and 2.x.
> >
> > On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > >
> > > While checkAndPut is atomic as it's built now it doesn't support also checking the timestamp range which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision along with the value to ensure not only that the value hadn't been changed but that there hadn't been changes in between that just happened to put the value back.
> > >
> > > As I was looking at everything I had another question. Why is the cache currently using a scan instead of a get to fetch values from HBase. It seems like that would be much less performant considering we know the row key we're looking for.
> > >
> > >
> > > Thanks
> > > Shawn
> > >
> > > -----Original Message-----
> > > From: Bryan Bende <bb...@gmail.com>
> > > Sent: Thursday, April 25, 2019 11:56 AM
> > > To: dev@nifi.apache.org
> > > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> > >
> > > Can it not be done with the existing checkAndPut method? [1]
> > >
> > > I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].
> > >
> > > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
> > > [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
> > >
> > > On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > > >
> > > > I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
> > > >
> > > > Thanks
> > > > Shawn
> > > >
> > > > -----Original Message-----
> > > > From: Bryan Bende <bb...@gmail.com>
> > > > Sent: Thursday, April 25, 2019 9:11 AM
> > > > To: dev@nifi.apache.org
> > > > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> > > >
> > > > I'm not aware of a JIRA, so I'd say go for it.
> > > >
> > > > On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > > > >
> > > > > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> > > > >
> > > > > Thanks
> > > > > Shawn

Re: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Bryan Bende <bb...@gmail.com>.
Thanks, I'm following now...

I think adding the new method to the interface and throwing
UnsupportedOperationException for 1_1_2, or using the original
checkAndPut and implementing it in both services, would both be fine
solutions.

I guess another variation might be to introduce the new method in the
interface, but in the 1_1_2 implementation just delegate back to the
original checkAndPut and ignore the timestamp, and document that it
isn't used in that implementation. I don't love this, but it does
allow both services to implement the functionality and still leverage
the better solution for 2_x.
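
A rough sketch of that last variation, assuming a hypothetical simplified interface (SimpleCheckAndPut) rather than the real HBaseClientService: the timestamp-aware method defaults to delegating to the value-only check and ignoring the timestamp, and a 2_x-style implementation would override it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified interface; the real HBaseClientService has different signatures.
interface SimpleCheckAndPut {
    // Original style: compare on value only.
    boolean checkAndPut(String row, byte[] expected, byte[] newValue);

    // New style: by default, delegate to the value-only check and ignore the timestamp.
    // An implementation that can compare timestamps would override this.
    default boolean checkAndPut(String row, byte[] expected, long expectedTimestamp, byte[] newValue) {
        return checkAndPut(row, expected, newValue);
    }
}

// Stand-in for a 1.x-style service that only supports the value-only check.
final class ValueOnlyService implements SimpleCheckAndPut {
    private final Map<String, byte[]> rows = new HashMap<>();

    @Override
    public synchronized boolean checkAndPut(String row, byte[] expected, byte[] newValue) {
        byte[] current = rows.get(row);
        // Arrays.equals treats two nulls as equal, so a null expectation matches an absent row.
        if (!Arrays.equals(current, expected)) {
            return false;
        }
        rows.put(row, newValue.clone());
        return true;
    }

    public static void main(String[] args) {
        ValueOnlyService service = new ValueOnlyService();
        byte[] one = {1};
        // The row is absent, so an expected value of null matches; the timestamp is ignored.
        System.out.println(service.checkAndPut("k", null, 42L, one));
    }
}
```

The downside is visible in the sketch: a caller passing a timestamp gets no signal that it was ignored, so the behavior would need to be documented on the 1_1_2 implementation.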


On Thu, Apr 25, 2019 at 3:54 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> Here is what I think the new checkAndPut or checkAndMutate method would look like. This also shows what the new mutate api looks like.
>
>     @Override
>     public boolean checkAndPut(String tableName, byte[] rowId, byte[] family, byte[] qualifier, byte[] value, long timestamp, PutColumn column) throws IOException {
>         try (final Table table = connection.getTable(TableName.valueOf(tableName))) {
>             Put put = new Put(rowId);
>             put.addColumn(
>                     column.getColumnFamily(),
>                     column.getColumnQualifier(),
>                     column.getBuffer());
>             return table.checkAndMutate(rowId, family).qualifier(qualifier).ifEquals(value).timeRange(TimeRange.at(timestamp)).thenPut(put);
>         }
>     }
>
> If the atomic guarantee for the original checkAndPut is good enough then there is no reason I can't implement the atomic map cache for both versions of HBase.
>
> Thanks
> Shawn
>
> -----Original Message-----
> From: Bryan Bende <bb...@gmail.com>
> Sent: Thursday, April 25, 2019 12:39 PM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>
> I'm not totally sure it would matter if there were changes in between; as long as the current value is what we thought it was, the changes we are sending back should be accurate as a replacement. As a simplified scenario, if the current value is 1 and thread-A retrieves that value, and thread-B then changes it to 2 and back to 1 before thread-A can do anything, then when thread-A sends in 2 with a previous value of 1, that is still the correct replacement.
>
> I can see the argument for using the timestamp though... can you show the method signature of the new checkAndMutate method that would need to be added to the client service, and also which method of the HBase client it needs to call?
>
> Just so I can get an idea of the differences between 1.x and 2.x.
>
> On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > While checkAndPut is atomic as it's built now it doesn't support also checking the timestamp range which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision along with the value to ensure not only that the value hadn't been changed but that there hadn't been changes in between that just happened to put the value back.
> >
> > As I was looking at everything I had another question. Why is the cache currently using a scan instead of a get to fetch values from HBase. It seems like that would be much less performant considering we know the row key we're looking for.
> >
> >
> > Thanks
> > Shawn
> >
> > -----Original Message-----
> > From: Bryan Bende <bb...@gmail.com>
> > Sent: Thursday, April 25, 2019 11:56 AM
> > To: dev@nifi.apache.org
> > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> >
> > Can it not be done with the existing checkAndPut method? [1]
> >
> > I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].
> >
> > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
> > [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
> >
> > On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > >
> > > I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
> > >
> > > Thanks
> > > Shawn
> > >
> > > -----Original Message-----
> > > From: Bryan Bende <bb...@gmail.com>
> > > Sent: Thursday, April 25, 2019 9:11 AM
> > > To: dev@nifi.apache.org
> > > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> > >
> > > I'm not aware of a JIRA, so I'd say go for it.
> > >
> > > On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > > >
> > > > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> > > >
> > > > Thanks
> > > > Shawn

RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
Here is what I think the new checkAndPut or checkAndMutate method would look like. This also shows what the new mutate API looks like.

    @Override
    public boolean checkAndPut(String tableName, byte[] rowId, byte[] family, byte[] qualifier, byte[] value, long timestamp, PutColumn column) throws IOException {
        try (final Table table = connection.getTable(TableName.valueOf(tableName))) {
            // Build the Put that is applied only if the check below passes.
            Put put = new Put(rowId);
            put.addColumn(
                    column.getColumnFamily(),
                    column.getColumnQualifier(),
                    column.getBuffer());
            // HBase 2.x builder API: the checked cell must equal the expected value at the given timestamp.
            return table.checkAndMutate(rowId, family).qualifier(qualifier).ifEquals(value).timeRange(TimeRange.at(timestamp)).thenPut(put);
        }
    }

If the atomic guarantee for the original checkAndPut is good enough then there is no reason I can't implement the atomic map cache for both versions of HBase.

Thanks
Shawn

-----Original Message-----
From: Bryan Bende <bb...@gmail.com> 
Sent: Thursday, April 25, 2019 12:39 PM
To: dev@nifi.apache.org
Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient

I'm not totally sure it would matter if there were changes in between; as long as the current value is what we thought it was, the changes we are sending back should be accurate as a replacement. As a simplified scenario, if the current value is 1 and thread-A retrieves that value, and thread-B then changes it to 2 and back to 1 before thread-A can do anything, then when thread-A sends in 2 with a previous value of 1, that is still the correct replacement.

I can see the argument for using the timestamp though... can you show the method signature of the new checkAndMutate method that would need to be added to the client service, and also which method of the HBase client it needs to call?

Just so I can get an idea of the differences between 1.x and 2.x.

On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> While checkAndPut is atomic as it's built now it doesn't support also checking the timestamp range which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision along with the value to ensure not only that the value hadn't been changed but that there hadn't been changes in between that just happened to put the value back.
>
> As I was looking at everything I had another question. Why is the cache currently using a scan instead of a get to fetch values from HBase. It seems like that would be much less performant considering we know the row key we're looking for.
>
>
> Thanks
> Shawn
>
> -----Original Message-----
> From: Bryan Bende <bb...@gmail.com>
> Sent: Thursday, April 25, 2019 11:56 AM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>
> Can it not be done with the existing checkAndPut method? [1]
>
> I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].
>
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
>
> On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
> >
> > Thanks
> > Shawn
> >
> > -----Original Message-----
> > From: Bryan Bende <bb...@gmail.com>
> > Sent: Thursday, April 25, 2019 9:11 AM
> > To: dev@nifi.apache.org
> > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> >
> > I'm not aware of a JIRA, so I'd say go for it.
> >
> > On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > >
> > > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> > >
> > > Thanks
> > > Shawn

Re: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Bryan Bende <bb...@gmail.com>.
I'm not totally sure it would matter if there were changes in between;
as long as the current value is what we thought it was, the changes we
are sending back should be accurate as a replacement. As a simplified
scenario, if the current value is 1 and thread-A retrieves that value,
and thread-B then changes it to 2 and back to 1 before thread-A can do
anything, then when thread-A sends in 2 with a previous value of 1,
that is still the correct replacement.
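
The scenario can be walked through with a small sketch. It assumes a single integer cell modelled by java.util.concurrent.atomic.AtomicInteger and replays the two "threads" sequentially; the class name ValueOnlyCheckDemo is made up for illustration and nothing here touches HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sequential walk-through of the scenario; "thread A" and "thread B" are simulated in order.
final class ValueOnlyCheckDemo {
    // Returns whether A's compare-and-set succeeded after B changed the value and restored it.
    static boolean run(AtomicInteger cell) {
        int seenByA = cell.get();               // thread A reads the current value
        cell.set(2);                            // thread B changes the value to 2 ...
        cell.set(seenByA);                      // ... and back to what A saw
        return cell.compareAndSet(seenByA, 2);  // A writes 2, expecting the value it read
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(1);
        System.out.println("A's update applied: " + run(cell) + ", value now " + cell.get());
    }
}
```

A value-only check cannot tell that B was ever there, which is harmless when only the final value matters.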

I can see the argument for using the timestamp though... can you show
the method signature of the new checkAndMutate method that would need
to be added to the client service, and also which method of the HBase
client it needs to call?

Just so I can get an idea of the differences between 1.x and 2.x.

On Thu, Apr 25, 2019 at 1:00 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> While checkAndPut is atomic as it's built now it doesn't support also checking the timestamp range which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision along with the value to ensure not only that the value hadn't been changed but that there hadn't been changes in between that just happened to put the value back.
>
> As I was looking at everything I had another question. Why is the cache currently using a scan instead of a get to fetch values from HBase. It seems like that would be much less performant considering we know the row key we're looking for.
>
>
> Thanks
> Shawn
>
> -----Original Message-----
> From: Bryan Bende <bb...@gmail.com>
> Sent: Thursday, April 25, 2019 11:56 AM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>
> Can it not be done with the existing checkAndPut method? [1]
>
> I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].
>
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271
>
> On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
> >
> > Thanks
> > Shawn
> >
> > -----Original Message-----
> > From: Bryan Bende <bb...@gmail.com>
> > Sent: Thursday, April 25, 2019 9:11 AM
> > To: dev@nifi.apache.org
> > Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
> >
> > I'm not aware of a JIRA, so I'd say go for it.
> >
> > On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> > >
> > > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> > >
> > > Thanks
> > > Shawn

RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
While checkAndPut is atomic as it's built now, it doesn't support also checking the timestamp range, which is included in the new checkAndMutate API. I had planned on using the cell's timestamp as the revision, along with the value, to ensure not only that the value hadn't been changed but also that there hadn't been changes in between that just happened to put the value back.
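
A minimal sketch of that idea, assuming a hypothetical single-cell model (TimestampedCell) in place of an HBase table. The check passes only when both the stored bytes and the stored timestamp match what the caller saw, so a value that was changed and then put back is still detected.

```java
import java.util.Arrays;

// Hypothetical single-cell model; HBase itself is not involved here.
final class TimestampedCell {
    private byte[] value;
    private long timestamp;

    TimestampedCell(byte[] value, long timestamp) {
        this.value = value.clone();
        this.timestamp = timestamp;
    }

    synchronized long timestamp() {
        return timestamp;
    }

    // Unconditional write, as another client might perform.
    synchronized void put(byte[] newValue, long newTimestamp) {
        value = newValue.clone();
        timestamp = newTimestamp;
    }

    // Writes only if both the stored value and its timestamp match what the caller saw.
    synchronized boolean checkAndPut(byte[] expectedValue, long expectedTimestamp,
                                     byte[] newValue, long newTimestamp) {
        if (timestamp != expectedTimestamp || !Arrays.equals(value, expectedValue)) {
            return false;
        }
        value = newValue.clone();
        timestamp = newTimestamp;
        return true;
    }

    public static void main(String[] args) {
        byte[] one = {1};
        byte[] two = {2};
        TimestampedCell cell = new TimestampedCell(one, 100L);
        long seen = cell.timestamp();   // client A remembers timestamp 100
        cell.put(two, 101L);            // client B changes the value ...
        cell.put(one, 102L);            // ... and puts the old value back
        // The value matches again but the timestamp does not, so A's write is rejected.
        System.out.println(cell.checkAndPut(one, seen, two, 103L));
    }
}
```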

As I was looking at everything I had another question: why is the cache currently using a scan instead of a get to fetch values from HBase? It seems like that would be much less performant, considering we know the row key we're looking for.


Thanks
Shawn

-----Original Message-----
From: Bryan Bende <bb...@gmail.com> 
Sent: Thursday, April 25, 2019 11:56 AM
To: dev@nifi.apache.org
Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient

Can it not be done with the existing checkAndPut method? [1]

I think if you use the value as the revision it should work. Would be similar to how the Redis implementation works [2].

[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
[2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271

On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
>
> Thanks
> Shawn
>
> -----Original Message-----
> From: Bryan Bende <bb...@gmail.com>
> Sent: Thursday, April 25, 2019 9:11 AM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>
> I'm not aware of a JIRA, so I'd say go for it.
>
> On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> >
> > Thanks
> > Shawn

Re: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Bryan Bende <bb...@gmail.com>.
Can it not be done with the existing checkAndPut method? [1]

I think if you use the value as the revision it should work. Would be
similar to how the Redis implementation works [2].
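
A minimal in-memory sketch of the value-as-revision idea, assuming a hypothetical ValueRevisionCache class that stands in for the HBase table (it is not NiFi, HBase, or Redis code). A replace succeeds only if the stored value still equals the value the caller fetched earlier.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a cache table; not NiFi or HBase code.
final class ValueRevisionCache {
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    // Returns the current value, which doubles as the revision (null if absent).
    String fetch(String key) {
        return cells.get(key);
    }

    // Atomically writes newValue only if the stored value still equals expected.
    // A null expected value means "create the entry only if it does not exist".
    boolean replace(String key, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(key, newValue) == null;
        }
        return cells.replace(key, expected, newValue);
    }

    public static void main(String[] args) {
        ValueRevisionCache cache = new ValueRevisionCache();
        System.out.println(cache.replace("k", null, "1")); // entry is created
        String seen = cache.fetch("k");
        System.out.println(cache.replace("k", seen, "2")); // value unchanged since fetch
        System.out.println(cache.replace("k", seen, "3")); // revision is stale by now
    }
}
```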

[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase-client-service-api/src/main/java/org/apache/nifi/hbase/HBaseClientService.java#L65
[2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-redis-bundle/nifi-redis-extensions/src/main/java/org/apache/nifi/redis/service/RedisDistributedMapCacheClientService.java#L271

On Thu, Apr 25, 2019 at 12:38 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> I'll need to add a check and mutate method to the HBaseClientService Interface, should I just extend with a HBase2ClientService or add checkAndMutate to the existing interface and just make it raise an exception if you try and use it against hbase 1? While Hbase 1.x supports checkAndMutate it doesn't provide a way to filter on timestamp which is part of how I was going to implement the revision requirement for AtomicMapCache.
>
> Thanks
> Shawn
>
> -----Original Message-----
> From: Bryan Bende <bb...@gmail.com>
> Sent: Thursday, April 25, 2019 9:11 AM
> To: dev@nifi.apache.org
> Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient
>
> I'm not aware of a JIRA, so I'd say go for it.
>
> On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
> >
> > Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
> >
> > Thanks
> > Shawn

RE: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Shawn Weeks <sw...@weeksconsulting.us>.
I'll need to add a check-and-mutate method to the HBaseClientService interface. Should I just extend it with an HBase2ClientService, or add checkAndMutate to the existing interface and make it raise an exception if you try to use it against HBase 1? While HBase 1.x supports checkAndMutate, it doesn't provide a way to filter on timestamp, which is part of how I was going to implement the revision requirement for AtomicMapCache.

Thanks
Shawn

-----Original Message-----
From: Bryan Bende <bb...@gmail.com> 
Sent: Thursday, April 25, 2019 9:11 AM
To: dev@nifi.apache.org
Subject: Re: Adding HBase Support for AtomicDistributedMapCacheClient

I'm not aware of a JIRA, so I'd say go for it.

On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
>
> Thanks
> Shawn

Re: Adding HBase Support for AtomicDistributedMapCacheClient

Posted by Bryan Bende <bb...@gmail.com>.
I'm not aware of a JIRA, so I'd say go for it.

On Wed, Apr 24, 2019 at 9:27 PM Shawn Weeks <sw...@weeksconsulting.us> wrote:
>
> Seems like this should be fairly easy for HBase 2.x with the checkAndMutate functionality and I was wondering if there is already a Jira for this. Otherwise I might make an attempt at it. It would be good to be able to support Wait/Notify and other things that need AtomicDistributedMapCacheClient using an Apache developed product commonly found in a Hadoop Cluster.
>
> Thanks
> Shawn