Posted to dev@hbase.apache.org by 杨苏立 Yang Su Li <ya...@gmail.com> on 2017/04/09 19:47:51 UTC

Is HBase RPC-Handling idempotent for reads?

Hi,

I am wondering, for read requests like Get/MultiGet/Scan, is the RPC
handling idempotent in HBase?

More specifically, if in the middle of RPC handling we stop the handling
threads, put the RPC call back to the queue, and later another RPC handler
picks up this call and starts all over again, will the result be the same
as if this call were being handled for the first time? Or are there any
unexpected side effects?

Thanks!

Suli

-- 
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703
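To make the question concrete, here is a toy Python model of the abort-and-requeue flow described above. Everything in it (ReadCall, AbortCall, ToyRegion, and a single loop standing in for a pool of handler threads) is hypothetical and is not HBase's RpcServer or RegionScanner API. The try/finally reflects the concern raised later in the thread that per-attempt resources such as a scanner must be released on abort; because each attempt keeps its partial results local, the retried call returns the same result as a first execution, assuming the data does not change between attempts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReadCall:
    """Hypothetical stand-in for a queued read RPC (not an HBase class)."""
    call_id: int
    keys: list
    attempts: int = 0


class AbortCall(Exception):
    """Raised when the (hypothetical) scheduler preempts a running call."""


class ToyRegion:
    """A dict-backed 'region' that counts open scanners so leaks are visible."""

    def __init__(self, data):
        self.data = data
        self.open_scanners = 0


def handle(call, region, should_abort):
    """Run one attempt of a read; all partial results are local to the attempt."""
    call.attempts += 1
    region.open_scanners += 1  # acquire a per-attempt resource
    try:
        results = []
        for i, key in enumerate(call.keys):
            if should_abort(call, i):
                raise AbortCall()
            results.append(region.data.get(key))
        return results
    finally:
        # Release the resource whether the attempt finished or was aborted,
        # so the next attempt starts from a clean state.
        region.open_scanners -= 1


def run_handlers(calls, region, should_abort):
    """Single-threaded stand-in for a handler pool draining a call queue."""
    pending = deque(calls)
    done = {}
    while pending:
        call = pending.popleft()
        try:
            done[call.call_id] = handle(call, region, should_abort)
        except AbortCall:
            pending.append(call)  # put it back; a later pass restarts it from scratch
    return done


def abort_once(call, i):
    # Abort call 1 halfway through its first attempt only.
    return call.call_id == 1 and call.attempts == 1 and i == 1


region = ToyRegion({"a": 1, "b": 2, "c": 3})
out = run_handlers([ReadCall(1, ["a", "b", "c"]), ReadCall(2, ["c"])], region, abort_once)
print(out)                   # {2: [3], 1: [1, 2, 3]}
print(region.open_scanners)  # 0
```

If concurrent writes land between the two attempts, the retry can legitimately see newer values; that is the mvcc question discussed in the replies.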

Re: Is HBase RPC-Handling idempotent for reads?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

I believe Suli is doing research here, not building a production-level
feature into HBase? So the idea is the most important thing, and it must
be something new. As Yu Li said above, speculative execution is usually
used in computation frameworks such as MR, so it would be something new for
a storage system? And I think HBase is in some sense also a computation
framework, as we have filters and coprocessors which can move computation
from the client to the RS. There could be long queries in HBase (Phoenix,
OLAP, right?).

Oh I found a typo in my previous email... Missed a 'no', completely
different meaning...

"But at client side, there is [NO] guarantee that the request you send
first will be executed first."

I'd say just go ahead. Do not struggle with the implementation details. For
a research program, a POC is enough. A production level project is just a
bonus.

Thanks.
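With the corrected sentence, the argument is that a restarted read behaves like a read that simply arrived later. The toy model below sketches this under simplifying assumptions: a hypothetical ToyMvccStore stamps each committed write with an increasing sequence number and filters reads by a read point. It is not HBase's MultiVersionConcurrencyControl. The first attempt's read point would have returned v1; the retry takes a fresh read point and returns v2, written by a request the client sent after the read.

```python
class ToyMvccStore:
    """Toy multi-version store: each committed write gets the next sequence number."""

    def __init__(self):
        self.versions = {}      # key -> list of (seq, value), in ascending seq order
        self.committed_seq = 0

    def write(self, key, value):
        self.committed_seq += 1
        self.versions.setdefault(key, []).append((self.committed_seq, value))

    def read_point(self):
        # A read that starts now sees exactly the writes committed so far.
        return self.committed_seq

    def get(self, key, read_point):
        visible = [v for seq, v in self.versions.get(key, []) if seq <= read_point]
        return visible[-1] if visible else None


store = ToyMvccStore()
store.write("row", "v1")

rp_t1 = store.read_point()  # t1: first attempt takes its read point, then is aborted
store.write("row", "v2")    # a write the client sent after the read commits meanwhile
rp_t2 = store.read_point()  # t2: the re-executed read takes a fresh read point

print(store.get("row", rp_t1))  # v1
print(store.get("row", rp_t2))  # v2
```

Since the client has no ordering guarantee between the two requests, v2 is also what a read executed for the first time at t2 would return.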

2017-04-12 23:41 GMT+08:00 Josh Elser <el...@apache.org>:

> Yeah, neat idea now that I understand the big picture :)
>
> Instead of trying to do this purely server-side, have you considered a
> first "wag" at a solution of hooking into the existing RPC quota work?
>
> Presently, in the context of a user's RPCs, quotas only limit the number
> of RPCs that user makes in a timeframe. I think it would be a much easier
> first implementation to extend that work to include some notion of I/O
> "cost" to an RPC (in addition, hooking into the existing implementation).
> You'd inherit a lot of functionality and be able to test your hypothesis a
> bit quicker. Something like, "user elserj is only allowed to have
> 1MB/second of I/O at normal priority".
>
> After you have the groundwork laid, it would be a natural follow-on step
> to reduce client RPC retries (handle it purely server-side), evaluate
> practical test cases for the I/O cost computations (are the costs
> "valid"?), figure out if existing RPC priorities are sufficient for
> de-prioritization of RPCs, etc.

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Josh Elser <el...@apache.org>.

Yeah, neat idea now that I understand the big picture :)

Instead of trying to do this purely server-side, have you considered a 
first "wag" at a solution of hooking into the existing RPC quota work?

Presently, in the context of a user's RPCs, quotas only limit the number 
of RPCs that user makes in a timeframe. I think it would be a much 
easier first implementation to extend that work to include some notion 
of I/O "cost" to an RPC (in addition, hooking into the existing 
implementation). You'd inherit a lot of functionality and be able to 
test your hypothesis a bit quicker. Something like, "user elserj is only 
allowed to have 1MB/second of I/O at normal priority".
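One way to picture an I/O "cost" attached to each RPC is a per-user token bucket measured in bytes. The sketch below is hypothetical: IoQuota, try_consume, and the assumption that an RPC's I/O cost is known when it is charged are illustrative only, not HBase's quota classes or shell syntax. An injectable clock keeps the example deterministic.

```python
import time


class IoQuota:
    """Hypothetical per-user I/O budget, enforced as a token bucket in bytes/second."""

    def __init__(self, bytes_per_sec, burst_bytes=None, clock=time.monotonic):
        self.rate = float(bytes_per_sec)
        # By default allow at most one second's worth of I/O to accumulate.
        self.capacity = float(bytes_per_sec if burst_bytes is None else burst_bytes)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, io_bytes):
        """Charge an RPC's estimated I/O cost; False means throttle or deprioritize it."""
        self._refill()
        if io_bytes <= self.tokens:
            self.tokens -= io_bytes
            return True
        return False


now = [0.0]  # fake clock so the example is deterministic
quota = IoQuota(1_000_000, clock=lambda: now[0])  # "1MB/second", taking 1MB = 10**6 bytes
print(quota.try_consume(600_000))  # True
print(quota.try_consume(600_000))  # False: only 400,000 bytes of budget left
now[0] = 0.5                       # half a second later, 500,000 bytes have been refilled
print(quota.try_consume(600_000))  # True
```

A rejected RPC need not fail; it could be delayed or run at a lower priority instead.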

After you have the groundwork laid, it would be a natural follow-on step 
to reduce client RPC retries (handle it purely server-side), evaluate 
practical test cases for the I/O cost computations (are the costs 
"valid"?), figure out if existing RPC priorities are sufficient for 
de-prioritization of RPCs, etc.

Yu Li wrote:
> I see, some priority-based preemptive scheduling.
>
> bq. if it requires I/O resources that are not allocated to it
> Easy to tell whether the request misses the cache and requires IO
> operation, but what's the standard of "not allocated"? Some kind of
> timeout? Anyway, interesting topic and let us know if you work it out
> (smile).
>
> Best Regards,
> Yu

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Yu Li <ca...@gmail.com>.

I see, some priority-based preemptive scheduling.

bq. if it requires I/O resources that are not allocated to it
Easy to tell whether the request misses the cache and requires IO
operation, but what's the standard of "not allocated"? Some kind of
timeout? Anyway, interesting topic and let us know if you work it out
(smile).

Best Regards,
Yu
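A minimal sketch of the scheme Suli describes in the quoted message: serve cache hits freely, and abort a request before it touches disk when its client has no I/O share left, re-queuing it for the next scheduling period. Here "not allocated" is modeled, purely as an assumption, as a per-client count of disk reads granted each period. serve_read, schedule, and the dict-based cache and disk are hypothetical, not HBase's block cache or RPC scheduler APIs. Because the abort happens before any I/O, the retried attempt has no partial state to clean up.

```python
from collections import deque


class NeedsIo(Exception):
    """Raised when a read misses the cache and its client has no I/O share left."""


def serve_read(client, key, cache, disk, io_budget):
    if key in cache:
        return cache[key]        # cached: costs CPU/memory only, always served
    if io_budget.get(client, 0) <= 0:
        raise NeedsIo(key)       # abort before touching disk
    io_budget[client] -= 1       # charge one unit of this period's I/O share
    cache[key] = disk[key]
    return cache[key]


def schedule(requests, cache, disk, io_budget, share_per_period):
    """Run periods until every request is served; deferred requests are retried.

    Assumes every client gets a positive share, otherwise this loops forever.
    """
    pending = deque(requests)
    served, periods = [], 0
    while pending:
        periods += 1
        deferred = deque()
        for client, key in pending:
            try:
                served.append((client, key, serve_read(client, key, cache, disk, io_budget)))
            except NeedsIo:
                deferred.append((client, key))  # re-schedule in a later period
        pending = deferred
        io_budget.update(share_per_period)      # next period: reset each client's share
    return served, periods


disk = {"k1": "a", "k2": "b", "k3": "c", "hot": "h"}
cache = {"hot": "h"}
budget = {"client-1": 1, "client-2": 1}
requests = [("client-2", "k1"), ("client-2", "k2"), ("client-2", "hot"), ("client-1", "k3")]
served, periods = schedule(requests, cache, disk, budget, {"client-1": 1, "client-2": 1})
print(served)
# [('client-2', 'k1', 'a'), ('client-2', 'hot', 'h'), ('client-1', 'k3', 'c'), ('client-2', 'k2', 'b')]
print(periods)  # 2
```

client-2's cached request is served in the first period even though its second disk read is deferred, which is the fairness property the proposal is after.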

On 11 April 2017 at 01:09, 杨苏立 Yang Su Li <ya...@gmail.com> wrote:

> On Sun, Apr 9, 2017 at 11:14 PM, Yu Li <ca...@gmail.com> wrote:
>
> > Correct me if I'm wrong, but I think we should assume no other
> > operations besides the single one being checked when deciding whether
> > it's idempotent. Similar to the Wikipedia example
> > <https://en.wikipedia.org/wiki/Idempotence#Examples>: "A function
> > looking up a customer's name and address in a database
> > <https://en.wikipedia.org/wiki/Database> is typically idempotent, since
> > this will not cause the database to change", I think all Get/MultiGet/Scan
> > operations in HBase are idempotent.
> >
> > About "speculative RPC handling", I doubt whether it would bring any
> > benefit in HBase. Normally, if a request has already arrived at the server
> > side but executes slowly, the problem might be:
> > 1. The server is too busy and the request gets queued
> > 2. The processing itself is slow due to the request pattern or some
> > hardware failure
> > I don't think speculative execution of the request could help in either
> > of the above cases. It's different from speculative task execution in MR,
> > where we could choose another node to execute the task, while here we
> > have no choice.
> >
>
> We have a different use case here. Basically we are trying to enforce
> scheduling in HBase.
> Consider the following scenario: client-1 and client-2 are both competing
> for I/O resources.
> But client-2 is also issuing a bunch of requests that do not require any
> I/O resources (say, the data is cached).
> Since we have idle CPU/memory, we want to serve these cached requests for
> client-2, but we do not want client-2 to use more than its fair share of
> I/O.
>
> Unfortunately, at the time we pick an RPC call to handle, we don't know
> whether it will cause I/O or not.
> So we think we can abort a request if it requires I/O resources that are
> not allocated to it, and re-schedule it later based on our scheduling
> policy.
>
> >
> > OTOH, we already have timeout mechanisms to make sure server resources
> > won't be wasted:
> > 1. For Scan
> >     - When request handling times out, the server will stop further
> > processing; refer to RSRpcServices#getTimeLimit and
> > ScannerContext#checkTimeLimit
> >     - If the client went away during processing, the server will also stop
> > processing; check the SimpleRpcServer#disconnectSince and
> > RegionScannerImpl#nextInternal methods for more details.
> >
> > 2. For a single Get
> >     - Controlled by the RPC and operation timeouts
> >
> > 3. For MultiGet
> >     - I think this is something we could improve. On the client side we
> > have a timeout mechanism, but on the server side there seems to be no
> > corresponding interrupt logic.
> >
> >
> > Best Regards,
> > Yu
> >
> > On 10 April 2017 at 11:12, Jerry He <je...@gmail.com> wrote:
> >
> > > Again, it depends on how you abort, and 'idempotent' can have different
> > > definitions.
> > >
> > > For example, even if you are only concerned about reads,
> > > there are resources on the HRegion that the read touches or acquires
> > > (scanner, lock, mvcc, etc.) that hopefully will be cleaned up/released
> > > with the abort.
> > > Otherwise you may leave it in a bad/inconsistent state.
> > >
> > > Thanks.
> > >
> > > Jerry
> > >
> > >
> > > On Sun, Apr 9, 2017 at 7:14 PM, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
> > >
> > > > I think this depends on how you model the problem. At server side, if
> > > > you re-execute a read operation with a new mvcc, then you may read a
> > > > value that should not be visible if you use the old mvcc. If you
> > > > define this as an error then I think there will be conflicts.
> > > >
> > > > But at client side, there is guarantee that the request you send
> > > > first will be executed first. So as long as the read request does not
> > > > return, I think it is OK to read a value which is written by a write
> > > > request which is sent after the read request?
> > > >
> > > > Thanks.
> > > >
> > > > 2017-04-10 9:52 GMT+08:00 杨苏立 Yang Su Li <ya...@gmail.com>:
> > > >
> > > > > We are only concerned about read operations here. Are you suggesting
> > > > > they are completely idempotent?
> > > > > Are there any read-after-write conflicts?
> > > > >
> > > > > Thanks
> > > > >
> > > > > Suli
> > > > >
> > > > > On Sun, Apr 9, 2017 at 8:48 PM, 张铎(Duo Zhang) <palomino219@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > It depends on how you abort the RPC request. For HBase, there will
> > > > > > be no write conflict, but a write operation can only be finished iff
> > > > > > all the write operations with a lower mvcc number have been
> > > > > > finished. So if you just stop a write operation without recovering
> > > > > > the mvcc (I do not know how to recover it, but I think you need to
> > > > > > do something...), then the writes will be stuck.
> > > > > >
> > > > > > And one more thing: a read operation may be interrupted at any time,
> > > > > > but for a write operation, I do not think you can re-execute it with
> > > > > > a new mvcc number if its WAL entry has already been flushed out. That
> > > > > > means the re-execution process will differ depending on the stage at
> > > > > > which you abort the write operation.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > 2017-04-10 6:47 GMT+08:00 杨苏立 Yang Su Li <ya...@gmail.com>:
> > > > > >
> > > > > > > We are trying to implement speculative RPC handling for our
> > > > > > > workloads. So we want to allow an RPC handler to stop executing an
> > > > > > > RPC call, put it back in the queue, and later re-execute it.
> > > > > > >
> > > > > > > Suppose at time t1 we execute an RPC call halfway, abort, and put
> > > > > > > the call back in the queue.
> > > > > > > Then at time t2 another RPC handler picks up the call and
> > > > > > > re-executes it.
> > > > > > > I understand that we might get a different mvcc number and
> > > > > > > different results at t2 compared to executing it at t1.
> > > > > > > My question is: would this situation be any different from the
> > > > > > > situation where the call was never executed at t1, and is executed
> > > > > > > at t2 for the first time?
> > > > > > >
> > > > > > > My guess is that since at t1 we may have already gotten an mvcc
> > > > > > > number, it might potentially cause some write conflicts and certain
> > > > > > > write operations to retry. But correctness-wise, is there any
> > > > > > > difference?
> > > > > > >
> > > > > > > Thanks a lot!
> > > > > > >
> > > > > > > Suli
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Apr 9, 2017 at 5:14 PM, Jerry He <je...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I don't know what your intention and your context are.
> > > > > > > >
> > > > > > > > You may get a different mvcc number and get different results
> > > > > > > > next time around if there are concurrent writes.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jerry
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Suli Yang
> > > > >
> > > > > Department of Physics
> > > > > University of Wisconsin Madison
> > > > >
> > > > > 4257 Chamberlin Hall
> > > > > Madison WI 53703
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
>

Re: Is HBase RPC-Handling idempotent for reads?

Posted by 杨苏立 Yang Su Li <ya...@gmail.com>.

On Sun, Apr 9, 2017 at 11:14 PM, Yu Li <ca...@gmail.com> wrote:

> Correct me if I'm wrong, but I think we should assume no other but the
> single operation when checking whether it's idempotent. Similar to the
> wikipedia
> example <https://en.wikipedia.org/wiki/Idempotence#Examples>: "A function
> looking up a customer's name and address in a database
> <https://en.wikipedia.org/wiki/Database> is typically idempotent, since
> this will not cause the database to change", I think all Get/MultiGet/Scan
> operations in hbase are idempotent.
>
> About "speculative rpc handling", I doubt whether it benefits in hbase.
> Normally if a request already arrives at server side but with slow
> execution, the problem might be:
> 1. The server is too busy and request get queued
> 2. The processing itself is slow due to the request pattern or some
> hardware failure
> I don't think a speculative execution of the request could help in any of
> the above cases. It's different from the speculative task execution in MR,
> there we could choose another node to execute the task while here we have
> no choice.
>


We have a different use case here. Basically, we are trying to enforce
scheduling policies in HBase.
Consider the following scenario: client-1 and client-2 are both competing
for I/O resources, but client-2 is also issuing a bunch of requests that
do not require any I/O (say, the data is cached).
Since we have idle CPU/memory, we want to serve these cached requests for
client-2, but we do not want client-2 to use more than its fair share of
I/O.

Unfortunately, at the time we pick an RPC call to handle, we don't know
whether it will cause I/O or not.
So we think we can abort a request once it turns out to need I/O resources
that have not been allocated to it, and re-schedule it later according to
our scheduling policy.
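
A minimal sketch of the abort-and-requeue idea above, assuming a single handler thread and a per-client pool of I/O tokens. All names (`ReadCall`, `AbortingScheduler`, `NeedsIoException`, and the `cached` flag standing in for a block-cache hit) are hypothetical and are not part of HBase's RPC scheduler API. The sketch relies on the property discussed in this thread: an aborted read has changed nothing, so restarting it from scratch is equivalent to running it for the first time. A real handler would also have to release any scanner, lock, or read point the aborted attempt held.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Thrown when a call discovers mid-execution that it needs I/O it has not been granted. */
class NeedsIoException extends RuntimeException {
    NeedsIoException(String msg) { super(msg); }
}

/** Hypothetical read call: it only learns whether it needs disk I/O while executing. */
class ReadCall {
    final String client;
    final String key;
    final boolean cached; // stand-in for "the block is already in the block cache"
    int attempts = 0;

    ReadCall(String client, String key, boolean cached) {
        this.client = client;
        this.key = key;
        this.cached = cached;
    }
}

/** Sketch of a handler loop that aborts calls lacking an I/O grant and re-queues them. */
class AbortingScheduler {
    private final Queue<ReadCall> queue = new ArrayDeque<>();
    private final Map<String, Integer> ioTokens = new HashMap<>();
    final List<String> completed = new ArrayList<>();

    void submit(ReadCall call) { queue.add(call); }

    void grantIo(String client, int tokens) { ioTokens.merge(client, tokens, Integer::sum); }

    /** Runs until the queue is empty or a full pass over the queue makes no progress. */
    void drain() {
        int sinceProgress = 0;
        while (!queue.isEmpty() && sinceProgress < queue.size()) {
            ReadCall call = queue.poll();
            try {
                execute(call);
                completed.add(call.client + ":" + call.key);
                sinceProgress = 0;
            } catch (NeedsIoException e) {
                // The aborted read mutated nothing, so it can simply be restarted later.
                queue.add(call);
                sinceProgress++;
            }
        }
    }

    private void execute(ReadCall call) {
        call.attempts++;
        if (!call.cached) {
            int left = ioTokens.getOrDefault(call.client, 0);
            if (left == 0) {
                throw new NeedsIoException(call.client + " has no I/O grant");
            }
            ioTokens.put(call.client, left - 1);
        }
        // A real handler would read cells here; the sketch only records completion.
    }
}
```

With one I/O token for client-1 and none for client-2, client-2's uncached read is aborted and re-queued while its cached read still completes; once client-2 is granted a token, the re-queued call succeeds on its next attempt.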

>
> OTOH, we already have timeout mechanism to make sure server resource won't
> be wasted:
> 1. For scan
>     - When a request handling timeouts, server will stop further
> processing, refer to RSRpcServices#getTimeLimit and
> ScannerContext#checkTimeLimit
>     - If the client went away during processing, server will also stop
> processing, check the SimpleRpcServer#disconnectSince and
> RegionScannerImpl#nextInternal methods for more details.
>
> 2. For single Get
>     - Controlled by rpc and operation timeout
>
> 3. For MultiGet
>     - I think this is something we could improve. On client side we have
> timeout mechanism but on server side there seems to be no relative
> interrupt logic.
>
>
> Best Regards,
> Yu

-- 
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Jerry He <je...@gmail.com>.

Yes. In the context of the underlying physical region or database, a read
is idempotent.


Thanks

Jerry

On Apr 9, 2017 9:15 PM, "Yu Li" <ca...@gmail.com> wrote:

> Correct me if I'm wrong, but I think we should assume no other but the
> single operation when checking whether it's idempotent. Similar to the
> wikipedia
> example <https://en.wikipedia.org/wiki/Idempotence#Examples>: "A function
> looking up a customer's name and address in a database
> <https://en.wikipedia.org/wiki/Database> is typically idempotent, since
> this will not cause the database to change", I think all Get/MultiGet/Scan
> operations in hbase are idempotent.

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Yu Li <ca...@gmail.com>.

Correct me if I'm wrong, but I think we should consider the single
operation in isolation, with nothing else running, when checking whether
it's idempotent. Similar to the Wikipedia example
<https://en.wikipedia.org/wiki/Idempotence#Examples>: "A function
looking up a customer's name and address in a database
<https://en.wikipedia.org/wiki/Database> is typically idempotent, since
this will not cause the database to change", I think all Get/MultiGet/Scan
operations in HBase are idempotent.

About "speculative rpc handling", I doubt whether it would benefit HBase.
Normally, if a request has already arrived at the server side but executes
slowly, the problem might be:
1. The server is too busy and requests get queued
2. The processing itself is slow due to the request pattern or some
hardware failure
I don't think speculative execution of the request could help in either of
the above cases. It's different from speculative task execution in MR,
where we can choose another node to execute the task; here we have
no such choice.

OTOH, we already have timeout mechanisms to make sure server resources
won't be wasted:
1. For scan
    - When request handling times out, the server stops further
processing; refer to RSRpcServices#getTimeLimit and
ScannerContext#checkTimeLimit
    - If the client goes away during processing, the server also stops
processing; check the SimpleRpcServer#disconnectSince and
RegionScannerImpl#nextInternal methods for more details.

2. For single Get
    - Controlled by the rpc and operation timeouts

3. For MultiGet
    - I think this is something we could improve. On the client side we have
a timeout mechanism, but on the server side there seems to be no
corresponding interrupt logic.
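
To illustrate the kind of time-limit check described for scans, here is a sketch of a scan loop that checks a deadline between rows and, once it passes, returns the rows collected so far with a flag telling the client to continue in a new request. `TimeLimitedScanner`, `ScanBatch`, and the injected clock are hypothetical; this is not the actual `ScannerContext#checkTimeLimit` implementation, only the general pattern under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Result of one time-limited scan request: rows read so far plus whether more remain. */
class ScanBatch {
    final List<String> rows;
    final boolean moreRows;

    ScanBatch(List<String> rows, boolean moreRows) {
        this.rows = rows;
        this.moreRows = moreRows;
    }
}

/** Hypothetical scanner that stops early when its per-request time budget runs out. */
class TimeLimitedScanner {
    private final LongSupplier clockMillis;

    TimeLimitedScanner(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /**
     * Reads rows starting at index {@code from} until the data runs out or the deadline
     * passes. The deadline is checked between rows, so a row is never split; the client
     * resumes from {@code from + rows.size()} in its next request.
     */
    ScanBatch next(List<String> region, int from, long timeLimitMillis) {
        long deadline = clockMillis.getAsLong() + timeLimitMillis;
        List<String> out = new ArrayList<>();
        int i = from;
        while (i < region.size()) {
            if (clockMillis.getAsLong() >= deadline) {
                return new ScanBatch(out, true); // time is up: return partial results
            }
            out.add(region.get(i++));
        }
        return new ScanBatch(out, false); // reached the end of the data
    }
}
```

Injecting the clock as a `LongSupplier` keeps the sketch deterministic to test; a server would pass something like `System::currentTimeMillis`.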


Best Regards,
Yu

On 10 April 2017 at 11:12, Jerry He <je...@gmail.com> wrote:

> Again, it depends on how you abort and 'idempotent' can have different
> definitions.
>
> For example, even if you are only concerned about read,
> there are resources on the HRegion that the read touches or acquires
> (scanner, lock, mvcc etc) that hopefully will be cleaned/releases with the
> abort.
> Or you may have it in a bad/inconsistent state.
>
> Thanks.
>
> Jerry

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Jerry He <je...@gmail.com>.

Again, it depends on how you abort, and 'idempotent' can have different
definitions.

For example, even if you are only concerned about reads,
there are resources on the HRegion that the read touches or acquires
(scanner, lock, mvcc, etc.) that hopefully will be cleaned up/released on
abort. Otherwise you may leave it in a bad/inconsistent state.
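
One way to make sure an aborted read leaves the region clean is to tie every acquired resource to a scope that is closed on every exit path. The sketch below uses Java's try-with-resources so that a read point, a row lock, and a scanner are released in reverse order whether the read completes or throws. `RegionState`, `ReadHandler`, and `AbortedException` are hypothetical stand-ins, not HRegion's real APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-region bookkeeping that records which resources are currently held. */
class RegionState {
    final List<String> held = new ArrayList<>();

    /** Marks a resource as held and returns a handle that releases it when closed. */
    AutoCloseable acquire(String name) {
        held.add(name);
        return () -> held.remove(name);
    }
}

class AbortedException extends Exception {
    AbortedException(String msg) { super(msg); }
}

class ReadHandler {
    /**
     * Runs a read that acquires a read point, a row lock and a scanner. try-with-resources
     * closes them in reverse order (scanner, lock, read point) whether the read returns
     * normally or is aborted with an exception.
     */
    static String read(RegionState region, boolean abortMidway) throws Exception {
        try (AutoCloseable readPoint = region.acquire("mvcc-read-point");
             AutoCloseable rowLock = region.acquire("row-lock");
             AutoCloseable scanner = region.acquire("scanner")) {
            if (abortMidway) {
                throw new AbortedException("needs I/O it was not granted");
            }
            return "value";
        }
    }
}
```

After an aborted call, `region.held` is empty again, so a re-executed call starts from the same state as a fresh one.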

Thanks.

Jerry


On Sun, Apr 9, 2017 at 7:14 PM, 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> I think this depends on how you model the problem. At server side, if you
> re-execute a read operation with a new mvcc, then you may read a value that
> should not be visible if you use the old mvcc. If you define this as an
> error then I think there will be conflicts.
>
> But at client side, there is guarantee that the request you send first will
> be executed first. So as long as the read request does not return, I think
> it is OK to read a value which is written by a write request which is sent
> after the read request?
>
> Thanks.

Re: Is HBase RPC-Handling idempotent for reads?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

I think this depends on how you model the problem. On the server side, if
you re-execute a read operation with a new mvcc, then you may read a value
that would not be visible under the old mvcc. If you define this as an
error, then I think there will be conflicts.

But on the client side, there is no guarantee that the request you send
first will be executed first. So as long as the read request has not
returned, I think it is OK to read a value written by a write request
that was sent after the read request?
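
A toy model of the mvcc behavior discussed in this thread, assuming a single cell and integer write numbers. A read captures the read point when it starts and sees the newest version at or below it; the read point only advances past a write once every lower-numbered write has completed, which is the rule Duo describes for writes elsewhere in this thread. `ToyMvccCell` is illustrative only and is not HBase's `MultiVersionConcurrencyControl` class.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy single-cell MVCC model; illustrative only, not HBase's MultiVersionConcurrencyControl. */
class ToyMvccCell {
    private final TreeMap<Long, String> versions = new TreeMap<>(); // write number -> value
    private final Set<Long> completedAhead = new HashSet<>();       // finished but not yet visible
    private long nextWriteNumber = 1;
    private long readPoint = 0; // every write numbered <= readPoint has completed

    /** Starts a write: assigns the next write number and stores a not-yet-visible value. */
    long beginWrite(String value) {
        long n = nextWriteNumber++;
        versions.put(n, value);
        return n;
    }

    /** Completes a write; the read point advances only over a gap-free run of completed writes. */
    void completeWrite(long n) {
        completedAhead.add(n);
        while (completedAhead.remove(readPoint + 1)) {
            readPoint++;
        }
    }

    /** A read captures this value when it starts (or restarts). */
    long currentReadPoint() {
        return readPoint;
    }

    /** Returns the newest value written at or below the captured read point, or null if none. */
    String read(long capturedReadPoint) {
        Map.Entry<Long, String> e = versions.floorEntry(capturedReadPoint);
        return e == null ? null : e.getValue();
    }
}
```

If write 1 (`v1`) completes, a read captures read point 1, and writes 2 (`v2`) and 3 (`v3`) then complete, re-executing the read with a fresh read point returns `v3` instead of `v1`. The difference comes only from writes that completed in between; with no intervening writes, both executions return the same value. Completing write 3 while write 2 is still pending does not advance the read point, which is why abandoning a write without completing its mvcc entry would keep later writes invisible.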

Thanks.

2017-04-10 9:52 GMT+08:00 杨苏立 Yang Su Li <ya...@gmail.com>:

> We are only concerned about read operations here. Are you suggesting they
> are completely idempotent?
> Are there any read-after-write conflicts?
>
> Thanks
>
> Sui

Re: Is HBase RPC-Handling idempotent for reads?

Posted by 杨苏立 Yang Su Li <ya...@gmail.com>.

We are only concerned about read operations here. Are you suggesting they
are completely idempotent?
Are there any read-after-write conflicts?

Thanks

Suli

On Sun, Apr 9, 2017 at 8:48 PM, 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> It depends on how you abort the RPC request. For HBase, there will be no
> write conflict, but a write operation can only finish after all write
> operations with a lower mvcc number have finished. So if you just stop a
> write operation without recovering the mvcc state (I do not know how to
> recover it, but I think you need to do something...), the writes will be
> stuck.
>
> And one more thing: a read operation can be interrupted at any time, but
> for a write operation, I do not think you can re-execute it with a new mvcc
> number if the WAL entry has already been flushed out. That means the
> re-execution process will differ depending on the stage at which you abort
> the write operation.
>
> Thanks.

Re: Is HBase RPC-Handling idempotent for reads?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

It depends on how you abort the RPC request. For HBase, there will be no
write conflict, but a write operation can only finish after all write
operations with a lower mvcc number have finished. So if you just stop a
write operation without recovering the mvcc state (I do not know how to
recover it, but I think you need to do something...), the writes will be
stuck.
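To make the ordering constraint above concrete, here is a minimal Java sketch of a toy model, assuming only what the message states: each write takes an increasing mvcc number, and a write becomes visible to readers only after every lower-numbered write has also finished. `ToyMvcc`, `begin`, `complete`, `readPoint`, and `isVisible` are hypothetical names for illustration, not HBase's own classes or methods, and the real region server logic is more involved.

```java
import java.util.TreeSet;

// Demo: write 1 stands in for a write whose handler was stopped midway.
class ToyMvccDemo {
    public static void main(String[] args) {
        ToyMvcc mvcc = new ToyMvcc();
        long w1 = mvcc.begin();
        long w2 = mvcc.begin();
        mvcc.complete(w2);
        System.out.println(mvcc.readPoint()); // prints 0: w2 waits for w1
        mvcc.complete(w1);
        System.out.println(mvcc.readPoint()); // prints 2: both are now visible
    }
}

// Hypothetical toy model of ordered write completion; not HBase's MVCC code.
class ToyMvcc {
    private long nextWriteNumber = 1;
    private long readPoint = 0;                             // highest number readers may see
    private final TreeSet<Long> pending = new TreeSet<>(); // begun but not yet completed

    // Hands the next write number to a new write.
    synchronized long begin() {
        long n = nextWriteNumber++;
        pending.add(n);
        return n;
    }

    // Marks a write as done; the read point only moves over a gap-free prefix.
    synchronized void complete(long n) {
        pending.remove(n);
        long limit = pending.isEmpty() ? nextWriteNumber - 1 : pending.first() - 1;
        readPoint = Math.max(readPoint, limit);
    }

    synchronized long readPoint() { return readPoint; }

    // A write counts as finished for readers once the read point covers it.
    synchronized boolean isVisible(long n) { return n <= readPoint; }
}
```

In the demo, write 1 plays the role of a write whose handler was stopped: until its number is completed, write 2 stays invisible even though it is done, which is the "stuck" behaviour described above.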

And one more thing: a read operation can be interrupted at any time, but
for a write operation, I do not think you can re-execute it with a new mvcc
number if the WAL entry has already been flushed out. That means the
re-execution process will differ depending on the stage at which you abort
the write operation.

Thanks.

2017-04-10 6:47 GMT+08:00 杨苏立 Yang Su Li <ya...@gmail.com>:

> We are trying to implement speculative RPC handling for our workloads, so
> we want to allow an RPC handler to stop executing an RPC call, put it back
> into the queue, and later re-execute it.
>
> Suppose that at time t1 we execute an RPC call halfway, abort it, and put
> the call back into the queue. Then at time t2 another RPC handler picks up
> the call and re-executes it. I understand that we might get a different
> mvcc number and different results at t2 compared to executing it at t1.
> My question is: would this situation be any different from the situation
> where the call was never executed at t1 and is executed at t2 for the
> first time?
>
> My guess is that since at t1 we may have already obtained an mvcc number,
> it might cause some write conflicts and force certain write operations to
> retry. But correctness-wise, is there any difference?
>
> Thanks a lot!
>
> Suli

Re: Is HBase RPC-Handling idempotent for reads?

Posted by 杨苏立 Yang Su Li <ya...@gmail.com>.

We are trying to implement speculative RPC handling for our workloads, so
we want to allow an RPC handler to stop executing an RPC call, put it back
into the queue, and later re-execute it.

Suppose that at time t1 we execute an RPC call halfway, abort it, and put
the call back into the queue. Then at time t2 another RPC handler picks up
the call and re-executes it. I understand that we might get a different
mvcc number and different results at t2 compared to executing it at t1.
My question is: would this situation be any different from the situation
where the call was never executed at t1 and is executed at t2 for the
first time?

My guess is that since at t1 we may have already obtained an mvcc number,
it might cause some write conflicts and force certain write operations to
retry. But correctness-wise, is there any difference?
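The proposed abort-and-requeue step might look roughly like the following single-threaded Java sketch. It assumes a read-only call whose partial results live only in handler-local state, and it fixes the data the call reads (`snapshot`) so that only the requeue mechanics are shown. `ReadCall`, `SpeculativeHandler`, `runOne`, and `preemptAt` are hypothetical names; HBase itself, as Josh notes in this thread, has no code path that returns a partially processed call to the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.IntPredicate;

class SpeculativeDemo {
    public static void main(String[] args) {
        SpeculativeHandler handler = new SpeculativeHandler();
        ReadCall call = new ReadCall(new int[] {10, 20, 30, 40});
        handler.submit(call);
        System.out.println(handler.runOne(i -> i == 2)); // t1, preempted at row 2: prints null
        System.out.println(handler.runOne(i -> false));  // t2, runs to the end: prints [10, 20, 30, 40]
        System.out.println(call.attempts);               // prints 2
    }
}

// Hypothetical read call; `snapshot` stands in for the rows it will return.
class ReadCall {
    final int[] snapshot;
    int attempts = 0;

    ReadCall(int[] snapshot) { this.snapshot = snapshot; }
}

// Hypothetical handler that may abandon a call and requeue it (not HBase code).
class SpeculativeHandler {
    private final Queue<ReadCall> queue = new ArrayDeque<>();

    void submit(ReadCall call) { queue.add(call); }

    // Runs the call at the head of the queue. If preemptAt fires for a row index,
    // the partial result is discarded and the call goes to the back of the queue.
    // Returns the complete result, or null if the call was requeued (or none was queued).
    List<Integer> runOne(IntPredicate preemptAt) {
        ReadCall call = queue.poll();
        if (call == null) return null;
        call.attempts++;
        List<Integer> partial = new ArrayList<>(); // handler-local state only
        for (int i = 0; i < call.snapshot.length; i++) {
            if (preemptAt.test(i)) {
                queue.add(call); // abort: drop partial, restart from scratch later
                return null;
            }
            partial.add(call.snapshot[i]);
        }
        return partial;
    }
}
```

Because the aborted attempt only throws away a local list, the second attempt returns exactly what a first-time execution over the same snapshot would. Whether a real server's read would see the same data at t2 is a separate question, since it depends on the read point taken when the call is re-executed.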

Thanks a lot!

Suli

On Sun, Apr 9, 2017 at 5:14 PM, Jerry He <je...@gmail.com> wrote:

> I don't know what your intention and your context are.
>
> You may get a different mvcc number and get different results next time
> around if there are concurrent writes.
>
> Thanks,
>
> Jerry

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Josh Elser <el...@apache.org>.

+1 to that one, Jerry :). I think we're missing some context, Suli.

Also, I don't know of any code path in which an RPC would be partially
processed and then returned to the queue. Calls go from wire -> queue ->
handler; they can't move backwards. They either move forward or throw an
exception.
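The forward-only flow described above can be modeled in a few lines of Java. This is a simplified sketch, not HBase's RPC server code: `ForwardOnlyPipeline`, `receive`, `handleNext`, and `Response` are hypothetical names, and a `Callable` stands in for a decoded RPC call. The point is structural: a handler removes a call from the queue and either returns a result or turns the exception into an error response; nothing puts the call back.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;

class ForwardOnlyDemo {
    public static void main(String[] args) {
        ForwardOnlyPipeline pipeline = new ForwardOnlyPipeline();
        pipeline.receive(() -> "row-1");
        pipeline.receive(() -> { throw new IllegalStateException("simulated failure"); });
        System.out.println(pipeline.handleNext().value);              // prints row-1
        System.out.println(pipeline.handleNext().error.getMessage()); // prints simulated failure
        System.out.println(pipeline.queued());                        // prints 0
    }
}

// Hypothetical model of wire -> queue -> handler; not HBase's RPC server classes.
class ForwardOnlyPipeline {
    // What goes back to the client: a value or the exception the call threw.
    static final class Response {
        final Object value;
        final Exception error;
        Response(Object value, Exception error) { this.value = value; this.error = error; }
    }

    private final BlockingQueue<Callable<Object>> callQueue = new LinkedBlockingQueue<>();

    // "Wire" side: a decoded call is appended to the queue.
    void receive(Callable<Object> call) { callQueue.add(call); }

    // Handler side: take the next call and run it to the end. No code here
    // puts the call back into callQueue; it either succeeds or fails forward.
    Response handleNext() {
        Callable<Object> call = callQueue.poll();
        if (call == null) return null; // nothing queued
        try {
            return new Response(call.call(), null);
        } catch (Exception e) {
            return new Response(null, e);
        }
    }

    int queued() { return callQueue.size(); }
}
```

Adding speculative re-execution to a design like this would mean introducing a new backward edge (handler -> queue) that the forward-only model simply does not have.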

Jerry He wrote:
> I don't know what your intention and your context are.
>
> You may get a different mvcc number and get different results next time
> around if there are concurrent writes.
>
> Thanks,
>
> Jerry

Re: Is HBase RPC-Handling idempotent for reads?

Posted by Jerry He <je...@gmail.com>.

I don't know what your intention and your context are.

You may get a different mvcc number and get different results next time
around if there are concurrent writes.
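Jerry's point can be illustrated with a toy multi-version store in Java, assuming that a read records the current read point when it starts and then only sees versions whose write number is at or below that point. `ToyVersionedStore`, `openReadPoint`, `put`, and `get` are hypothetical names for this sketch, not HBase APIs, and the store is single-threaded so writes complete in order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class RetryReadDemo {
    public static void main(String[] args) {
        ToyVersionedStore store = new ToyVersionedStore();
        store.put("r1", "v1");
        long firstAttempt = store.openReadPoint(); // t1: read starts, then is aborted
        store.put("r1", "v2");                     // a concurrent write commits in between
        long retry = store.openReadPoint();        // t2: the re-execution pins a newer point
        System.out.println(store.get("r1", firstAttempt)); // prints v1
        System.out.println(store.get("r1", retry));        // prints v2
    }
}

// Hypothetical single-threaded multi-version store; not HBase's implementation.
class ToyVersionedStore {
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>(); // row -> (write number -> value)
    private long readPoint = 0; // highest committed write number

    // Commits a write under the next write number (writes complete in order here).
    void put(String row, String value) {
        readPoint++;
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(readPoint, value);
    }

    // A read records the current read point when it starts.
    long openReadPoint() { return readPoint; }

    // Newest version of `row` whose write number is <= the pinned read point, or null.
    String get(String row, long pinnedReadPoint) {
        TreeMap<Long, String> versions = rows.get(row);
        if (versions == null) return null;
        Map.Entry<Long, String> entry = versions.floorEntry(pinnedReadPoint);
        return entry == null ? null : entry.getValue();
    }
}
```

In this toy model a restarted read behaves exactly like a fresh read issued at t2: the aborted attempt changed nothing, and the different answer comes only from the write that committed in between.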

Thanks,

Jerry

On Sun, Apr 9, 2017 at 12:48 PM 杨苏立 Yang Su Li <ya...@gmail.com> wrote:

> Hi,
>
> I am wondering, for read requests like Get/MultiGet/Scan, is the RPC
> handling idempotent in HBase?
>
> More specifically, if in the middle of RPC handling we stop the handling
> threads, put the RPC call back into the queue, and later another RPC handler
> picks up this call and starts all over again, will the result be the same
> as if this call were being handled for the first time? Or are there any
> unexpected side effects?
>
> Thanks!
>
> Suli