Posted to dev@accumulo.apache.org by z11373 <z1...@outlook.com> on 2016/03/16 21:33:53 UTC

delete + insert case

Hi,
I have an object abstraction class whose delete/add operations eventually
translate to calls to Accumulo writer.putDelete and writer.put.
To achieve higher throughput, the code calls writer.flush only once per
request (my implementation knows when a request ends), instead of
flushing after each delete or add operation.
For example, a client request to my service might be:
1. delete A
2. add A
3. add B

I'd expect the end result to be that both row IDs A and B exist in the table,
but apparently only B does. I already checked the log: the code executes the
delete before the add. However, I guess that since I call flush after all the
putDelete and put calls have been made, Accumulo somehow makes putDelete 'win'
(in the same flush cycle). Is that correct? If so, how can I work around this
without sacrificing performance?


Thanks,
Z



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/delete-insert-case-tp16375.html
Sent from the Developers mailing list archive at Nabble.com.

Re: delete + insert case

Posted by z11373 <z1...@outlook.com>.

Thanks Keith!
I don't 100% like this idea (I shouldn't have to manually determine the
order in which deletes and inserts are submitted), but it seems it will work
for our case, because semantically an insert should never come before a
delete.
I agree, using the current timestamp is not safe in a distributed system, and
debugging the issue later could be quite hard too :-(


Thanks,
Z




Re: delete + insert case

Posted by Keith Turner <ke...@deenlo.com>.

On Thu, Mar 17, 2016 at 11:40 AM, z11373 <z1...@outlook.com> wrote:

> Hi Josh,
> What I meant by better throughput is less times of calling flush().
> My service will receive user request, which is query string contains
> DELETE/INSERT statements, which will be translated to Accumulo delete/add
> operations. I used to call flush() for each operation, and that hurts
> performance. Luckily, by end of request, a callback function will be
> called,
> and there is where I make Accumulo flush call, and perf is improved,
> especially if the request generates Accumulo write operations.
>

If you had two batch writers (one for deletes and one for inserts), then you
could flush them both in your callback.  Always flush the deletes batch
writer first.  Then you do not need to flush per statement.
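
A minimal sketch of this two-writer pattern, for illustration only. It does
not use the Accumulo client library: MutationSink is a hypothetical stand-in
for the addMutation/flush operations of a BatchWriter, and DeleteFirstWriter
and LoggingSink are made-up names. The only point it shows is the ordering
in the end-of-request callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two BatchWriter operations this pattern needs.
interface MutationSink {
    void add(String mutation);
    void flush();
}

// Routes deletes and inserts to separate sinks and flushes the deletes first.
final class DeleteFirstWriter {
    private final MutationSink deletes;
    private final MutationSink inserts;

    DeleteFirstWriter(MutationSink deletes, MutationSink inserts) {
        this.deletes = deletes;
        this.inserts = inserts;
    }

    void delete(String row) { deletes.add("delete " + row); }

    void insert(String row) { inserts.add("add " + row); }

    // Call once per request, e.g. from the end-of-request callback.
    void endOfRequest() {
        deletes.flush(); // deletes are sent as an earlier batch
        inserts.flush(); // inserts follow as a later batch
    }
}

// In-memory sink for demonstration: flush() appends pending items to a shared log.
final class LoggingSink implements MutationSink {
    private final List<String> pending = new ArrayList<>();
    private final List<String> log;

    LoggingSink(List<String> log) { this.log = log; }

    @Override public void add(String mutation) { pending.add(mutation); }

    @Override public void flush() {
        log.addAll(pending);
        pending.clear();
    }
}
```

Note that this reorders operations within a request: every delete lands
before every insert, regardless of the order in which they were issued. It
therefore only fits workloads where an insert never has to precede a delete
of the same row within one request.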

I wouldn't recommend using System.currentTimeMillis() because time can go
backwards in a distributed system.  When the timestamp goes backwards, newer
updates can fall behind older ones.  When Accumulo sets timestamps, it goes to
some trouble to ensure that time does not go backwards.
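
For anyone who does assign timestamps on the client, here is a hedged sketch
of one common way to keep them strictly increasing within a single JVM. This
is not how Accumulo does it internally; MonotonicTimestamps is a hypothetical
helper, and it does nothing about clock differences between hosts.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hands out strictly increasing timestamps even if the underlying clock
// repeats a value or steps backwards (e.g. after an NTP adjustment).
final class MonotonicTimestamps {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
    private final LongSupplier clock;

    // Pass System::currentTimeMillis in production; tests can pass a fake clock.
    MonotonicTimestamps(LongSupplier clock) {
        this.clock = clock;
    }

    long next() {
        // Use the clock reading unless it is not ahead of the last value
        // handed out, in which case bump the last value by one.
        return last.updateAndGet(prev -> Math.max(clock.getAsLong(), prev + 1));
    }
}
```

With `new MonotonicTimestamps(System::currentTimeMillis)`, a delete and an
add issued within the same millisecond still receive different, ordered
values.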

>
> Argh... System.currentTimeMillis() not really working, in my test, the add
> operation got same timestamp value as prior delete operation (addMutation
> is
> so fast!), I think I should use System.nanoTime()?

Re: delete + insert case

Posted by z11373 <z1...@outlook.com>.

Hi Josh,
What I meant by better throughput is fewer calls to flush().
My service receives a user request, a query string containing
DELETE/INSERT statements, which are translated into Accumulo delete/add
operations. I used to call flush() for each operation, and that hurt
performance. Luckily, a callback function is called at the end of each
request, and that is where I now make the Accumulo flush call. Performance
has improved, especially if the request generates Accumulo write operations.

Argh... System.currentTimeMillis() is not really working. In my test, the add
operation got the same timestamp value as the prior delete operation
(addMutation is so fast!). I think I should use System.nanoTime()?


Thanks,
Z




Re: delete + insert case

Posted by William Slacum <ws...@gmail.com>.

Be aware of the OS's underlying granularity for time as well:

http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#currentTimeMillis%28%29

I almost wonder if it's better to use the RowDeletingIterator in this case.
If the check it does is "if TS < delete marker TS", in theory you could get
away with putting the delete marker inside the same Mutation as the update
and the iterator will mask any data marked with a TS before the delete
marker.

On Thu, Mar 17, 2016 at 11:18 AM, Josh Elser <jo...@gmail.com> wrote:

> Server-assigned timestamps aren't noticeably slower than user-assigned
> timestamps, if that's what you're referring to WRT throughput.
>
> As for using currentTimeMillis(), probably fine, but not always.
>
> 1) NTP updates might cause currentTimeMillis() to change in reverse
> 2) You need to make sure the delete and update always come from the same
> host (otherwise two hosts might have different values for
> currentTimeMillis())
>
> Time is hard in distributed systems.

Re: delete + insert case

Posted by Josh Elser <jo...@gmail.com>.

Server-assigned timestamps aren't noticeably slower than user-assigned 
timestamps, if that's what you're referring to WRT throughput.

As for using currentTimeMillis(): it's probably fine, but not always.

1) NTP updates might cause currentTimeMillis() to go backwards
2) You need to make sure the delete and update always come from the same
host (otherwise two hosts might have different values for
currentTimeMillis())

Time is hard in distributed systems.

z11373 wrote:
> Thanks Josh! For better throughput, I think I'd just assign the timestamp
> from my code.
> Using this code, System.currentTimeMillis(); for timestamp should be ok,
> right?

Re: delete + insert case

Posted by z11373 <z1...@outlook.com>.

Thanks Josh! For better throughput, I think I'd just assign the timestamp
from my code.
Using System.currentTimeMillis() for the timestamp should be OK, right?


Thanks,
Z





Re: delete + insert case

Posted by Josh Elser <jo...@gmail.com>.

Server-assigned timestamps are done per-batch. This gets back to
what Keith suggested. It's not that Accumulo isn't "setting the
timestamp properly" as you suggest; this is just how server-assigned
time works.

If you submit a delete and an update, without timestamps, in the same
batch (without a flush in between), they'll get the same timestamp, and
the delete will override the update.

tl;dr Use Keith's approach if you can't compute an increasing value for 
the timestamp.

z11373 wrote:
> Thanks Keith/Josh!
>
> @Josh: the client API I use in my code is the one without passing that long
> timestamp, so Accumulo should assign the timestamp in time ordered manner,
> right?
>  From my service app log file, I see the add always being called after
> delete, so it should work if Accumulo set the timestamp properly, but there
> could be possibility they are assigned same timestamp, hence the behavior is
> not predictable?
> I prefer not to assign timestamp from my code if possible.

Re: delete + insert case

Posted by z11373 <z1...@outlook.com>.

Thanks Keith/Josh!

@Josh: the client API I use in my code is the one that doesn't take the long
timestamp, so Accumulo should assign timestamps in time order,
right?
From my service app's log file, I see the add is always called after the
delete, so it should work if Accumulo sets the timestamp properly. But could
they be assigned the same timestamp, making the behavior
unpredictable?
I'd prefer not to assign timestamps from my code if possible.


Thanks,
Z




Re: delete + insert case

Posted by Josh Elser <jo...@gmail.com>.

Just clarified with Keith in IRC (because I wasn't positive).

This approach will work if you want Accumulo to assign timestamps (i.e.,
you don't specify them at all in the client). If you can manage timestamps
yourself, you can try what I suggested in the other message.

Keith Turner wrote:
> There are no order guarantees for two mutations added prior to flush being
> called.   One possible solution is to have two batch writers, one of them
> for deletes, and to flush that one first.

Re: delete + insert case

Posted by Keith Turner <ke...@deenlo.com>.

There are no order guarantees for two mutations added prior to flush being
called.   One possible solution is to have two batch writers, one of them
for deletes, and to flush that one first.

On Wed, Mar 16, 2016 at 4:33 PM, z11373 <z1...@outlook.com> wrote:

> Hi,
> I have object abstraction class which delete/add operation will eventually
> translate to calling Accumulo writer.putDelete and writer.put
> To achieve higher throughput, the code will only call writer.flush per
> request (my implementation knows when it's end of request), instead of
> flushing per each delete or add operation.
> In this case we have client request calling my service which for example
> would be:
> 1. delete A
> 2. add A
> 3. add B
>
> I'd expect the end result would be both row id A and B exists in the table,
> but apparently it's only B. I already checked from the log, the order the
> code being executed is delete first before add operation. However, I guess
> since I call flush after all putDelete and put calls being made, Accumulo
> somehow make putDelete 'win' (in same flush cycle), is that correct? If
> yes,
> how to workaround this without sacrificing performance.

Re: delete + insert case

Posted by Josh Elser <jo...@gmail.com>.

Make sure that your insert has a newer timestamp than the delete does. 
Otherwise, the delete will mask any inserts with smaller timestamps 
until it is compacted away (which is essentially an unknown to you as a 
client).

e.g.

1. delete A ts=5
2. add A ts=6
3. add B ts=whatever
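
To make the masking rule concrete, here is a small self-contained model of
it. It is a sketch, not Accumulo code: Cell and DeleteMasking are
hypothetical names, and the rule it encodes (a delete marker hides every
entry for the same key whose timestamp is less than or equal to its own) is
an assumption taken from the behavior described in this thread.

```java
import java.util.List;
import java.util.Optional;

// One timestamped entry for a single key: either a value or a delete marker.
final class Cell {
    final long ts;
    final boolean isDelete;
    final String value; // null for delete markers

    private Cell(long ts, boolean isDelete, String value) {
        this.ts = ts;
        this.isDelete = isDelete;
        this.value = value;
    }

    static Cell put(long ts, String value) { return new Cell(ts, false, value); }

    static Cell deleteMarker(long ts) { return new Cell(ts, true, null); }
}

final class DeleteMasking {
    // Returns the newest value that is not hidden by a delete marker.
    static Optional<String> read(List<Cell> cells) {
        // Find the newest delete marker, if there is one.
        Long newestDelete = null;
        for (Cell c : cells) {
            if (c.isDelete && (newestDelete == null || c.ts > newestDelete)) {
                newestDelete = c.ts;
            }
        }
        // Pick the newest put whose timestamp is strictly after that marker.
        Cell best = null;
        for (Cell c : cells) {
            if (c.isDelete) continue;
            if (newestDelete != null && c.ts <= newestDelete) continue;
            if (best == null || c.ts > best.ts) best = c;
        }
        return best == null ? Optional.empty() : Optional.of(best.value);
    }
}
```

Under this model, a delete at ts=5 followed by an add at ts=6 leaves the
value visible, while a delete and an add that share ts=5 leave the row
hidden, which matches the symptom in the original question.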

z11373 wrote:
> Hi,
> I have object abstraction class which delete/add operation will eventually
> translate to calling Accumulo writer.putDelete and writer.put
> To achieve higher throughput, the code will only call writer.flush per
> request (my implementation knows when it's end of request), instead of
> flushing per each delete or add operation.
> In this case we have client request calling my service which for example
> would be:
> 1. delete A
> 2. add A
> 3. add B
>
> I'd expect the end result would be both row id A and B exists in the table,
> but apparently it's only B. I already checked from the log, the order the
> code being executed is delete first before add operation. However, I guess
> since I call flush after all putDelete and put calls being made, Accumulo
> somehow make putDelete 'win' (in same flush cycle), is that correct? If yes,
> how to workaround this without sacrificing performance.