Posted to user@accumulo.apache.org by Keith Turner <ke...@deenlo.com> on 2016/11/02 19:35:13 UTC

New Accumulo Blog Post

I just posted a new blog post for Accumulo about configuring
durability settings and their performance impact.

http://accumulo.apache.org/blog/2016/11/02/durability-performance.html
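For anyone who wants to experiment with the settings the post covers, here is
a minimal sketch, assuming the Accumulo 1.7+ client API, an instance reachable
at zk1:2181, and an existing table named "mytable" (instance name, ZooKeepers,
credentials, and table name are all illustrative). It sets a table-wide default
with the table.durability property and a per-writer override via
BatchWriterConfig:

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Durability;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class DurabilityExample {
  public static void main(String[] args) throws Exception {
    // Instance name, ZooKeepers, and credentials are placeholders.
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("user", new PasswordToken("secret"));

    // Table-wide default: WAL writes for this table are hflush'd, not hsync'd.
    conn.tableOperations().setProperty("mytable", "table.durability", "flush");

    // Per-writer override: this writer also only asks for an hflush of the WAL.
    BatchWriterConfig cfg = new BatchWriterConfig().setDurability(Durability.FLUSH);
    BatchWriter bw = conn.createBatchWriter("mytable", cfg);
    try {
      Mutation m = new Mutation(new Text("row1"));
      m.put(new Text("fam"), new Text("qual"), new Value("value".getBytes()));
      bw.addMutation(m);
    } finally {
      bw.close();
    }
  }
}

As the post discusses, flush (hflush) is much cheaper than sync (hsync), at the
cost that acknowledged updates can be lost if datanodes lose power before the
data reaches disk.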

Re: New Accumulo Blog Post

Posted by Christopher <ct...@apache.org>.
I believe the work is done, or nearly done. I was coordinating with Mike
Walch off-list to prepare the code before it's officially submitted as a
patch to the Apache project. I've asked him to give me a chance to review
it before it gets submitted.

If you'd like to preview it, you can see it in this branch:
https://github.com/mikewalch/accumulo/tree/volume-chooser

I'd definitely like it to be a blocker for 2.0.0. I think it's an essential
feature.
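To give a sense of the kind of knob this would add, here is a purely
hypothetical sketch; the property name and volume URI below are my own
illustrative assumptions and may well differ from what the branch or the
eventual patch actually uses:

// Hypothetical sketch only: the property name and volume URI are assumptions,
// not what the volume-chooser branch or any released Accumulo actually uses.
import org.apache.accumulo.core.client.Connector;

public class WalVolumeSketch {
  static void steerWalsToSsd(Connector conn) throws Exception {
    conn.instanceOperations().setProperty(
        "general.custom.wal.preferred.volumes",   // hypothetical property
        "hdfs://namenode:8020/accumulo-wal-ssd");  // hypothetical SSD-backed volume
  }
}

The idea is simply to let WAL files land on a different (for example,
SSD-backed) volume than the one holding RFiles.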

On Tue, Dec 20, 2016 at 3:00 PM Jeff Kubina <je...@gmail.com> wrote:

> Chris,
>
> Any status on the patch to Accumulo to allow customizing the HDFS volume
> on which the WALs are stored?
>
>
> --
> Jeff Kubina
> 410-988-4436
>
>
> On Wed, Nov 2, 2016 at 10:34 PM, Christopher <ct...@apache.org> wrote:
>
> I'm aware of at least one person who has patched Accumulo to allow
> customizing the HDFS volume on which the WALs are stored. This reminds me
> that I need to check on the status of that patch. I'm hoping it'll be
> contributed soon.
>
> I'm also curious if it'd make a difference writing to HDFS with the data
> nodes mounted with sync, instead of doing a separate sync call.
>
> On Wed, Nov 2, 2016 at 9:49 PM <dl...@comcast.net> wrote:
>
> Regarding #2 – I think there are two options here:
>
>
>
> 1. Modify Accumulo to take advantage of HDFS Heterogeneous Storage
>
> 2. Modify Accumulo WAL code to support volumes
>
>
>
> *From:* Jeff Kubina [mailto:jeff.kubina@gmail.com]
> *Sent:* Wednesday, November 02, 2016 9:02 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: New Accumulo Blog Post
>
>
>
> Thanks for the blog post, very interesting read. Some questions ...
>
>
>
> 1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or
> flush tablet servers’ WAL" and "Adds mutations to sorted in memory map of
> each tablet." performed by threads in parallel?
>
>
>
> 2. Could the latency of hsync-ing the WALs be overcome by modifying
> Accumulo to write them to a separate SSD-only HDFS? To maintain data
> locality it would require two datanode processes (one for the HDDs and one
> for the SSD), running on the same node, which is not hard to do.
>
>
>
>
> --
Christopher

Re: New Accumulo Blog Post

Posted by Jeff Kubina <je...@gmail.com>.
Chris,

Any status on the patch to Accumulo to allow customizing the HDFS volume on
which the WALs are stored?


-- 
Jeff Kubina
410-988-4436


On Wed, Nov 2, 2016 at 10:34 PM, Christopher <ct...@apache.org> wrote:

> I'm aware of at least one person who has patched Accumulo to allow
> customizing the HDFS volume on which the WALs are stored. This reminds me
> that I need to check on the status of that patch. I'm hoping it'll be
> contributed soon.
>
> I'm also curious if it'd make a difference writing to HDFS with the data
> nodes mounted with sync, instead of doing a separate sync call.
>
> On Wed, Nov 2, 2016 at 9:49 PM <dl...@comcast.net> wrote:
>
>> Regarding #2 – I think there are two options here:
>>
>>
>>
>> 1. Modify Accumulo to take advantage of HDFS Heterogeneous Storage
>>
>> 2. Modify Accumulo WAL code to support volumes
>>
>>
>>
>> *From:* Jeff Kubina [mailto:jeff.kubina@gmail.com]
>> *Sent:* Wednesday, November 02, 2016 9:02 PM
>> *To:* user@accumulo.apache.org
>> *Subject:* Re: New Accumulo Blog Post
>>
>>
>>
>> Thanks for the blog post, very interesting read. Some questions ...
>>
>>
>>
>> 1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or
>> flush tablet servers’ WAL" and "Adds mutations to sorted in memory map of
>> each tablet." performed by threads in parallel?
>>
>>
>>
>> 2. Could the latency of hsync-ing the WALs be overcome by modifying
>> Accumulo to write them to a separate SSD-only HDFS? To maintain data
>> locality it would require two datanode processes (one for the HDDs and one
>> for the SSD), running on the same node, which is not hard to do.
>>
>>
>>
>

Re: New Accumulo Blog Post

Posted by Christopher <ct...@apache.org>.
I'm aware of at least one person who has patched Accumulo to allow
customizing the HDFS volume on which the WALs are stored. This reminds me
that I need to check on the status of that patch. I'm hoping it'll be
contributed soon.

I'm also curious if it'd make a difference writing to HDFS with the data
nodes mounted with sync, instead of doing a separate sync call.

On Wed, Nov 2, 2016 at 9:49 PM <dl...@comcast.net> wrote:

> Regarding #2 – I think there are two options here:
>
>
>
> 1. Modify Accumulo to take advantage of HDFS Heterogeneous Storage
>
> 2. Modify Accumulo WAL code to support volumes
>
>
>
> *From:* Jeff Kubina [mailto:jeff.kubina@gmail.com]
> *Sent:* Wednesday, November 02, 2016 9:02 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: New Accumulo Blog Post
>
>
>
> Thanks for the blog post, very interesting read. Some questions ...
>
>
>
> 1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or
> flush tablet servers’ WAL" and "Adds mutations to sorted in memory map of
> each tablet." performed by threads in parallel?
>
>
>
> 2. Could the latency of hsync-ing the WALs be overcome by modifying
> Accumulo to write them to a separate SSD-only HDFS? To maintain data
> locality it would require two datanode processes (one for the HDDs and one
> for the SSD), running on the same node, which is not hard to do.
>
>
>

RE: New Accumulo Blog Post

Posted by dl...@comcast.net.
Regarding #2 – I think there are two options here:

 

1. Modify Accumulo to take advantage of HDFS Heterogeneous Storage (see the sketch after this list)

2. Modify Accumulo WAL code to support volumes
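Regarding option 1, a minimal sketch of what that could look like, assuming
Hadoop 2.6+ storage policies, datanode volumes tagged [SSD] in
dfs.datanode.data.dir, fs.defaultFS pointing at the cluster, and an
illustrative /accumulo/wal path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class WalStoragePolicy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // assumes fs.defaultFS is HDFS
    try (FileSystem fs = FileSystem.get(conf)) {
      // ALL_SSD asks HDFS to place every replica of blocks written under this
      // path on volumes the datanodes have tagged as [SSD].
      ((DistributedFileSystem) fs).setStoragePolicy(
          new Path("/accumulo/wal"), "ALL_SSD"); // path is illustrative
    }
  }
}

The same thing can be done from the command line with
"hdfs storagepolicies -setStoragePolicy -path /accumulo/wal -policy ALL_SSD".
Option 2 would still be needed so that Accumulo actually writes its WALs under
a path or volume separate from the RFiles.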

 

From: Jeff Kubina [mailto:jeff.kubina@gmail.com] 
Sent: Wednesday, November 02, 2016 9:02 PM
To: user@accumulo.apache.org
Subject: Re: New Accumulo Blog Post

 

Thanks for the blog post, very interesting read. Some questions ...

 

1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or flush tablet servers’ WAL" and "Adds mutations to sorted in memory map of each tablet." performed by threads in parallel?

 

2. Could the latency of hsync-ing the WALs be overcome by modifying Accumulo to write them to a separate SSD-only HDFS? To maintain data locality it would require two datanode processes (one for the HDDs and one for the SSD), running on the same node, which is not hard to do.

 


Re: New Accumulo Blog Post

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Nov 2, 2016 at 9:01 PM, Jeff Kubina <je...@gmail.com> wrote:
> Thanks for the blog post, very interesting read. Some questions ...
>
> 1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or flush
> tablet servers’ WAL" and "Adds mutations to sorted in memory map of each
> tablet." performed by threads in parallel?

Not for a batch from a client. Each batch of mutations from a client
is processed by a single thread. Batches from separate clients can be
processed in parallel.
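To make the client side of that concrete, here is a minimal sketch, assuming
the Accumulo 1.7+ client API and an existing table named "test" (both
illustrative), where several independent clients each drive their own
BatchWriter from separate threads, so their batches can be handled in parallel
on the tablet servers:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class ParallelClients {
  static void write(final Connector conn, int numClients) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numClients);
    for (int c = 0; c < numClients; c++) {
      final int id = c;
      pool.submit(() -> {
        // Each task is an independent client with its own BatchWriter, so its
        // batches can be processed in parallel with the other tasks' batches.
        BatchWriter bw = conn.createBatchWriter("test", new BatchWriterConfig());
        try {
          for (int i = 0; i < 1000; i++) {
            Mutation m = new Mutation(new Text("row_" + id + "_" + i));
            m.put(new Text("fam"), new Text("qual"), new Value("v".getBytes()));
            bw.addMutation(m);
          }
        } finally {
          bw.close();
        }
        return null;
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
  }
}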


>
> 2. Could the latency of hsync-ing the WALs be overcome by modifying Accumulo
> to write them to a separate SSD-only HDFS? To maintain data locality it
> would require two datanode processes (one for the HDDs and one for the SSD),
> running on the same node, which is not hard to do.

Christopher answered this. I just wanted to mention that I have run
Accumulo on my laptop, which has an SSD. Hsync is much faster there,
roughly 10x faster.
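For anyone who wants to measure that on their own hardware, here is a minimal
sketch, assuming Hadoop 2.x client libraries on the classpath, fs.defaultFS
pointing at the cluster (or a local single-node setup), and an illustrative
/tmp/hsync-test path; it times repeated hsync() calls against whatever media
backs the datanodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncTiming {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/hsync-test"); // illustrative path
    byte[] data = new byte[1024];
    int iterations = 100;
    try (FSDataOutputStream out = fs.create(p, true)) {
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        out.write(data);
        out.hsync(); // force the data onto disk on the datanodes
      }
      long avgMicros = (System.nanoTime() - start) / iterations / 1000;
      System.out.println("average write+hsync latency: " + avgMicros + " us");
    } finally {
      fs.delete(p, false);
    }
  }
}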


>

Re: New Accumulo Blog Post

Posted by Jeff Kubina <je...@gmail.com>.
Thanks for the blog post, very interesting read. Some questions ...

1. Are the operations "Writes mutation to tablet servers’ WAL/Sync or flush
tablet servers’ WAL" and "Adds mutations to sorted in memory map of each
tablet." performed by threads in parallel?

2. Could the latency of hsync-ing the WALs be overcome by modifying
Accumulo to write them to a separate SSD-only HDFS? To maintain data
locality it would require two datanode processes (one for the HDDs and one
for the SSD), running on the same node, which is not hard to do.