Posted to dev@lucene.apache.org by David Smiley <ds...@apache.org> on 2020/12/22 16:00:05 UTC

Shared Storage -- BlobDirectory, SOLR-15051

Hello,

There's lots of exciting work going on in Solr at the moment, judging from
the SIPs & some JIRA issues. I want to draw attention to my proposal for
shared storage in SolrCloud that I call "BlobDirectory" -- SOLR-15051 [1].
It has a linked proposal document [2] in Google Docs. If any of you have
comments / concerns on the design, now is a good time to share them. I
expect to share a very early draft WIP PR today, containing an early form
of only some of the components. I'll repeat the issue description here:

[1] https://issues.apache.org/jira/browse/SOLR-15051
[2] https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing
----

This proposal is a way to accomplish shared storage in SolrCloud with a few
key characteristics: (A) it uses a Directory implementation, (B) it
delegates to a backing local file Directory as a kind of read/write cache,
(C) replicas have their own "space", (D) it de-duplicates files across
replicas via reference counting, and (E) it uses ZK, but separately from
SolrCloud's other ZK usage.
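
To make (B) and (C) concrete, here is a minimal sketch of the read/write
cache idea. It is not Lucene's Directory API and not code from the PR: the
class CachedBlobStore, its fields, and the "<replicaSpace>/<fileName>" key
layout are all hypothetical, and in-memory maps stand in for both the local
file Directory and the shared blob store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a Directory that fronts shared storage with a local cache. */
final class CachedBlobStore {
    // Stand-in for the shared blob store (assumed key layout: "<replicaSpace>/<fileName>").
    private final Map<String, byte[]> shared;
    // Stand-in for the backing local file Directory.
    private final Map<String, byte[]> local = new HashMap<>();
    // Each replica writes only under its own prefix, so writers never collide.
    private final String replicaSpace;
    // Counts reads that had to go to shared storage (for illustration only).
    int sharedReads = 0;

    CachedBlobStore(Map<String, byte[]> shared, String replicaSpace) {
        this.shared = shared;
        this.replicaSpace = replicaSpace;
    }

    private String key(String fileName) {
        return replicaSpace + "/" + fileName;
    }

    /** Writes go to the local copy and to this replica's space in shared storage. */
    void write(String fileName, byte[] data) {
        local.put(fileName, data.clone());
        shared.put(key(fileName), data.clone());
    }

    /** Reads are served locally when possible; a miss pulls the file from shared storage. */
    Optional<byte[]> read(String fileName) {
        byte[] cached = local.get(fileName);
        if (cached != null) {
            return Optional.of(cached.clone());
        }
        byte[] remote = shared.get(key(fileName));
        if (remote == null) {
            return Optional.empty();
        }
        sharedReads++;
        local.put(fileName, remote.clone());
        return Optional.of(remote.clone());
    }
}
```

A node that restarts with an empty local disk would construct a new
CachedBlobStore over the same shared map; its first read of a file goes to
shared storage and later reads are local.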

The Directory abstraction is a good one, and it helps isolate shared storage
from the rest of SolrCloud, which doesn't need to care about it. Using a
backing normal file Directory is faster for reads and is simpler than Solr's
HDFSDirectory's BlockCache. Replicas having their own space solves the
problem of multiple writers (e.g. of the same shard) trying to own and write
to the same space, and it implies that any of Solr's replica types can be
used, along with the features that come with them, such as peer-to-peer
replication (sometimes faster/cheaper than pulling from shared storage). A
de-duplication feature avoids needless duplication of files across replicas
and from parent shards (i.e. from shard splitting). The de-duplication
feature requires a place to cache directory listings so that they can be
shared across replicas and atomically updated; this is handled via
ZooKeeper. Finally, some sort of Solr daemon / auto-scaling code should be
added to implement "autoAddReplicas", especially to provide for a scenario
where the leader is gone and can't be replicated from directly, but shared
storage is still accessible.
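
As a rough illustration of the reference-counting de-duplication described
above, the sketch below keeps a shared table of reference counts and updates
it with a compare-and-set retry loop. Everything in it is an assumption made
for illustration: the class SharedListing, the idea of a "blob id"
identifying identical files, and the use of an AtomicReference in place of
ZooKeeper. In a real implementation the atomic update would presumably be a
version-checked write to a znode rather than an in-process compare-and-set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical shared listing: blob id -> number of replica listings that reference it. */
final class SharedListing {
    // Immutable snapshot swapped atomically; stands in for a versioned ZooKeeper node.
    private final AtomicReference<Map<String, Integer>> refCounts =
            new AtomicReference<>(Map.of());

    /** Adds a reference; returns true if this is the first one, i.e. the caller must upload. */
    boolean addReference(String blobId) {
        while (true) {
            Map<String, Integer> current = refCounts.get();
            Map<String, Integer> next = new HashMap<>(current);
            int count = next.merge(blobId, 1, Integer::sum);
            // Retry if another writer replaced the snapshot in the meantime.
            if (refCounts.compareAndSet(current, Map.copyOf(next))) {
                return count == 1;
            }
        }
    }

    /** Drops a reference; returns true if none remain, i.e. the blob can be deleted. */
    boolean removeReference(String blobId) {
        while (true) {
            Map<String, Integer> current = refCounts.get();
            Integer count = current.get(blobId);
            if (count == null) {
                throw new IllegalStateException("no reference to " + blobId);
            }
            Map<String, Integer> next = new HashMap<>(current);
            if (count == 1) {
                next.remove(blobId);
            } else {
                next.put(blobId, count - 1);
            }
            if (refCounts.compareAndSet(current, Map.copyOf(next))) {
                return count == 1;
            }
        }
    }

    /** Current number of references to the blob, or 0 if it is not listed. */
    int count(String blobId) {
        return refCounts.get().getOrDefault(blobId, 0);
    }
}
```

With this shape, a second replica (or a sub-shard after a split) that needs
a file already in shared storage only bumps the count instead of uploading
again, and the file is removed only when the last reference goes away.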

For more about shared storage concepts, consider looking at the description
in SOLR-13101 <https://issues.apache.org/jira/browse/SOLR-13101> and the
linked Google Doc.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

Re: Shared Storage -- BlobDirectory, SOLR-15051

Posted by David Smiley <ds...@apache.org>.

Glad to hear there is interest!  Atri Sharma intends to start helping as
soon as there is code to show, which is today.  The part that I think is
most likely to draw feedback is the file listing tracking with dedupe, so
I'll go slowly there, knowing that feedback is pending.

Ishan: yeah I saw your update on Twitter about chess.  I recently finished
watching The Queen's Gambit on Netflix, and it was such a fantastic show
that it has gotten me a little more interested in chess too.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Dec 22, 2020 at 11:55 AM Ishan Chattopadhyaya <
ichattopadhyaya@gmail.com> wrote:

> Thanks for looking at this problem, David. I have some thoughts and ideas
> around the same, but I'll be in a better position to comment after the
> holidays. Focusing on chess these days 😊
>
> On Tue, 22 Dec, 2020, 10:17 pm Mike Drob, <md...@apache.org> wrote:
>
>> Hi David,
>>
>> Thanks for sharing. I am sure I will have thoughts on this, but won’t be
>> able to substantively comment until January. Just letting you know that
>> there is interest and not to be discouraged if you get only silence for a
>> while.
>>
>> Hopefully others will look and comment as well.
>>
>> Mike

Re: Shared Storage -- BlobDirectory, SOLR-15051

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

Thanks for looking at this problem, David. I have some thoughts and ideas
around the same, but I'll be in a better position to comment after the
holidays. Focusing on chess these days 😊

On Tue, 22 Dec, 2020, 10:17 pm Mike Drob, <md...@apache.org> wrote:

> Hi David,
>
> Thanks for sharing. I am sure I will have thoughts on this, but won’t be
> able to substantively comment until January. Just letting you know that
> there is interest and not to be discouraged if you get only silence for a
> while.
>
> Hopefully others will look and comment as well.
>
> Mike

Re: Shared Storage -- BlobDirectory, SOLR-15051

Posted by Mike Drob <md...@apache.org>.

Hi David,

Thanks for sharing. I am sure I will have thoughts on this, but won’t be
able to substantively comment until January. Just letting you know that
there is interest and not to be discouraged if you get only silence for a
while.

Hopefully others will look and comment as well.

Mike
