Posted to issues@ozone.apache.org by "Uma Maheswara Rao G (Jira)" <ji...@apache.org> on 2020/12/02 05:56:00 UTC
[jira] [Commented] (HDDS-3816) Erasure Coding in Apache Hadoop Ozone

    [ https://issues.apache.org/jira/browse/HDDS-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242065#comment-17242065 ] 

Uma Maheswara Rao G commented on HDDS-3816:
-------------------------------------------

Thanks a lot [~elek] for taking another look, and thanks for the additional docs.
{quote}
And one additional note: for me reusing the last 5 bits of container id is a non-backward compatible change, even if we can handle it (as our implementation assigns incremental container ids). With dedicated field we don't have any headache with the upgrade.
{quote}

It is backward compatible; I suspect there is a misunderstanding here. The current container IDs are already incremental, so generating group IDs on top of them would be backward compatible.

{quote}
Uploaded "Ozone EC Container groups and instances.pdf" which describes my problem with the bitmasking.
{quote}

I'm not sure if there is some misunderstanding about the approach; otherwise, bitmasking is backward compatible.
Points from your [docs| https://issues.apache.org/jira/secure/attachment/13016210/Ozone%20EC%20Container%20groups%20and%20instances.pdf]:
1) Type-safe usage of ContainerId/InstanceID at any place:
 It is equally easy to call a util method to get the index. It is just a one-liner with an & operation.
2) Easier implementation: We don’t need to add any new masking to the OM side, everything can work as before.
 With bitmasking we don't need any code in OM either. OM works as before; it is SCM that generates container IDs.
3) Easy to follow the call hierarchy:
 I am not sure what the hierarchy means here. Wherever you would use the getIndex API, you would use util.getIndex in the same place.
4) Consistent implementation style:
 I am not aware of any existing guidelines on this ID pattern.

In general, as I understand it, both approaches solve the same problem. One carries the index within the container ID; the other separates the index into a new field on the container and packs it via protobuf (for non-EC containers that field has no meaning and should just be null). I am open to considering alternatives if there are real concerns (e.g. if people feel it is too complex).
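
To make the "one-liner with an & operation" from point 1 concrete, here is a minimal sketch of such a util. The class name, the method names, and the choice of the low 5 bits (taken from the quoted remark about "the last 5 bits") are assumptions for illustration only, not the actual Ozone API:

```java
// Hypothetical helper: the class name, the 5-bit width and the low-bit layout
// are assumptions for illustration, not the actual Ozone API.
final class EcContainerIdSketch {
    static final int INDEX_BITS = 5;
    static final long INDEX_MASK = (1L << INDEX_BITS) - 1; // 0x1F

    private EcContainerIdSketch() { }

    /** Combines a group ID (low 5 bits zero) with a replica index in [0, 31]. */
    static long toContainerId(long groupId, int index) {
        if ((groupId & INDEX_MASK) != 0) {
            throw new IllegalArgumentException("group ID must leave the index bits clear");
        }
        if (index < 0 || index > INDEX_MASK) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        return groupId | index;
    }

    /** Extracts the replica index: a single mask operation. */
    static int getIndex(long containerId) {
        return (int) (containerId & INDEX_MASK);
    }

    /** Clears the index bits to recover the group ID. */
    static long getGroupId(long containerId) {
        return containerId & ~INDEX_MASK;
    }
}
```

In this sketch a non-EC container is simply an ID whose index bits are 0, which is the counterpart of the null field in the dedicated-field approach.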

{quote}
I believe that SCM is not interested to know anything about missing blocks. Scm is the manager of the containers. The easiest approach seems to use your proposal:
{quote} 
My bad here. I meant that the DN reports a container as corrupt when some blocks are missing (identified by the scanners at the DN). I think this part already exists in the DN. By marking numChunks(=SID) as 0/-1, when the coordinator downloads blocks, the numChunks for the missing block IDs would be 0, so it can find such blocks and send them for recovery.

{quote}
which metadata is required to store on block / container / chunk level and which should be stored on key level.
{quote}
Since we plan to support granularity at the bucket level, we don't need any special info at the key level at this stage. The block-level metadata so far is numChunks (which we already have) and blockGroupLength. blockGroupLength is only needed to find the boundaries accurately when the coordinator is doing recovery.
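
As a rough sketch of how a coordinator could use this block-level metadata, assume each DN returns one metadata record per block in the group. The class, the replicaIndex field, and the method below are hypothetical; only numChunks and blockGroupLength come from the discussion above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types: the numChunks and blockGroupLength names follow the
// discussion, everything else is an illustrative assumption, not Ozone code.
final class EcBlockRecoverySketch {

    /** Per-block metadata as reported for one member of a block group. */
    static final class BlockMeta {
        final int replicaIndex;      // assumed position within the EC group
        final long numChunks;        // 0 or -1 marks a missing block
        final long blockGroupLength; // assumed: length of the whole block group

        BlockMeta(int replicaIndex, long numChunks, long blockGroupLength) {
            this.replicaIndex = replicaIndex;
            this.numChunks = numChunks;
            this.blockGroupLength = blockGroupLength;
        }
    }

    private EcBlockRecoverySketch() { }

    /** Returns the replica indexes whose numChunks marks them as missing. */
    static List<Integer> indexesToRecover(List<BlockMeta> metas) {
        List<Integer> missing = new ArrayList<>();
        for (BlockMeta meta : metas) {
            if (meta.numChunks <= 0) {
                missing.add(meta.replicaIndex);
            }
        }
        return missing;
    }
}
```

The sketch only carries blockGroupLength along; how the boundaries are computed from it during recovery is left open here.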

{quote}
We need to handle different containers (Ratis vs. EC) in a separated way, anyway. (Storage-class proposes the same: just use a generic string, the storage-class name). With cluster-wide setting we need to explain how can Ratis and EC work together. Do we plan to introduce new replication type as RATIS without the factor parameter?{quote}
The simplest way I can think of is to store the EC policy details at the bucket level, similar to encryptionInfo. That way, when opening a key, we can get that info and let clients decide whether it's EC or not. As part of storage classes, we can make modifications as needed to support this at the key level, since storage classes try to define that.
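
A minimal sketch of that idea, assuming the bucket metadata carries an optional EC policy the same way it carries encryptionInfo. All type and method names here are hypothetical and only illustrate the client-side decision:

```java
import java.util.Optional;

// Hypothetical types for illustration; not the actual Ozone bucket API.
final class BucketEcPolicySketch {

    /** EC policy details stored with the bucket (assumed shape). */
    static final class EcPolicy {
        final int dataBlocks;
        final int parityBlocks;

        EcPolicy(int dataBlocks, int parityBlocks) {
            this.dataBlocks = dataBlocks;
            this.parityBlocks = parityBlocks;
        }
    }

    /** Bucket metadata returned to the client when it opens a key. */
    static final class BucketInfo {
        private final EcPolicy ecPolicy; // null when the bucket is not EC

        BucketInfo(EcPolicy ecPolicy) {
            this.ecPolicy = ecPolicy;
        }

        Optional<EcPolicy> getEcPolicy() {
            return Optional.ofNullable(ecPolicy);
        }
    }

    enum WritePath { REPLICATED, EC }

    private BucketEcPolicySketch() { }

    /** The client picks the write path from the bucket-level policy alone. */
    static WritePath chooseWritePath(BucketInfo bucket) {
        return bucket.getEcPolicy().isPresent() ? WritePath.EC : WritePath.REPLICATED;
    }
}
```

For example, a bucket created with `new BucketInfo(new EcPolicy(6, 3))` would take the EC path, while `new BucketInfo(null)` would keep the existing replicated path; no key-level field is needed for this decision.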


{quote}
 But keeping the entry of the deleted containers or using some seq can solve this problem, just a question which should be defined.
{quote}
I agree. Thanks for the thoughts.

{quote}
I am fine with closing containers but I am interested why do we need it. For Ratis, it's clean: ratis ring is expensive and we don't need for all the closed containers. But EC seems to be cheap and I am not sure (yet) what is the benefit to make EC containers closed.
{quote}
Closed is the state in which a container has transitioned to immutable. To avoid containers growing forever, we use the same state in EC as well and can mark containers as closed. I'm not sure if we need to call the immutable EC state by some other name.
Once containers reach the immutable state, it's safe to start the recovery process, as blocks will not be modified. If we need to do aggressive reconstruction, we probably need additional tweaking, e.g. the DN should itself mark blocks as finalized. Otherwise the coordinator will not have a clear idea whether those blocks still have writes in progress.

 

 

> Erasure Coding in Apache Hadoop Ozone
> -------------------------------------
>
>                 Key: HDDS-3816
>                 URL: https://issues.apache.org/jira/browse/HDDS-3816
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: SCM
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>         Attachments: Apache Ozone Erasure Coding-V2.pdf, EC-read-write-path.pdf, Erasure Coding in Apache Hadoop Ozone.pdf, Ozone EC Container groups and instances.pdf
>
>
> We propose to implement Erasure Coding in Apache Hadoop Ozone to provide efficient storage. With EC in place, Ozone can provide same or better tolerance by giving 50% or more  storage space savings. 
> In HDFS project, we already have native codecs(ISAL) and Java codecs implemented, we can leverage the same or similar codec design.
> However, the critical part of EC data layout design is in-progress, we will post the design doc soon.


