Posted to hdfs-dev@hadoop.apache.org by "Ewan Higgs (JIRA)" <ji...@apache.org> on 2017/05/16 08:14:04 UTC

[jira] [Created] (HDFS-11828) Refactor FsDatasetImpl as the BlockAlias is in the wire protocol for PROVIDED blocks.

Ewan Higgs created HDFS-11828:
---------------------------------

             Summary: Refactor FsDatasetImpl as the BlockAlias is in the wire protocol for PROVIDED blocks.
                 Key: HDFS-11828
                 URL: https://issues.apache.org/jira/browse/HDFS-11828
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Ewan Higgs
            Assignee: Ewan Higgs


From HDFS-11639:

{quote}[~virajith]
Looking over this patch, one thing that occurred to me is whether it makes sense to unify FileRegionProvider with BlockProvider. Their functionality is very similar.

I like the use of BlockProvider#resolve(). If we unify FileRegionProvider with BlockProvider, then resolve can return null if the block map is accessible from the Datanodes also. If it is accessible only from the Namenode, then a non-null value can be propagated to the Datanode.
One of the motivations for adding the BlockAlias to the client protocol was to have the block map only on the Namenode. In this scenario, the ReplicaMap in FsDatasetImpl will not have any replicas a priori. Thus, one way to ensure that the FsDatasetImpl interface continues to function as it does today is to create a FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream when the BlockAlias is not null.
{quote}
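
For illustration, the on-the-fly construction described above could look roughly like the following. This is only a sketch under stated assumptions: the classes below are simplified, hypothetical stand-ins that borrow names from the discussion, not the real HDFS types or signatures, and registering the new replica in the map is an assumption rather than something settled in the comments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real HDFS classes.
final class ProvidedReplicaSketch {

    /** Hypothetical alias: where a block's bytes live in the external store. */
    static final class BlockAlias {
        final String uri;
        final long offset;
        final long length;

        BlockAlias(String uri, long offset, long length) {
            this.uri = uri;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical finalized replica backed by an external file region. */
    static final class FinalizedProvidedReplica {
        final long blockId;
        final BlockAlias alias;

        FinalizedProvidedReplica(long blockId, BlockAlias alias) {
            this.blockId = blockId;
            this.alias = alias;
        }
    }

    /** Simplified dataset whose replica map starts without any provided replicas. */
    static final class Dataset {
        private final Map<Long, FinalizedProvidedReplica> replicaMap = new HashMap<>();

        /** Mirrors the idea behind getBlockInputStream: build the replica when an alias arrives. */
        FinalizedProvidedReplica replicaForRead(long blockId, BlockAlias alias) {
            FinalizedProvidedReplica known = replicaMap.get(blockId);
            if (known != null) {
                return known;
            }
            if (alias == null) {
                // No alias on the wire and nothing known locally: the read cannot proceed.
                throw new IllegalArgumentException("Unknown block " + blockId);
            }
            // Assumption: the replica is registered so later calls behave as they do today.
            FinalizedProvidedReplica created = new FinalizedProvidedReplica(blockId, alias);
            replicaMap.put(blockId, created);
            return created;
        }
    }
}
```

In this sketch, a read that carries an alias succeeds even though the map was empty beforehand, while a read without an alias for an unknown block still fails.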

{quote}[~ehiggs]
With the pending refactoring of the FsDatasetImpl, which won't have replicas a priori, I wonder if it makes sense for the Datanode to have a FileRegionProvider or BlockProvider at all. Datanodes are given the appropriate block ID and block alias in the readBlock or writeBlock message. Maybe I'm overlooking what's still being provided.{quote}

{quote}[~virajith]
I was trying to reconcile the existing design (FsDatasetImpl knows about provided blocks a priori) with the new design, where FsDatasetImpl will not know about these blocks beforehand but constructs them on the fly using the BlockAlias from readBlock or writeBlock. Using BlockProvider#resolve() allows us to have both designs exist in parallel. I was wondering if we should still retain the earlier design given the latter.
{quote}
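
As a rough sketch of how a resolve() call might let both designs coexist: a provider whose block map is also readable by the Datanodes returns null, and a Namenode-only provider returns an alias that is passed along in the request. The interface, class names, and the use of a String for the alias are hypothetical simplifications and not the real HDFS API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification; the alias is a plain String here for brevity.
final class ResolveSketch {

    interface BlockProvider {
        /** Returns an alias to send to the Datanode, or null if the Datanode can look it up itself. */
        String resolve(long blockId);
    }

    /** Existing design: the block map is readable from the Datanodes, so nothing is propagated. */
    static final class SharedMapProvider implements BlockProvider {
        @Override
        public String resolve(long blockId) {
            return null;
        }
    }

    /** New design: only the Namenode holds the block map, so the alias travels with the request. */
    static final class NamenodeOnlyProvider implements BlockProvider {
        private final Map<Long, String> blockMap = new HashMap<>();

        void add(long blockId, String alias) {
            blockMap.put(blockId, alias);
        }

        @Override
        public String resolve(long blockId) {
            String alias = blockMap.get(blockId);
            if (alias == null) {
                throw new IllegalArgumentException("Unknown block " + blockId);
            }
            return alias;
        }
    }

    /** Datanode side: prefer the alias from the request, else fall back to the local map. */
    static String locate(long blockId, String aliasFromNamenode, Map<Long, String> datanodeBlockMap) {
        if (aliasFromNamenode != null) {
            return aliasFromNamenode;
        }
        String known = datanodeBlockMap.get(blockId);
        if (known == null) {
            throw new IllegalStateException("No alias available for block " + blockId);
        }
        return known;
    }
}
```

If the earlier design were dropped, the fallback branch in locate() and the null-returning provider would no longer be needed, which is the trade-off the comment above raises.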



