You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@crail.apache.org by GitBox <gi...@apache.org> on 2021/02/09 18:31:42 UTC

[GitHub] [incubator-crail] mbrodmann opened a new pull request #86: Resource Elasticity [3/3]

mbrodmann opened a new pull request #86:
URL: https://github.com/apache/incubator-crail/pull/86


   This PR is the third of three planned PRs for implementing resource elasticity within Crail. More specifically, the goal is to allow to automatically and dynamically adapt the number of running datanodes based on different possible parameters (e.g. resource consumption, available storage capacities, ...). This requires support for starting and terminating running datanodes automatically. This functionality will be implemented mostly within the Crail namenode. Therefore, the Crail namenode will not only maintain information on the current state of the datanodes and the deployment but will also start or terminate datanodes as required. For deciding how to adjust the number of datanodes  based on the current state observed in the Crail namenode so-called _policies_ are used. 
   
   The following changes will be included within the seperate PRs:
   
   1) This first PR implements a new RPC (for both darpc and narpc) that allows to specify a datanode (using IP-address and port number) that should be shutdown.
   2) The seconds PR will implement the actual mechanism for shutting down a datanode.
   
   3) This third PR implements a simple policy for implementing elasticity based on the system's storage consumption. It introduces a new `ElasticNameNodeService` extending the existing `NameNodeService` which frequently evaluates the current storage utilization. When it exceeds a certain threshold this will make the namenode start new datanode(s). In case the utilization falls below a certain threshold the namenode tries to terminate unneeded datanodes using the shutdown mechanism. Parameters are configurable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-crail] PepperJo merged pull request #86: Resource Elasticity [3/3]

Posted by GitBox <gi...@apache.org>.
PepperJo merged pull request #86:
URL: https://github.com/apache/incubator-crail/pull/86


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-crail] PepperJo commented on a change in pull request #86: Resource Elasticity [3/3]

Posted by GitBox <gi...@apache.org>.
PepperJo commented on a change in pull request #86:
URL: https://github.com/apache/incubator-crail/pull/86#discussion_r580136092



##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -107,6 +107,85 @@ short prepareDataNodeForRemoval(DataNodeInfo dn) throws IOException {
 		return RpcErrors.ERR_DATANODE_NOT_REGISTERED;
 	}
 
+	public double getStorageUsage() throws Exception {

Review comment:
       Naming. Maybe `getStorageUsedPercentage` or `getFractionStorageUsed`?

##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -107,6 +107,85 @@ short prepareDataNodeForRemoval(DataNodeInfo dn) throws IOException {
 		return RpcErrors.ERR_DATANODE_NOT_REGISTERED;
 	}
 
+	public double getStorageUsage() throws Exception {
+		long total = 0;
+		long free = 0;
+		for(StorageClass storageClass : storageClasses) {
+			total += storageClass.getTotalCapacity();
+			free += storageClass.getFreeCapacity();
+		}
+
+		// if there is no available capacity (i.e. total number of available blocks is 0),
+		// return 1.0 which tells that all storage is used
+		if(total != 0) {
+			double available = (double) free / (double) total;
+			return 1.0 - available;
+		} else {
+			return 1.0;
+		}
+
+	}
+
+	public int getBlockUsage() throws Exception {
+		int total = 0;
+
+		for(StorageClass storageClass: storageClasses) {
+			total += (storageClass.getTotalCapacity() - storageClass.getFreeCapacity());
+		}
+
+		return total;
+	}
+
+	public int getBlockCapacity() throws Exception {

Review comment:
       Should return a long since `getTotalCapacity` returns a long otherwise this might overflow.
   Again naming: Maybe `getNumberOfBlocks` ?

##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -107,6 +107,85 @@ short prepareDataNodeForRemoval(DataNodeInfo dn) throws IOException {
 		return RpcErrors.ERR_DATANODE_NOT_REGISTERED;
 	}
 
+	public double getStorageUsage() throws Exception {
+		long total = 0;
+		long free = 0;
+		for(StorageClass storageClass : storageClasses) {
+			total += storageClass.getTotalCapacity();
+			free += storageClass.getFreeCapacity();
+		}
+
+		// if there is no available capacity (i.e. total number of available blocks is 0),
+		// return 1.0 which tells that all storage is used
+		if(total != 0) {
+			double available = (double) free / (double) total;
+			return 1.0 - available;
+		} else {
+			return 1.0;
+		}
+
+	}
+
+	public int getBlockUsage() throws Exception {

Review comment:
       Should return `long` as `getTotalCapacity` and `getFreeCapactiy` return long. Otherwise this could overflow.
   Also Naming: Maybe `getNumberOfBlocksUsed`

##########
File path: namenode/src/main/java/org/apache/crail/namenode/DataNodeBlocks.java
##########
@@ -86,6 +86,10 @@ public boolean isScheduleForRemoval(){
 		return this.scheduleForRemoval;
 	}
 
+	public long getMaxBlockCount() {

Review comment:
       I guess currently we do not shrink block count so block count is always increasing for a datanode. Maybe in the future this might change so I propose to name the method instead:  `getTotalBlockCount` or `getTotalNumberOfBlocks`.

##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -222,6 +303,33 @@ short removeDatanode(DataNodeInfo dn) throws IOException {
 
 	//---------------
 
+	public long getTotalCapacity() {
+		long capacity = 0;
+
+		for(DataNodeBlocks datanode : membership.values()) {
+			capacity += datanode.getMaxBlockCount();
+		}
+
+		return capacity;
+	}
+
+	public long getFreeCapacity() {

Review comment:
       Also naming. Maybe `getNumberOfFreeBlocks` or `getFreeBlockCount`?

##########
File path: namenode/src/main/java/org/apache/crail/namenode/NameNode.java
##########
@@ -81,8 +81,15 @@ public static void main(String args[]) throws Exception {
 		CrailConstants.NAMENODE_ADDRESS = namenode + "?id=" + serviceId + "&size=" + serviceSize;
 		CrailConstants.printConf();
 		CrailConstants.verify();
-		
-		RpcNameNodeService service = RpcNameNodeService.createInstance(CrailConstants.NAMENODE_RPC_SERVICE);
+
+		RpcNameNodeService service;
+
+		if(CrailConstants.ELASTICSTORE) {

Review comment:
       Do we really need this? Wouldn't it be sufficient if we use `CrailConstants.NAMENODE_RPC_SERVICE` and specify the elastic namenode rpc service class there?

##########
File path: namenode/src/main/java/org/apache/crail/namenode/NameNodeService.java
##########
@@ -552,6 +552,26 @@ public short getLocation(RpcRequestMessage.GetLocationReq request, RpcResponseMe
 		return RpcErrors.ERR_OK;
 	}
 
+	public double getStorageUsage() throws Exception {

Review comment:
       Naming. See above.

##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -222,6 +303,33 @@ short removeDatanode(DataNodeInfo dn) throws IOException {
 
 	//---------------
 
+	public long getTotalCapacity() {

Review comment:
       The naming does not clearly state that this is number of blocks. Maybe `getTotalNumberOfBlocks` or `getTotalNumberBlocks` or `getTotalBlockCount`?

##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -222,6 +303,33 @@ short removeDatanode(DataNodeInfo dn) throws IOException {
 
 	//---------------
 
+	public long getTotalCapacity() {
+		long capacity = 0;
+
+		for(DataNodeBlocks datanode : membership.values()) {
+			capacity += datanode.getMaxBlockCount();
+		}
+
+		return capacity;
+	}
+
+	public long getFreeCapacity() {
+		long capacity = 0;
+
+		for(DataNodeBlocks datanode : membership.values()) {
+			capacity += datanode.getBlockCount();
+		}
+
+		return capacity;
+	}
+
+	public Collection<DataNodeBlocks> getDataNodeBlocks() {
+		return this.membership.values();
+	}
+
+	public int getRunningDatanodes() {

Review comment:
       Naming. Maybe `getNumberOfRunningDatanodes()`?
   I know its long but it could be misinterpreted otherwise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-crail] BernardMetzler commented on a change in pull request #86: Resource Elasticity [3/3]

Posted by GitBox <gi...@apache.org>.
BernardMetzler commented on a change in pull request #86:
URL: https://github.com/apache/incubator-crail/pull/86#discussion_r576252695



##########
File path: namenode/src/main/java/org/apache/crail/namenode/BlockStore.java
##########
@@ -171,9 +273,61 @@ short addDataNode(DataNodeBlocks dataNode) {
 		return RpcErrors.ERR_OK;
 
 	}
-	
+
+	short prepareForRemovalDatanode(DataNodeInfo dn) throws Exception {
+		// this will only mark it for removal
+		return _prepareOrRemoveDN(dn, true);
+	}
+
+	short removeDatanode(DataNodeInfo dn) throws Exception {
+		// this will remove it as well
+		return _prepareOrRemoveDN(dn, false);
+	}
+
 	//---------------
-	
+
+	public long getTotalCapacity() {
+		long capacity = 0;
+
+		for(DataNodeBlocks datanode : membership.values()) {
+			capacity += datanode.getMaxBlockCount();
+		}
+
+		return capacity;
+	}
+
+	public long getFreeCapacity() {
+		long capacity = 0;
+
+		for(DataNodeBlocks datanode : membership.values()) {
+			capacity += datanode.getBlockCount();
+		}
+
+		return capacity;
+	}
+
+	public Collection<DataNodeBlocks> getDataNodeBlocks() {
+		return this.membership.values();
+	}
+
+	public int getRunningDatanodes() {
+		return this.membership.size();
+	}
+
+	private short _prepareOrRemoveDN(DataNodeInfo dn, boolean onlyMark) throws Exception {

Review comment:
       Minor nit: I'd prefer to have two methods for the two tasks implemented here, one for preparing a DN for removal, and one to execute DN removal.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-crail] PepperJo commented on pull request #86: Resource Elasticity [3/3]

Posted by GitBox <gi...@apache.org>.
PepperJo commented on pull request #86:
URL: https://github.com/apache/incubator-crail/pull/86#issuecomment-782118295


   Please rebase. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-crail] BernardMetzler commented on pull request #86: Resource Elasticity [3/3]

Posted by GitBox <gi...@apache.org>.
BernardMetzler commented on pull request #86:
URL: https://github.com/apache/incubator-crail/pull/86#issuecomment-787808471


   Looks good now!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org