Posted to commits@accumulo.apache.org by br...@apache.org on 2022/04/21 13:01:49 UTC

[accumulo-testing] branch main updated: Managed disk support for Azure terraform infra (#202)

This is an automated email from the ASF dual-hosted git repository.

brianloss pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-testing.git


The following commit(s) were added to refs/heads/main by this push:
     new e9cbb85  Managed disk support for Azure terraform infra (#202)
e9cbb85 is described below

commit e9cbb856a0fac3335fff13fc175d7439c0940cca
Author: Brian Loss <br...@gmail.com>
AuthorDate: Thu Apr 21 09:01:44 2022 -0400

    Managed disk support for Azure terraform infra (#202)
    
    Support adding managed disks to the Azure VMs created by the terraform
    testing infrastructure. Adding multiple managed disks to a VM provides
    significantly more space for data storage and also increases performance,
    since the data is striped across the disks.
    
    * Modify the cloud-init module to accept an argument indicating the type
      of deployment (AWS or Azure) so that conditional blocks can be
      included in the cloud-init script.
    * The cloud-init module now accepts an optional lvm_mount_point argument.
      When this argument is specified, the cloud-init script assumes that
      managed disks were created; it installs a script on the VM and runs it
      to wait for the disks to be attached, then groups them into an LVM
      volume mounted under the specified mount point.
    * The azure main.tf file accepts a new optional managed_disk_configuration
      argument that contains the LVM mount point and the number, size, and SKU
      of managed disks to add to each VM (a sample configuration follows this
      list). If this argument is specified, the managed disks are created and
      attached to the VMs, and the LVM mount point and expected number of
      disks are passed along to the cloud-init module. Because of the way
      managed disk attachment is supported by Terraform (disks must be
      attached after the VM is created, although Azure itself does not have
      this restriction), the provisioner script that waits for cloud-init to
      complete had to be moved out of the VM resource into a null_resource.
      This null_resource must then be explicitly added as a dependency of any
      module that requires the manager or worker VMs to be created AND
      cloud-init to have finished running.
    * Fix a bug in the Azure configuration where the script would fail if the
      create_resource_group variable was set to false (indicating that an
      existing resource group should be used instead of creating a new one).
    * Update the Maven version to 3.8.5.
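
    The following is an illustration only, not part of the original commit
    message: a managed disk configuration could be supplied through a
    Terraform variables file along these lines. The file name and values
    here are hypothetical; the object fields, allowed storage_account_type
    values, and disk size bounds are those defined in azure/variables.tf in
    this commit.

      # terraform.tfvars (hypothetical example)
      managed_disk_configuration = {
        mount_point          = "/data"       # where the LVM volume is mounted
        disk_count           = 4             # managed disks attached to each VM
        storage_account_type = "Premium_LRS" # Standard_LRS, StandardSSD_LRS, or Premium_LRS
        disk_size_gb         = 1024          # size of each disk (1-32767 GB)
      }
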
---
 contrib/terraform-testing-infrastructure/README.md |   5 +-
 .../terraform-testing-infrastructure/aws/main.tf   |   1 +
 .../aws/variables.tf                               |   2 +-
 .../terraform-testing-infrastructure/azure/main.tf | 141 +++++++++++++++++----
 .../azure/variables.tf                             |  31 ++++-
 .../files/azure-format-lvm-data-disk.sh            |  52 ++++++++
 .../modules/cloud-init-config/main.tf              |  23 +++-
 .../cloud-init-config/templates/cloud-init.tftpl   |   9 ++
 .../modules/config-files/templates/zoo.cfg.tftpl   |   2 +-
 9 files changed, 237 insertions(+), 29 deletions(-)

diff --git a/contrib/terraform-testing-infrastructure/README.md b/contrib/terraform-testing-infrastructure/README.md
index a4c3fea..9c99883 100644
--- a/contrib/terraform-testing-infrastructure/README.md
+++ b/contrib/terraform-testing-infrastructure/README.md
@@ -165,7 +165,7 @@ The table below lists the variables and their default values that are used in th
 | instance\_count | The number of EC2 instances to create | `string` | `"2"` | no |
 | instance\_type | The type of EC2 instances to create | `string` | `"m5.2xlarge"` | no |
 | local\_sources\_dir | Directory on local machine that contains Maven, ZooKeeper or Hadoop binary distributions or Accumulo source tarball | `string` | `""` | no |
-| maven\_version | The version of Maven to download and install | `string` | `"3.8.4"` | no |
+| maven\_version | The version of Maven to download and install | `string` | `"3.8.5"` | no |
 | optional\_cloudinit\_config | An optional config block for the cloud-init script. If you set this, you should consider setting cloudinit\_merge\_type to handle merging with the default script as you need. | `string` | `null` | no |
 | private\_network | Indicates whether or not the user is on a private network and access to hosts should be through the private IP addresses rather than public ones. | `bool` | `false` | no |
 | root\_volume\_gb | The size, in GB, of the EC2 instance root volume | `string` | `"300"` | no |
@@ -208,7 +208,8 @@ The table below lists the variables and their default values that are used in th
 | hadoop\_version | The version of Hadoop to download and install | `string` | `"3.3.1"` | no |
 | local\_sources\_dir | Directory on local machine that contains Maven, ZooKeeper or Hadoop binary distributions or Accumulo source tarball | `string` | `""` | no |
 | location | The Azure region where resources are to be created. If an existing resource group is specified, this value is ignored and the resource group's location is used. | `string` | n/a | yes |
-| maven\_version | The version of Maven to download and install | `string` | `"3.8.4"` | no |
+| managed\_disk\_configuration | Optional managed disk configuration. If supplied, the managed disks on each VM will be combined into an LVM volume mounted at the named mount point. | <pre>object({<br>    mount_point          = string<br>    disk_count           = number<br>    storage_account_type = string<br>    disk_size_gb         = number<br>  })</pre> | `null` | no |
+| maven\_version | The version of Maven to download and install | `string` | `"3.8.5"` | no |
 | network\_address\_space | The network address space to use for the virtual network. | `list(string)` | <pre>[<br>  "10.0.0.0/16"<br>]</pre> | no |
 | optional\_cloudinit\_config | An optional config block for the cloud-init script. If you set this, you should consider setting cloudinit\_merge\_type to handle merging with the default script as you need. | `string` | `null` | no |
 | os\_disk\_caching | The type of caching to use for the OS disk. Possible values are None, ReadOnly, and ReadWrite. | `string` | `"ReadOnly"` | no |
diff --git a/contrib/terraform-testing-infrastructure/aws/main.tf b/contrib/terraform-testing-infrastructure/aws/main.tf
index f4444f3..59a855a 100644
--- a/contrib/terraform-testing-infrastructure/aws/main.tf
+++ b/contrib/terraform-testing-infrastructure/aws/main.tf
@@ -131,6 +131,7 @@ module "cloud_init_config" {
   accumulo_branch_name = var.accumulo_branch_name
   accumulo_version     = var.accumulo_version
   authorized_ssh_keys  = local.ssh_keys[*]
+  cluster_type         = "aws"
 
   optional_cloudinit_config = var.optional_cloudinit_config
   cloudinit_merge_type      = var.cloudinit_merge_type
diff --git a/contrib/terraform-testing-infrastructure/aws/variables.tf b/contrib/terraform-testing-infrastructure/aws/variables.tf
index b19c2a5..5d7f0e4 100644
--- a/contrib/terraform-testing-infrastructure/aws/variables.tf
+++ b/contrib/terraform-testing-infrastructure/aws/variables.tf
@@ -129,7 +129,7 @@ variable "accumulo_dir" {
 }
 
 variable "maven_version" {
-  default     = "3.8.4"
+  default     = "3.8.5"
   description = "The version of Maven to download and install"
   nullable    = false
 }
diff --git a/contrib/terraform-testing-infrastructure/azure/main.tf b/contrib/terraform-testing-infrastructure/azure/main.tf
index 4e11749..82bf818 100644
--- a/contrib/terraform-testing-infrastructure/azure/main.tf
+++ b/contrib/terraform-testing-infrastructure/azure/main.tf
@@ -69,6 +69,15 @@ locals {
 
   ssh_keys = toset(concat(var.authorized_ssh_keys, [for k in var.authorized_ssh_key_files : file(k)]))
 
+  # Resource group name and location
+  # This is pulled either from the resource group that was created (if create_resource_group is true)
+  # or from the resource group that already exists (if create_resource_group is false). Keeping
+  # references to the resource group or data object rather than just using var.resource_group_name
+  # allows for terraform to automatically create the dependency graph and wait for the resource group
+  # to be created if necessary.
+  rg_name = var.create_resource_group ? azurerm_resource_group.rg[0].name : data.azurerm_resource_group.existing_rg[0].name
+  location = var.create_resource_group ? azurerm_resource_group.rg[0].location : data.azurerm_resource_group.existing_rg[0].location
+
   # Save the public/private IP addresses of the VMs to pass to sub-modules.
   manager_ip         = azurerm_linux_virtual_machine.manager.public_ip_address
   worker_ips         = azurerm_linux_virtual_machine.workers[*].public_ip_address
@@ -84,6 +93,11 @@ locals {
   ]
 }
 
+data "azurerm_resource_group" "existing_rg" {
+  count = var.create_resource_group ? 0 : 1
+  name = var.resource_group_name
+}
+
 # Place all resources in a resource group
 resource "azurerm_resource_group" "rg" {
   count    = var.create_resource_group ? 1 : 0
@@ -98,8 +112,8 @@ resource "azurerm_resource_group" "rg" {
 # Creates a virtual network for use by this cluster.
 resource "azurerm_virtual_network" "accumulo_vnet" {
   name                = "${var.resource_name_prefix}-vnet"
-  resource_group_name = azurerm_resource_group.rg[0].name
-  location            = azurerm_resource_group.rg[0].location
+  resource_group_name = local.rg_name
+  location            = local.location
   address_space       = var.network_address_space
 }
 
@@ -107,7 +121,7 @@ resource "azurerm_virtual_network" "accumulo_vnet" {
 # so that we'll be able to create an NFS share.
 resource "azurerm_subnet" "internal" {
   name                 = "${var.resource_name_prefix}-subnet"
-  resource_group_name  = azurerm_resource_group.rg[0].name
+  resource_group_name  = local.rg_name
   virtual_network_name = azurerm_virtual_network.accumulo_vnet.name
   address_prefixes     = var.subnet_address_prefixes
 }
@@ -116,8 +130,8 @@ resource "azurerm_subnet" "internal" {
 # traffic from the internet and denies everything else.
 resource "azurerm_network_security_group" "nsg" {
   name                = "${var.resource_name_prefix}-nsg"
-  location            = azurerm_resource_group.rg[0].location
-  resource_group_name = azurerm_resource_group.rg[0].name
+  location            = local.location
+  resource_group_name = local.rg_name
 
   security_rule {
     name                       = "allow-ssh"
@@ -140,6 +154,8 @@ resource "azurerm_network_security_group" "nsg" {
 module "cloud_init_config" {
   source = "../modules/cloud-init-config"
 
+  lvm_mount_point      = var.managed_disk_configuration != null ? var.managed_disk_configuration.mount_point : null
+  lvm_disk_count       = var.managed_disk_configuration != null ? var.managed_disk_configuration.disk_count : null
   software_root        = var.software_root
   zookeeper_dir        = var.zookeeper_dir
   hadoop_dir           = var.hadoop_dir
@@ -151,6 +167,7 @@ module "cloud_init_config" {
   accumulo_version     = var.accumulo_version
   authorized_ssh_keys  = local.ssh_keys[*]
   os_type              = local.os_type
+  cluster_type         = "azure"
 
   optional_cloudinit_config = var.optional_cloudinit_config
   cloudinit_merge_type      = var.cloudinit_merge_type
@@ -159,16 +176,16 @@ module "cloud_init_config" {
 # Create a static public IP address for the manager node.
 resource "azurerm_public_ip" "manager" {
   name                = "${var.resource_name_prefix}-manager-ip"
-  resource_group_name = azurerm_resource_group.rg[0].name
-  location            = azurerm_resource_group.rg[0].location
+  resource_group_name = local.rg_name
+  location            = local.location
   allocation_method   = "Static"
 }
 
 # Create a NIC for the manager node.
 resource "azurerm_network_interface" "manager" {
   name                = "${var.resource_name_prefix}-manager-nic"
-  location            = azurerm_resource_group.rg[0].location
-  resource_group_name = azurerm_resource_group.rg[0].name
+  location            = local.location
+  resource_group_name = local.rg_name
 
   enable_accelerated_networking = true
 
@@ -190,8 +207,8 @@ resource "azurerm_network_interface_security_group_association" "manager" {
 resource "azurerm_public_ip" "workers" {
   count               = var.worker_count
   name                = "${var.resource_name_prefix}-worker${count.index}-ip"
-  resource_group_name = azurerm_resource_group.rg[0].name
-  location            = azurerm_resource_group.rg[0].location
+  resource_group_name = local.rg_name
+  location            = local.location
   allocation_method   = "Static"
 }
 
@@ -199,8 +216,8 @@ resource "azurerm_public_ip" "workers" {
 resource "azurerm_network_interface" "workers" {
   count               = var.worker_count
   name                = "${var.resource_name_prefix}-worker${count.index}-nic"
-  location            = azurerm_resource_group.rg[0].location
-  resource_group_name = azurerm_resource_group.rg[0].name
+  location            = local.location
+  resource_group_name = local.rg_name
 
   enable_accelerated_networking = true
 
@@ -223,8 +240,8 @@ resource "azurerm_network_interface_security_group_association" "workers" {
 # Add a login user that can SSH to the VM using the first supplied SSH key.
 resource "azurerm_linux_virtual_machine" "manager" {
   name                = "${var.resource_name_prefix}-manager"
-  resource_group_name = azurerm_resource_group.rg[0].name
-  location            = azurerm_resource_group.rg[0].location
+  resource_group_name = local.rg_name
+  location            = local.location
   size                = var.vm_sku
   computer_name       = "manager"
   admin_username      = var.admin_username
@@ -256,15 +273,44 @@ resource "azurerm_linux_virtual_machine" "manager" {
     sku       = var.vm_image.sku
     version   = var.vm_image.version
   }
+}
+
+# Create and attach managed disks to the manager VM.
+resource "azurerm_managed_disk" "manager_managed_disk" {
+  count                = var.managed_disk_configuration != null ? var.managed_disk_configuration.disk_count : 0
+  name                 = format("%s_disk%02d", azurerm_linux_virtual_machine.manager.name, count.index)
+  resource_group_name  = local.rg_name
+  location             = local.location
+  storage_account_type = var.managed_disk_configuration.storage_account_type
+  disk_size_gb         = var.managed_disk_configuration.disk_size_gb
+  create_option        = "Empty"
+}
+
+resource "azurerm_virtual_machine_data_disk_attachment" "manager_managed_disk_attachment" {
+  count              = var.managed_disk_configuration != null ? var.managed_disk_configuration.disk_count : 0
+  managed_disk_id    = azurerm_managed_disk.manager_managed_disk[count.index].id
+  virtual_machine_id = azurerm_linux_virtual_machine.manager.id
+  lun                = 10 + count.index
+  caching            = "ReadOnly"
+}
 
+# Wait for cloud-init to complete on the manager VM.
+# This is done here rather than in the VM resource because the cloud-init script
+# waits for managed disks to be attached (if used), but the managed disks cannot
+# be attached until the VM is created, so we'd have a deadlock.
+resource "null_resource" "wait_for_manager_cloud_init" {
   provisioner "remote-exec" {
     inline = local.ready_script
     connection {
       type = "ssh"
-      user = self.admin_username
-      host = self.public_ip_address
+      user = azurerm_linux_virtual_machine.manager.admin_username
+      host = azurerm_linux_virtual_machine.manager.public_ip_address
     }
   }
+
+  depends_on = [
+    azurerm_virtual_machine_data_disk_attachment.manager_managed_disk_attachment
+  ]
 }
 
 # Create the worker VMs.
@@ -272,8 +318,8 @@ resource "azurerm_linux_virtual_machine" "manager" {
 resource "azurerm_linux_virtual_machine" "workers" {
   count               = var.worker_count
   name                = "${var.resource_name_prefix}-worker${count.index}"
-  resource_group_name = azurerm_resource_group.rg[0].name
-  location            = azurerm_resource_group.rg[0].location
+  resource_group_name = local.rg_name
+  location            = local.location
   size                = var.vm_sku
   computer_name       = "worker${count.index}"
   admin_username      = var.admin_username
@@ -305,15 +351,57 @@ resource "azurerm_linux_virtual_machine" "workers" {
     sku       = var.vm_image.sku
     version   = var.vm_image.version
   }
+}
+
+# Create and attach managed disks to the worker VMs.
+locals {
+  worker_disks = var.managed_disk_configuration == null ? [] : flatten([
+    for vm_num, vm in azurerm_linux_virtual_machine.workers : [
+      for disk_num in range(var.managed_disk_configuration.disk_count) : {
+        datadisk_name = format("%s_disk%02d", vm.name, disk_num)
+        lun           = 10 + disk_num
+        worker_num    = vm_num
+      }
+    ]
+  ])
+}
+
+resource "azurerm_managed_disk" "worker_managed_disk" {
+  count                = length(local.worker_disks)
+  name                 = local.worker_disks[count.index].datadisk_name
+  resource_group_name  = local.rg_name
+  location             = local.location
+  storage_account_type = var.managed_disk_configuration.storage_account_type
+  disk_size_gb         = var.managed_disk_configuration.disk_size_gb
+  create_option        = "Empty"
+}
 
+resource "azurerm_virtual_machine_data_disk_attachment" "worker_managed_disk_attachment" {
+  count              = length(local.worker_disks)
+  managed_disk_id    = azurerm_managed_disk.worker_managed_disk[count.index].id
+  virtual_machine_id = azurerm_linux_virtual_machine.workers[local.worker_disks[count.index].worker_num].id
+  lun                = local.worker_disks[count.index].lun
+  caching            = "ReadOnly"
+}
+
+# Wait for cloud-init to complete on the worker VMs.
+# This is done here rather than in the VM resources because the cloud-init script
+# waits for managed disks to be attached (if used), but the managed disks cannot
+# be attached until the VMs are created, so we'd have a deadlock.
+resource "null_resource" "wait_for_workers_cloud_init" {
+  count = length(azurerm_linux_virtual_machine.workers)
   provisioner "remote-exec" {
     inline = local.ready_script
     connection {
       type = "ssh"
-      user = self.admin_username
-      host = self.public_ip_address
+      user = azurerm_linux_virtual_machine.workers[count.index].admin_username
+      host = azurerm_linux_virtual_machine.workers[count.index].public_ip_address
     }
   }
+
+  depends_on = [
+    azurerm_virtual_machine_data_disk_attachment.worker_managed_disk_attachment
+  ]
 }
 
 ##############################
@@ -351,6 +439,10 @@ module "config_files" {
 
   accumulo_instance_name = var.accumulo_instance_name
   accumulo_root_password = var.accumulo_root_password
+
+  depends_on = [
+    null_resource.wait_for_manager_cloud_init
+  ]
 }
 
 #
@@ -363,6 +455,10 @@ module "upload_software" {
   local_sources_dir = var.local_sources_dir
   upload_dir        = var.software_root
   upload_host       = local.manager_ip
+
+  depends_on = [
+    null_resource.wait_for_manager_cloud_init
+  ]
 }
 
 #
@@ -379,7 +475,8 @@ module "configure_nodes" {
 
   depends_on = [
     module.upload_software,
-    module.config_files
+    module.config_files,
+    null_resource.wait_for_workers_cloud_init
   ]
 }
 
diff --git a/contrib/terraform-testing-infrastructure/azure/variables.tf b/contrib/terraform-testing-infrastructure/azure/variables.tf
index 9edf84b..2a0e8bb 100644
--- a/contrib/terraform-testing-infrastructure/azure/variables.tf
+++ b/contrib/terraform-testing-infrastructure/azure/variables.tf
@@ -126,6 +126,35 @@ variable "os_disk_caching" {
   }
 }
 
+variable "managed_disk_configuration" {
+  default = null
+  type = object({
+    mount_point          = string
+    disk_count           = number
+    storage_account_type = string
+    disk_size_gb         = number
+  })
+  description = "Optional managed disk configuration. If supplied, the managed disks on each VM will be combined into an LVM volume mounted at the named mount point."
+  nullable    = true
+
+  validation {
+    condition     = var.managed_disk_configuration.mount_point != null
+    error_message = "The mount point must be specified."
+  }
+  validation {
+    condition     = var.managed_disk_configuration.disk_count > 0
+    error_message = "The number of disks must be at least 1."
+  }
+  validation {
+    condition     = contains(["Standard_LRS", "StandardSSD_LRS", "Premium_LRS"], var.managed_disk_configuration.storage_account_type)
+    error_message = "The storage account type must be one of 'Standard_LRS', 'StandardSSD_LRS', or 'Premium_LRS'."
+  }
+  validation {
+    condition     = var.managed_disk_configuration.disk_size_gb > 0 && var.managed_disk_configuration.disk_size_gb <= 32767
+    error_message = "The disk size must be at least 1GB and less than 32768GB."
+  }
+}
+
 variable "software_root" {
   default     = "/opt/accumulo-testing"
   description = "The full directory root where software will be installed"
@@ -178,7 +207,7 @@ variable "accumulo_dir" {
 }
 
 variable "maven_version" {
-  default     = "3.8.4"
+  default     = "3.8.5"
   description = "The version of Maven to download and install"
   nullable    = false
 }
diff --git a/contrib/terraform-testing-infrastructure/modules/cloud-init-config/files/azure-format-lvm-data-disk.sh b/contrib/terraform-testing-infrastructure/modules/cloud-init-config/files/azure-format-lvm-data-disk.sh
new file mode 100644
index 0000000..4a94412
--- /dev/null
+++ b/contrib/terraform-testing-infrastructure/modules/cloud-init-config/files/azure-format-lvm-data-disk.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[ $# -eq 3 ] || { echo "usage: $0 disk_count mount_point user.group"; exit 1; }
+
+diskCount=$1
+mountPoint=$2
+owner=$3
+
+until [[ $(ls -1 /dev/disk/azure/scsi1/ | wc -l) == "$diskCount" ]]; do
+	echo "Waiting for $diskCount disks to be attached..."
+	sleep 10
+done
+
+VG_GROUP_NAME=storage_vg
+LG_GROUP_NAME=storage_lv
+
+DISK_PATH="/dev/disk/azure/scsi1"
+declare -a REAL_PATH_ARR
+
+for i in $(ls ${DISK_PATH} 2>/dev/null);
+do
+	REAL_PATH=`realpath ${DISK_PATH}/${i}|tr '\n' ' ' `
+	REAL_PATH_ARR+=($REAL_PATH)
+done;
+
+
+RAID_DEVICE_LIST=`echo "${REAL_PATH_ARR[@]}"|sort`
+RAID_DEVICES_COUNT=`echo "${#REAL_PATH_ARR[@]}"`
+pvcreate ${RAID_DEVICE_LIST}
+vgcreate -s 4M ${VG_GROUP_NAME} ${RAID_DEVICE_LIST}
+lvcreate -n $LG_GROUP_NAME -l 100%FREE -i ${RAID_DEVICES_COUNT} ${VG_GROUP_NAME}
+mkfs.xfs -K -f /dev/${VG_GROUP_NAME}/${LG_GROUP_NAME}
+mkdir -p ${mountPoint}
+printf "/dev/${VG_GROUP_NAME}/${LG_GROUP_NAME}\t${mountPoint}\tauto\tdefaults,noatime\t0\t2\n" >> /etc/fstab
+mount --target ${mountPoint}
+chown ${owner} ${mountPoint}
diff --git a/contrib/terraform-testing-infrastructure/modules/cloud-init-config/main.tf b/contrib/terraform-testing-infrastructure/modules/cloud-init-config/main.tf
index 1de3a15..2fa1299 100644
--- a/contrib/terraform-testing-infrastructure/modules/cloud-init-config/main.tf
+++ b/contrib/terraform-testing-infrastructure/modules/cloud-init-config/main.tf
@@ -25,6 +25,16 @@ variable "hadoop_version" {}
 variable "accumulo_branch_name" {}
 variable "accumulo_version" {}
 variable "authorized_ssh_keys" {}
+variable "lvm_mount_point" {
+  default = null
+  description = "Mount point for the LVM volume containing managed disks. If not specified, then no LVM volume is created."
+  nullable = true
+}
+variable "lvm_disk_count" {
+  default = null
+  description = "Number of disks to be combined in an LVM volume. If lvm_mount_point is not specified, this is not used."
+  nullable = true
+}
 variable "cloudinit_merge_type" {
   default  = "dict(recurse_array,no_replace)+list(append)"
   nullable = false
@@ -42,6 +52,14 @@ variable "os_type" {
     error_message = "The value of os_type must be either 'centos' or 'ubuntu'."
   }
 }
+variable "cluster_type" {
+  type     = string
+  nullable = false
+  validation {
+    condition     = contains(["aws", "azure"], var.cluster_type)
+    error_message = "The value of cluster_type must be either 'aws' or 'azure'."
+  }
+}
 
 #####################
 # Create Hadoop Key #
@@ -68,7 +86,10 @@ locals {
     accumulo_branch_name = var.accumulo_branch_name
     accumulo_version     = var.accumulo_version
     authorized_ssh_keys  = local.ssh_keys[*]
+    lvm_mount_point      = var.lvm_mount_point != null ? var.lvm_mount_point : ""
+    lvm_disk_count       = var.lvm_disk_count != null ? var.lvm_disk_count : ""
     os_type              = var.os_type
+    cluster_type         = var.cluster_type
     hadoop_public_key    = indent(6, tls_private_key.hadoop.public_key_openssh)
     hadoop_private_key   = indent(6, tls_private_key.hadoop.private_key_pem)
   })
@@ -97,5 +118,3 @@ data "cloudinit_config" "cfg" {
 output "cloud_init_data" {
   value = data.cloudinit_config.cfg.rendered
 }
-
-
diff --git a/contrib/terraform-testing-infrastructure/modules/cloud-init-config/templates/cloud-init.tftpl b/contrib/terraform-testing-infrastructure/modules/cloud-init-config/templates/cloud-init.tftpl
index 30cc806..0172e40 100644
--- a/contrib/terraform-testing-infrastructure/modules/cloud-init-config/templates/cloud-init.tftpl
+++ b/contrib/terraform-testing-infrastructure/modules/cloud-init-config/templates/cloud-init.tftpl
@@ -75,6 +75,9 @@ packages:
 # Make directories on each node
 #
 runcmd:
+%{ if cluster_type == "azure" && lvm_mount_point != "" ~}
+  - /usr/local/bin/format-lvm-data-disk.sh ${lvm_disk_count} ${lvm_mount_point} hadoop.hadoop
+%{ endif ~}
   - mkdir -p ${software_root} ${zookeeper_dir} ${hadoop_dir} ${accumulo_dir}
   - chown hadoop.hadoop ${software_root} ${zookeeper_dir} ${hadoop_dir} ${accumulo_dir}
   - systemctl enable docker
@@ -149,3 +152,9 @@ write_files:
     permissions: '0755'
     content: |
       ${indent(6, file("${files_path}/update-hosts-genders.sh"))}
+%{ if cluster_type == "azure" ~}
+  - path: /usr/local/bin/format-lvm-data-disk.sh
+    permissions: '0755'
+    content: |
+      ${indent(6, file("${files_path}/azure-format-lvm-data-disk.sh"))}
+%{ endif ~}
diff --git a/contrib/terraform-testing-infrastructure/modules/config-files/templates/zoo.cfg.tftpl b/contrib/terraform-testing-infrastructure/modules/config-files/templates/zoo.cfg.tftpl
index 3eba821..1de64d8 100644
--- a/contrib/terraform-testing-infrastructure/modules/config-files/templates/zoo.cfg.tftpl
+++ b/contrib/terraform-testing-infrastructure/modules/config-files/templates/zoo.cfg.tftpl
@@ -9,7 +9,7 @@ syncLimit=5
 # the directory where the snapshot is stored.
 # do not use /tmp for storage, /tmp here is just
 # example sakes.
-dataDir=${zookeeper_dir}
+dataDir=${zookeeper_dir}/data
 # the port at which the clients will connect
 clientPort=2181
 # the maximum number of client connections.