Azure instance IO stress

Azure instance I/O stress disrupts the state of infra resources.

This fault induces I/O stress on the target Azure instance for a specified duration. It uses the Azure Run Command to execute a bash script that is built into the fault.


Use cases

Azure instance I/O stress:

Determines the resilience of an Azure instance when unexpected stress is applied on the I/O sources.
Determines how Azure scales the resources to maintain the application under stress.
Simulates slower disk operations by the application.
Simulates noisy neighbour problems by hogging the disk bandwidth.
Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
Checks whether or not the application functions under high disk latency conditions.
Checks whether or not the application functions under high I/O traffic, and large I/O blocks.
Checks if other services monopolize the I/O disks during stress.

Prerequisites

Kubernetes >= 1.17
Azure Run Command agent is installed and running in the target Azure instance.
Azure instance should be in a healthy state.
Use Azure file-based authentication to connect to the instance using the Azure Go SDK. To generate the auth file, run the Azure CLI command az ad sp create-for-rbac --sdk-auth > azure.auth.
A Kubernetes secret in the CHAOS_NAMESPACE must contain the auth file created in the previous step. Below is a sample secret file:

apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }

tip

If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
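
As a minimal sketch of the tip above, the two fragments below show a renamed secret key and the matching environment variable. The key name azure-credentials.json is a hypothetical placeholder, and the env fragment assumes the fault's tunables are passed as a name/value list; where that list sits in your experiment manifest may differ.

```yaml
# Secret with the key renamed from azure.auth to a hypothetical new name.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure-credentials.json: |-
    {
      "clientId": "XXXXXXXXX"
    }
# The JSON above is shortened; keep all the fields from the sample secret.
---
# Fragment of the fault's env list: the value must match the new key name.
env:
  - name: AZURE_AUTH_LOCATION
    value: 'azure-credentials.json'
```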

Mandatory tunables

Tunable	Description	Notes
AZURE_INSTANCE_NAMES	Names of the target Azure instances.	Multiple values can be provided as a comma-separated string. For example, `instance-1,instance-2`. For more information, go to stop instance by name.
RESOURCE_GROUP	Name of the Azure resource group that contains the target instances.	All the instances must be from the same resource group. For more information, go to resource group field in the YAML file.
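
The mandatory tunables above could be set roughly as follows. This is only an illustrative fragment: the instance names and the resource group name are placeholders, and the position of the env list inside the experiment manifest is assumed rather than taken from this page.

```yaml
# Fragment of the fault's env list (placement within the manifest is assumed).
env:
  # Comma-separated names of the target instances (placeholder values).
  - name: AZURE_INSTANCE_NAMES
    value: 'instance-1,instance-2'
  # All listed instances must belong to this resource group (placeholder value).
  - name: RESOURCE_GROUP
    value: 'my-resource-group'
```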

Optional tunables

Tunable	Description	Notes
TOTAL_CHAOS_DURATION	Duration that you specify, through which chaos is injected into the target resource (in seconds).	Defaults to 30s. For more information, go to duration of the chaos.
CHAOS_INTERVAL	Time interval between two successive chaos iterations (in seconds).	Defaults to 60s. For more information, go to chaos interval.
AZURE_AUTH_LOCATION	Name of the Azure secret credentials file.	Defaults to `azure.auth`.
SCALE_SET	Specifies whether the instance is part of a Scale Set.	Defaults to `disable`. Also supports `enable`. For more information, go to scale set instances.
INSTALL_DEPENDENCIES	Install dependencies to run I/O stress.	Defaults to `true`. Also supports `false`.
FILESYSTEM_UTILIZATION_PERCENTAGE	Specify the size as a percentage of free space on the file system.	Defaults to 0%, which results in 1 GB utilization. For more information, go to file system utilization in percentage.
FILESYSTEM_UTILIZATION_BYTES	Specify the size of the files used per worker (in GB). `FILESYSTEM_UTILIZATION_PERCENTAGE` and `FILESYSTEM_UTILIZATION_BYTES` are mutually exclusive. If both are specified, `FILESYSTEM_UTILIZATION_PERCENTAGE` takes precedence.	Defaults to 0 GB, which results in 1 GB utilization. For more information, go to file system utilization in gigabytes.
NUMBER_OF_WORKERS	Number of I/O workers involved in I/O disk stress.	Defaults to 4. For more information, go to multiple workers.
VOLUME_MOUNT_PATH	Location that points to the volume mount path used in I/O stress.	Defaults to the user HOME directory. For more information, go to volume mount path.
DEFAULT_HEALTH_CHECK	Determines whether to run the default health check that is built into the fault.	Defaults to `true`. For more information, go to default health check.
SEQUENCE	Sequence of chaos execution for multiple target instances.	Defaults to `parallel`. Also supports `serial` sequence. For more information, go to sequence of chaos execution.
RAMP_TIME	Period to wait before and after injecting chaos (in seconds).	For example, 30s. For more information, go to ramp time.

File system utilization in gigabytes

It specifies the size of the files used per worker for I/O stress on the Azure instance (in gigabytes). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable.
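
The following manifest is a hedged sketch of how this tunable might be set. It assumes the fault is run through a LitmusChaos-style ChaosEngine resource; the engine name, service account, experiment name, and target values are illustrative placeholders, not values confirmed by this page.

```yaml
# Sketch only: resource kind, names, and service account are assumptions.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: azure-io-stress-engine        # hypothetical engine name
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin   # assumed service account
  experiments:
    - name: azure-instance-io-stress  # assumed experiment name
      spec:
        components:
          env:
            # Size of the files used per worker, in GB.
            - name: FILESYSTEM_UTILIZATION_BYTES
              value: '1'
            # Mandatory tunables (placeholder values).
            - name: AZURE_INSTANCE_NAMES
              value: 'instance-1'
            - name: RESOURCE_GROUP
              value: 'my-resource-group'
```

Leave FILESYSTEM_UTILIZATION_PERCENTAGE unset in this case, since it takes precedence when both variables are specified.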