Azure instance IO stress
Azure instance I/O stress disrupts the state of infra resources.
- This fault induces stress on the Azure instance using the Azure Run command. The Azure Run command is executed using the in-built bash scripts within the fault.
- It causes I/O stress on the Azure instance using the bash script for a specific duration.
Use cases
Azure instance I/O stress:
- Determines the resilience of an Azure instance when unexpected stress is applied on the I/O sources.
- Determines how Azure scales the resources to maintain the application under stress.
- Simulates slower disk operations by the application.
- Simulates noisy neighbour problems by hogging the disk bandwidth.
- Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
- Checks whether or not the application functions under high disk latency conditions.
- Checks whether or not the application functions under high I/O traffic, and large I/O blocks.
- Checks if other services monopolize the I/O disks during stress.
Prerequisites
- Kubernetes >= 1.17
- Azure Run Command agent is installed and running in the target Azure instance.
- Azure instance should be in a healthy state.
- Use Azure file-based authentication to connect to the instance using the Azure Go SDK. To generate the auth file, run the following Azure CLI command:
  az ad sp create-for-rbac --sdk-auth > azure.auth
- Create a Kubernetes secret in the CHAOS_NAMESPACE that contains the auth file created in the previous step. Below is a sample secret file:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
Tip: If you change the secret key name from azure.auth to a new name, ensure that you update the AZURE_AUTH_LOCATION environment variable in the chaos experiment with the new name.
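
As an illustration of the tip above, the fragment below sketches how the AZURE_AUTH_LOCATION variable might be set after renaming the secret key. The key name custom.auth is hypothetical, and the fragment assumes the fault reads its tunables from an env list in the experiment manifest; adapt it to the manifest you actually use.

```yaml
# Fragment only, not a complete manifest.
# Assumes tunables are passed as an env list in the experiment spec.
env:
  - name: AZURE_AUTH_LOCATION
    value: 'custom.auth'   # hypothetical key name; must match the key under stringData in the secret
```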
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
AZURE_INSTANCE_NAMES | Names of the target Azure instances. | Multiple values can be provided as a comma-separated string. For example, instance-1,instance-2 . For more information, go to stop instance by name. |
RESOURCE_GROUP | Name of the Azure resource group that contains the target instances. | All the instances must be from the same resource group. For more information, go to resource group field in the YAML file. |
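
The two mandatory tunables could be supplied as shown in the sketch below. Only AZURE_INSTANCE_NAMES and RESOURCE_GROUP come from the table above. The ChaosEngine wrapper, the engine name, the service account, the experiment name, and the resource group value are assumptions for illustration and may differ in your environment.

```yaml
# Sketch only: the ChaosEngine wrapper, engine name, service account, and
# experiment name below are assumptions and may differ in your setup.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: azure-io-stress-engine        # hypothetical name
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin   # assumed service account
  experiments:
    - name: azure-instance-io-stress  # assumed fault name
      spec:
        components:
          env:
            # Comma-separated list of target instances
            - name: AZURE_INSTANCE_NAMES
              value: 'instance-1,instance-2'
            # All target instances must belong to this resource group
            - name: RESOURCE_GROUP
              value: 'my-resource-group'   # hypothetical resource group
```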
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Defaults to 30s. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between two successive chaos iterations (in seconds). | Defaults to 60s. For more information, go to chaos interval. |
AZURE_AUTH_LOCATION | Name of the Azure secret credentials file. | Defaults to azure.auth . |
SCALE_SET | Specifies whether the instance is part of a Scale Set. | Defaults to disable . Also supports enable . For more information, go to scale set instances. |
INSTALL_DEPENDENCIES | Install dependencies to run I/O stress. | Defaults to true . Also supports false . |
FILESYSTEM_UTILIZATION_PERCENTAGE | Specify the size as a percentage of free space on the file system. | Defaults to 0 %, which results in 1 GB utilization. For more information, go to file system utilization in percentage. |
FILESYSTEM_UTILIZATION_BYTES | Specify the size of the files used per worker (in GB). FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both are specified, FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence. | Defaults to 0 GB, which results in 1 GB utilization. For more information, go to file system utilization in gigabytes. |
NUMBER_OF_WORKERS | Number of I/O workers involved in I/O disk stress. | Defaults to 4. For more information, go to multiple workers. |
VOLUME_MOUNT_PATH | Location that points to the volume mount path used in I/O stress. | Defaults to the user HOME directory. For more information, go to volume mount path. |
DEFAULT_HEALTH_CHECK | Determines whether to run the default health check that is built into the fault. | Defaults to true . For more information, go to default health check. |
SEQUENCE | Sequence of chaos execution for multiple target instances. | Defaults to parallel . Also supports serial sequence. For more information, go to sequence of chaos execution. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30s. For more information, go to ramp time. |
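
To show how several optional tunables might be combined, here is a minimal sketch of an env list. The tunable names and accepted values are taken from the table above. The chosen values are arbitrary examples, and placing them in an env list in the experiment spec is an assumption about how the fault is configured.

```yaml
# Fragment only, not a complete manifest. Values are arbitrary examples.
env:
  # Inject chaos for 60 seconds in total, with 30 seconds between iterations
  - name: TOTAL_CHAOS_DURATION
    value: '60'
  - name: CHAOS_INTERVAL
    value: '30'
  # Use 10 percent of the free file system space; because this is set,
  # FILESYSTEM_UTILIZATION_BYTES is left out (the two are mutually exclusive)
  - name: FILESYSTEM_UTILIZATION_PERCENTAGE
    value: '10'
  # Number of I/O workers (the table lists 4 as the default)
  - name: NUMBER_OF_WORKERS
    value: '4'
  # Target instances are not part of a Scale Set
  - name: SCALE_SET
    value: 'disable'
  # Stress multiple target instances one after another instead of in parallel
  - name: SEQUENCE
    value: 'serial'
```

Because FILESYSTEM_UTILIZATION_PERCENTAGE takes precedence when both size tunables are set, specifying only one of them keeps the intended load unambiguous.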