Kernel datasets upload handling
Guidelines for enabling and disabling the upload of Kernel datasets in non-production environments.
Introduction
After Aura is deployed in any environment, all its components generate KPI entity files that are uploaded to Kernel as datasets, in CSV or Avro format. These procedures increase costs in both the Aura and Kernel instances:
- Higher Azure Storage consumption
- Longer execution time for the Aura Databricks cluster
- More storage needed in Kernel, both in Azure for the CSVs and for the Avro datasets
Moreover, the data generated in these environments is almost never analyzed or used.
For this reason, the proposal is to disable the upload and minimize the storage of these files, in order to reduce costs, once the sanity test set has been executed and the process has been validated.
If the process later needs to be tested again, or data needs to be uploaded to validate algorithms or to use the Aura billing module, everything can be enabled again.
Prerequisites
- A kubeconfig of the Aura environment must be configured.
- The az client must be installed on your PC.
- Credentials to access the Azure subscription.
- Substitute <YOUR-ENV> with the corresponding pre-production environment: es-pre, es-cert, br-pre, de-pre, de-int, etc.
- The installation output file (output_install/<YOUR_ENV>_info.json) to get:
  - The token and the URL of the Databricks cluster.
    - Substitute <DATABRICKS_TOKEN> with the Databricks cluster token.
    - Substitute <DATABRICKS_URL> with the domain of the Databricks cluster URL.
  - The job_id of the Databricks job in charge of uploading the datasets to Kernel.
    - Substitute <DATABRICKS_JOB_ID> with the job_id.
  - The Azure Storage account name and the blob container where the KPI entity files are stored.
    - Substitute <AZURE_COMMON_STORAGE> with STORAGE_ACCOUNT_NAME and <KPI_BLOB_CONTAINER_NAME> with its value.
PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE=output_install/<YOUR_ENV>_info.json
STORAGE_ACCOUNT_NAME=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .common_azure_storage_account_name)
STORAGE_ACCESS_KEY=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .common_azure_storage_access_key)
KPI_BLOB_CONTAINER_NAME=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .kpi_blob_container_name)
DATABRICKS_JOB_ID=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .databricks.job_id)
DATABRICKS_URL=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .databricks.url)
DATABRICKS_TOKEN=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .databricks.token)
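Since jq -r prints the literal string null when a key is missing from the file, a quick check after the extraction above can catch a wrong path or an incomplete output file before any command is run. The following is a minimal sketch; the helper name require_vars is hypothetical and not part of the Aura tooling.

```shell
# Hypothetical helper (not part of Aura): fail if any of the named
# variables is unset, empty, or holds the literal string "null",
# which is what `jq -r` prints for a missing key.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "missing value for $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, to be run after the assignments above:
# require_vars STORAGE_ACCOUNT_NAME STORAGE_ACCESS_KEY KPI_BLOB_CONTAINER_NAME \
#   DATABRICKS_JOB_ID DATABRICKS_URL DATABRICKS_TOKEN || exit 1
```

The function only reports which names are missing; it never prints the values, so the token and access key do not end up in the terminal output.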
Disable data uploading
Disable aura-kpis-uploader CSV files upload
- Suspend aura-kpi-uploader job:
kubectl -n <YOUR-ENV> patch cronjobs kpi-uploader -p '{"spec" : {"suspend" : true }}'
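To confirm that the patch took effect, the suspend flag of the CronJob can be read back. This is a minimal sketch: the helper name cronjob_is_suspended is hypothetical, and the CronJob name kpi-uploader is taken from the command above.

```shell
# Hypothetical helper: succeed only if the CronJob reports spec.suspend=true.
cronjob_is_suspended() {
  namespace=$1
  cronjob=$2
  # jsonpath prints the boolean as the text "true" or "false".
  state=$(kubectl -n "$namespace" get cronjob "$cronjob" -o jsonpath='{.spec.suspend}') || return 1
  [ "$state" = "true" ]
}

# Example call:
# cronjob_is_suspended <YOUR-ENV> kpi-uploader && echo "kpi-uploader is suspended"
```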
Disable aura-databricks-job Avro files upload
- Pause aura-databricks-job job:
  - Substitute <DATABRICKS_JOB_ID> with the DATABRICKS_JOB_ID obtained from the installation output file.
curl -XPOST --header 'Authorization: Bearer <DATABRICKS_TOKEN>' https://<DATABRICKS_URL>/api/2.1/jobs/update --data '{
"job_id": <DATABRICKS_JOB_ID>,
"new_settings":{
"schedule":{
"pause_status":"PAUSED"
}
}
}'
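When the job ID comes from a shell variable, building the request body with a small function avoids quoting mistakes in the inline JSON. The sketch below is an illustration only: the helper name job_pause_payload is hypothetical, and the Bearer scheme in the commented calls is an assumption based on Databricks personal access token authentication.

```shell
# Hypothetical helper: print the jobs/update request body for a pause status.
job_pause_payload() {
  job_id=$1
  status=$2
  # Reject empty or non-numeric job IDs (for example the "null" that
  # `jq -r` prints for a missing key).
  case $job_id in
    ''|*[!0-9]*) echo "job_id must be numeric: '$job_id'" >&2; return 1 ;;
  esac
  # PAUSED and UNPAUSED are the pause_status values of a Databricks schedule.
  case $status in
    PAUSED|UNPAUSED) ;;
    *) echo "status must be PAUSED or UNPAUSED" >&2; return 1 ;;
  esac
  printf '{"job_id": %s, "new_settings": {"schedule": {"pause_status": "%s"}}}\n' "$job_id" "$status"
}

# Example calls, using the variables read from the installation output file:
# curl -XPOST --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
#   "https://${DATABRICKS_URL}/api/2.1/jobs/update" \
#   --data "$(job_pause_payload "${DATABRICKS_JOB_ID}" PAUSED)"
# Read the status back afterwards (assumes the jobs/get response layout):
# curl -s --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
#   "https://${DATABRICKS_URL}/api/2.1/jobs/get?job_id=${DATABRICKS_JOB_ID}" \
#   | jq -r .settings.schedule.pause_status
```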
Remove old KPI entity files generated by Aura and ATRIA components
This step is carried out by applying a removal policy to the Azure blob container where the components write the KPIs.
There are two ways of applying this change:
- Apply the policy from Azure portal
- Apply the policy using az client
Apply the policy from Azure portal
- Access Azure portal
- Look for <AZURE_COMMON_STORAGE> account and <KPI_BLOB_CONTAINER_NAME>
- Apply management-policy to <KPI_BLOB_CONTAINER_NAME> and to <KPI_BLOB_CONTAINER_NAME>/processed
Apply the policy using az client
To execute this step, first log in to the Azure subscription with az login.
PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE=output_install/<YOUR_ENV>_info.json
STORAGE_ACCOUNT_NAME=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .common_azure_storage_account_name)
STORAGE_ACCESS_KEY=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .common_azure_storage_access_key)
RESOURCE_GROUP=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .common_resource_group)
az storage account management-policy show -g ${RESOURCE_GROUP} --account-name ${STORAGE_ACCOUNT_NAME} -o json > policy.json
KPI_BLOB_CONTAINER_NAME=$(cat ${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}|jq -r .kpi_blob_container_name)
sed -i "s|${KPI_BLOB_CONTAINER_NAME}/proccesed||g" policy.json
az storage account management-policy create -g ${RESOURCE_GROUP} --account-name ${STORAGE_ACCOUNT_NAME} --policy policy.json
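Because an in-place sed silently does nothing when its pattern is absent, it can be worth keeping a backup of policy.json and failing loudly if the expected prefix is not found, for example when the folder name is spelled differently in the stored policy. The following is a minimal sketch of that idea; the helper name strip_policy_prefix is hypothetical.

```shell
# Hypothetical helper: remove a path prefix from a policy file, keeping a
# backup in <file>.bak, and fail if the prefix does not occur in the file.
strip_policy_prefix() {
  file=$1
  prefix=$2
  cp "$file" "$file.bak" || return 1
  # Literal (fixed-string) check that the prefix is present before editing.
  if ! grep -qF -- "$prefix" "$file.bak"; then
    echo "prefix '$prefix' not found in $file" >&2
    return 1
  fi
  # Same substitution as the sed command above, written from the backup
  # into the original file instead of editing in place.
  sed "s|$prefix||g" "$file.bak" > "$file"
}

# Example call, to be run between the "show" and "create" commands:
# strip_policy_prefix policy.json "${KPI_BLOB_CONTAINER_NAME}/proccesed" || exit 1
```

If the function fails, policy.json is left unchanged, so the policy should be inspected by hand before running the create command.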
Enable data uploading
Run the Aura installer, executing the deploy_core stage, and everything will be reconfigured again.