Kernel datasets upload handling

Guidelines for enabling and disabling the upload of Kernel datasets in non-production environments.

Introduction

After Aura is deployed in any environment, all its components generate KPI entity files that are uploaded to Kernel as datasets, in CSV or Avro format. These uploads increase costs in both the Aura and the Kernel instances:

  • Higher Azure Storage consumption
  • Longer execution time for Aura's Databricks cluster
  • More storage needed in Kernel, both in Azure for the CSVs and for the Avro datasets

Moreover, the data generated in these environments is almost never analyzed or used.

For this reason, the proposal is to disable the uploads and minimize the storage of these files, and thereby the costs, once the sanity test set has been executed and the process has been validated.

If there is later a need to test the process again, to upload data to validate algorithms, or to use the Aura billing module, everything can be enabled again.

Prerequisites

  • A kubeconfig of the Aura environment must be configured.
  • The az client installed on your PC.
  • Credentials to access the Azure subscription.
  • Substitute <YOUR-ENV> with the corresponding pre-production environment: es-pre, es-cert, br-pre, de-pre, de-int, etc.
  • The installation output file (output_install/<YOUR_ENV>_info.json) to get:
    • The token and the URL of the Databricks cluster.
      • Substitute <DATABRICKS_TOKEN> with the Databricks cluster token.
      • Substitute <DATABRICKS_URL> with the domain of the Databricks cluster URL.
    • The job_id of the Databricks job in charge of uploading the datasets to Kernel.
      • Substitute <DATABRICKS_JOB_ID> with the job_id.
    • The Azure Storage account name and the blob container where the KPI entity files are stored.
      • Substitute <AZURE_COMMON_STORAGE> with the value of STORAGE_ACCOUNT_NAME and <KPI_BLOB_CONTAINER_NAME> with the value of KPI_BLOB_CONTAINER_NAME.
PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE=output_install/<YOUR_ENV>_info.json
STORAGE_ACCOUNT_NAME=$(jq -r .common_azure_storage_account_name "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
STORAGE_ACCESS_KEY=$(jq -r .common_azure_storage_access_key "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
KPI_BLOB_CONTAINER_NAME=$(jq -r .kpi_blob_container_name "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
DATABRICKS_JOB_ID=$(jq -r .databricks.job_id "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
DATABRICKS_URL=$(jq -r .databricks.url "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
DATABRICKS_TOKEN=$(jq -r .databricks.token "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
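
Because jq -r prints the literal string null when a key is missing, it may be worth confirming that every value was actually found before continuing. The following is a minimal sketch; the function name require_vars is hypothetical and not part of the Aura tooling.

```shell
# Hypothetical helper: fail if any named variable is unset, empty, or "null".
# Written for POSIX sh; takes variable NAMES (not values) as arguments.
require_vars() {
  missing=0
  for name in "$@"; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "Missing value for $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: require_vars STORAGE_ACCOUNT_NAME STORAGE_ACCESS_KEY KPI_BLOB_CONTAINER_NAME DATABRICKS_JOB_ID DATABRICKS_URL DATABRICKS_TOKEN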

Disable data uploading

Disable aura-kpis-uploader CSV files upload

  • Suspend aura-kpi-uploader job:
kubectl -n <YOUR-ENV> patch cronjobs kpi-uploader -p '{"spec" : {"suspend" : true }}'
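
To check that the patch took effect, the suspend flag can be read back from the CronJob. This is a sketch only; the function name cronjob_is_suspended is hypothetical, and it assumes the same kubeconfig and namespace as the command above.

```shell
# Hypothetical helper: succeeds only if the CronJob reports spec.suspend=true.
# Arguments: namespace, CronJob name.
cronjob_is_suspended() {
  # Returns 2 if kubectl itself fails (wrong namespace, no access, etc.).
  state=$(kubectl -n "$1" get cronjob "$2" -o jsonpath='{.spec.suspend}') || return 2
  [ "$state" = "true" ]
}
```

For example: cronjob_is_suspended <YOUR-ENV> kpi-uploader && echo "kpi-uploader is suspended"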

Disable aura-databricks-job Avro files upload

  • Pause aura-databricks-job job:

    • Substitute <DATABRICKS_JOB_ID> with the DATABRICKS_JOB_ID obtained from the installation output file.
curl -XPOST --header 'Authorization: Bearer <DATABRICKS_TOKEN>' https://<DATABRICKS_URL>/api/2.1/jobs/update --data '{
   "job_id": <DATABRICKS_JOB_ID>,
   "new_settings":{
      "schedule":{
         "pause_status":"PAUSED"
      }
   }
}'
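
The same request can be built from the variables extracted in the prerequisites instead of substituting the placeholders by hand. The sketch below uses hypothetical function names and assumes that the Databricks token is a personal access token sent as a Bearer token, and that DATABRICKS_URL, DATABRICKS_TOKEN and DATABRICKS_JOB_ID are set in the current shell. The last function reads the status back through the jobs/get endpoint.

```shell
# Build the jobs/update payload shown above; status is PAUSED or UNPAUSED.
pause_payload() {
  printf '{"job_id": %s, "new_settings": {"schedule": {"pause_status": "%s"}}}' "$1" "$2"
}

# Hypothetical wrapper: send the payload for the job in DATABRICKS_JOB_ID.
set_job_pause_status() {
  curl -sS --fail -X POST \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data "$(pause_payload "${DATABRICKS_JOB_ID}" "$1")" \
    "https://${DATABRICKS_URL}/api/2.1/jobs/update"
}

# Hypothetical wrapper: print the pause status reported by jobs/get.
get_job_pause_status() {
  curl -sS --fail \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "https://${DATABRICKS_URL}/api/2.1/jobs/get?job_id=${DATABRICKS_JOB_ID}" |
    jq -r '.settings.schedule.pause_status'
}
```

For example: set_job_pause_status PAUSED && get_job_pause_status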

Remove old KPI entity files generated by Aura and ATRIA components

This step is carried out by applying a removal policy to the Azure blob container where the components write the KPIs.

There are two ways of applying this change:

  • Apply the policy from Azure portal
  • Apply the policy using az client

Apply the policy from Azure portal

  • Access the Azure portal
  • Look for the <AZURE_COMMON_STORAGE> account and the <KPI_BLOB_CONTAINER_NAME> container
  • Apply the management policy to <KPI_BLOB_CONTAINER_NAME> and to <KPI_BLOB_CONTAINER_NAME>/processed

Apply the policy using az client

To execute this step, first log in to the Azure subscription with az login.

PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE=output_install/<YOUR_ENV>_info.json
STORAGE_ACCOUNT_NAME=$(jq -r .common_azure_storage_account_name "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
STORAGE_ACCESS_KEY=$(jq -r .common_azure_storage_access_key "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")
RESOURCE_GROUP=$(jq -r .common_resource_group "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")

az storage account management-policy show -g ${RESOURCE_GROUP} --account-name ${STORAGE_ACCOUNT_NAME} -o json > policy.json

KPI_BLOB_CONTAINER_NAME=$(jq -r .kpi_blob_container_name "${PATH_TO_YOUR_OUTPUT_INSTALL_ENV_FILE}")

sed -i "s|${KPI_BLOB_CONTAINER_NAME}/processed||g" policy.json

az storage account management-policy create -g ${RESOURCE_GROUP} --account-name ${STORAGE_ACCOUNT_NAME} --policy policy.json
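
The sed step above changes nothing, and reports nothing, if the text it looks for is not present in policy.json. A more defensive variant could keep a backup and fail when there is no match, so that an unchanged policy is not re-applied by mistake. This is a sketch under assumptions: the function name remove_text_from_policy is hypothetical, and the text is assumed to contain no regular-expression metacharacters and no "|" character.

```shell
# Hypothetical helper: remove every occurrence of a text from a policy file.
# Arguments: policy file, text to remove. Keeps the original as <file>.bak.
remove_text_from_policy() {
  file=$1
  text=$2
  [ -f "$file" ] || { echo "File not found: $file" >&2; return 2; }
  # grep -c counts matching lines; it exits non-zero when the count is 0.
  hits=$(grep -cF -- "$text" "$file" || true)
  if [ "$hits" -eq 0 ]; then
    echo "No occurrence of '$text' in $file; nothing changed" >&2
    return 1
  fi
  cp "$file" "$file.bak"
  # Same substitution as the sed command above, written from the backup.
  sed "s|${text}||g" "$file.bak" > "$file"
  echo "Updated $hits line(s) in $file (backup: $file.bak)"
}
```

Call it with policy.json and the same text used in the sed expression above, then run the az storage account management-policy create command only if it succeeds.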

Enable data uploading

To enable uploading again, run the Aura installer with the deploy_core stage; this reconfigures everything.