Aura Databricks Jobs

1: Configuration
2: Environment variables
3: User guide
4: Troubleshooting

Aura Databricks Jobs

aura-databricks-jobs is a component based on Databricks. This section covers its technical description and main components.

Introduction

aura-databricks-jobs is a component based on Databricks for the optimization of data processing and the training of ML-based models.

Currently, its primary function is to import Avro-formatted files into Kernel datasets. To do so, a job must be configured in the Databricks environment, as described later. The job's entry method is in avro_to_dataset_job.py.

avro-to-dataset-job-cli is an executable script that imports Avro KPIs into the storage location indicated in the Kernel dataset destination configuration. It runs as a job on a Databricks cluster, every day by default (the frequency is configurable in the job schedule). It is developed in Python and uses the Kernel Spark SDK to read the Avro files and write them to Kernel datasets.

Detailed information regarding aura-databricks-jobs is found in the following documents:
- Architecture and main components
- How does aura-databricks-jobs work?
- aura-databricks-jobs configuration
- How to use aura-databricks-jobs?
- Environment variables
- Troubleshooting

Aura Databricks Jobs architecture

The following diagram represents the architecture of avro-to-dataset-job-cli, including its main components, which are described in the following sections.

Components diagram

Avro to Dataset Job components

ConfigManager

ConfigManager handles the configuration gathered from the input config_dict and provides the variables needed in the import process. It also validates the configuration; if any error is found, the process is not executed.
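The validation behavior can be pictured with a minimal sketch. It is not the actual ConfigManager code: the names `validate_config` and `ConfigError` are hypothetical, and the set of required keys is passed in by the caller because this document does not list it here.

```python
from typing import Iterable, Mapping


class ConfigError(ValueError):
    """Raised when the input configuration is incomplete (hypothetical name)."""


def validate_config(config_dict: Mapping[str, object], required: Iterable[str]) -> dict:
    """Return a copy of config_dict, or raise ConfigError listing the missing keys."""
    missing = sorted(key for key in required if key not in config_dict)
    if missing:
        # Stop before any import work starts, as described for ConfigManager.
        raise ConfigError("Missing configuration keys: " + ", ".join(missing))
    return dict(config_dict)
```

Raising before any coroutine is created keeps the "process is not executed" guarantee simple to reason about.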

AuraLogging

AuraLogging is a wrapper of the LoggerWrapper class imported from the aura-pytraces library. It is used to register logs, adding the required items such as version, app, stck, etc.

Log behavior is configured internally in the file logging.cfg, following the format established by the aura-pytraces library. This configuration may be overridden:

- level of the handler configuration, by the environment variable AURA_LOGGING_LEVEL. Default value: INFO.
- formatter of the handler configuration, by the environment variable AURA_LOGGING_FORMAT. Default value: simple.
- version, by the environment variable AURA_VERSION. Default value: not-reachable.
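As a rough illustration of these overrides, the sketch below resolves the three values from the environment using the documented defaults. The helper name `resolve_logging_overrides` is hypothetical; the real resolution happens inside logging.cfg and the aura-pytraces library.

```python
import os
from typing import Mapping, Optional


def resolve_logging_overrides(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the effective logging settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "level": env.get("AURA_LOGGING_LEVEL", "INFO"),
        "formatter": env.get("AURA_LOGGING_FORMAT", "simple"),
        "version": env.get("AURA_VERSION", "not-reachable"),
    }


if __name__ == "__main__":
    # Only the level is overridden; the other two keep their defaults.
    print(resolve_logging_overrides({"AURA_LOGGING_LEVEL": "DEBUG"}))
```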

Avro to Dataset Job

This is the process that a cron job executes in Databricks.

It contains the logic to configure coroutines, using the asyncio library, that import Avro files by type of dataset.

The result of each coroutine is a report. When all the coroutines have finished, the reports are processed to generate a single one with the information of the whole import process, including Spark processing information.
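The pattern of one coroutine per dataset type followed by a merged report can be sketched as follows. This is an illustration under assumptions, not the job's code: `import_dataset`, `merge_reports` and `run_all` are hypothetical names, and the import work is replaced by a placeholder. The field names are taken from the report model shown in the user guide.

```python
import asyncio


async def import_dataset(dataset_id: str, files: list) -> dict:
    """Hypothetical stand-in for one per-dataset import coroutine."""
    await asyncio.sleep(0)  # placeholder for the real read and write work
    return {
        "dataset_id": dataset_id,
        "num_files_kernel_uploaded": len(files),
        "num_errors": 0,
    }


def merge_reports(reports: list) -> dict:
    """Aggregate the per-dataset reports into a single report."""
    return {
        "num_files_kernel_uploaded": sum(r["num_files_kernel_uploaded"] for r in reports),
        "num_errors": sum(r["num_errors"] for r in reports),
        "summary": {r["dataset_id"]: r for r in reports},
    }


async def run_all(pending: dict) -> dict:
    """Run one coroutine per dataset and merge the results once all have finished."""
    reports = await asyncio.gather(
        *(import_dataset(dataset_id, files) for dataset_id, files in pending.items())
    )
    return merge_reports(list(reports))


if __name__ == "__main__":
    print(asyncio.run(run_all({"D_Aura_Channel": ["a.avro", "b.avro"], "D_Aura_Skill": []})))
```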

Avro KPI importer

It contains the logic to import Avro-formatted files by type of dataset. If there are no Avro-formatted files of this type of dataset, the coroutine finishes.

The result of each coroutine is the report of the import process for the specific type of dataset.

Azure Storage Manager

This module is used to download and upload files from and to Azure Storage.

Spark SDK Manager

This module is used to load data as a DataFrame from Azure Storage and write it to a dataset in the Kernel Datalake.

Aura Databricks Job operation

The execution flowchart of avro-to-dataset-job-cli is shown in the following image:

Execution flowchart

avro-to-dataset-job-cli

It is responsible for importing the Avro-formatted files in the Aura KPIs container (job variable: AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME) into the corresponding dataset in Kernel.

The information necessary to import the Avro-formatted files with the same Avro schema to their corresponding dataset is obtained from the configuration file stored in the Azure KPIs container, specifically the file path configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH.

In addition, a second file provides the average size of the files by type of dataset; its path is configured in the job variable AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. This information is used when writing to Kernel datasets with Spark, to indicate how the data should be partitioned to improve performance.

All the schemas to import are obtained from the adapter configuration file (AURA_KPI_AVRO_ADAPTER_CONFIG_PATH). Only the items whose targetType is set to avro are imported.

The job gathers the following information for each Avro schema:

name: dataset_id used to import into Kernel. For example, D_Aura_Channel.
schema: type of schema. For example, dimensional or entity.
versionSchema: Version of avroSchema. For example, 6.0.0. The major version will be used in the Spark stage to write in Kernel dataset.
avroSchema: name of the schema file stored in the container, within the folder configured in the AURA_KPI_AVRO_SCHEMAS_PATH variable. The Avro schema needed to read the files in Spark is obtained from the path $AURA_KPI_AVRO_SCHEMAS_PATH/$schema/$versionSchema/$avroSchema. Example: schemas/dimensional/6.0.0/aura-channel-asvc.json.
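The path composition and the major version extraction described above can be sketched in a few lines. The function names `avro_schema_path` and `major_version` are hypothetical and only illustrate the rule.

```python
import posixpath


def avro_schema_path(schemas_path: str, schema: str, version_schema: str, avro_schema: str) -> str:
    """Build $AURA_KPI_AVRO_SCHEMAS_PATH/$schema/$versionSchema/$avroSchema."""
    # posixpath keeps forward slashes on every platform, as blob paths require.
    return posixpath.join(schemas_path, schema, version_schema, avro_schema)


def major_version(version_schema: str) -> int:
    """Return the major component of a version such as 6.0.0."""
    return int(version_schema.split(".")[0])


if __name__ == "__main__":
    print(avro_schema_path("schemas", "dimensional", "6.0.0", "aura-channel-asvc.json"))
    print(major_version("6.0.0"))
```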

Sample of Aura Avro adapter file:

[
    {
        "name": "D_Aura_Channel",
        "schema": "dimensional",
        "avroSchema": "aura-channel-asvc.json",
        "versionSchema": "6.0.0",
        "source": {
            "data": "object",
            "id": "CHANNEL"
        },
        "targetType": "avro",
        "fields": {
            "AURA_CHANNEL_ID": {
                "sourceName": "id",
                "targetType": "string"
            },
            "AURA_CHANNEL_NAME": {
                "sourceName": "name",
                "targetType": "string"
            },
            "AURA_CHANNEL_SHORT_NAME": {
                "sourceName": "prefix",
                "targetType": "string"
            }
        }
    },
    {
        "name": "D_Aura_Recognizer",
        "schema": "dimensional",
        "avroSchema": "aura-recognizer-asvc.json",
        "versionSchema": "6.0.0",
        "source": {
            "data": "object",
            "id": "RECOGNIZER"
        },
        "targetType": "avro",
        "fields": {
            "AURA_RECOGNIZER_ID": {
                "sourceName": "id",
                "targetType": "string"
            },
            "AURA_RECOGNIZER_NAME": {
                "sourceName": "name",
                "targetType": "string"
            },
            "EXTRACTION_TM": {
                "sourceName": "EXTRACTION_TM",
                "targetType": "string",
                "preCalculated": "DATE_ISO_8691"
            }
        }
    }
]
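A minimal sketch of how such a file could be reduced to the items the job needs, assuming the structure of the sample above. The function name `select_avro_entries` is hypothetical; it keeps only the items whose targetType is avro and the four properties described earlier.

```python
import json

# Properties the job gathers for each Avro schema, as listed above.
WANTED_KEYS = ("name", "schema", "versionSchema", "avroSchema")


def select_avro_entries(adapter_config_text: str) -> list:
    """Parse the adapter file and keep the entries whose targetType is avro."""
    entries = json.loads(adapter_config_text)
    return [
        {key: entry[key] for key in WANTED_KEYS}
        for entry in entries
        if entry.get("targetType") == "avro"
    ]


if __name__ == "__main__":
    sample = json.dumps([
        {
            "name": "D_Aura_Channel",
            "schema": "dimensional",
            "avroSchema": "aura-channel-asvc.json",
            "versionSchema": "6.0.0",
            "targetType": "avro",
        }
    ])
    print(select_avro_entries(sample))
```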

The job will run the import process for each schema type, running in coroutines and using the asyncio library.

The following process is carried out for each type of schema:

1. Check whether the schema is configured not to be loaded. This is set in the job variable AURA_KPI_AVRO_SCHEMAS_NOT_TO_UPLOAD, a list formatted as schema_1:dataset_id_1,schema_1:dataset_id_2,schema_2:dataset_id_3. Example: dimensional:D_Aura_Channel,entity:E_Aura_GROOT. The number of files skipped for that type is recorded in a report.
2. Check whether there are files of that type to import in the corresponding folder. The path where the Avro-formatted files are stored is configured in AURA_KPI_AVRO_SOURCE_PATH. Within this path, the files are stored by $schema/$dataset/$version. Example: dimensional/D_Aura_Channel/6.0.0. If there are no files, the coroutine ends, generating a report without uploaded files.
3. If there are files, they are read with Spark, indicating the Azure Blob location where the files with the same Avro schema are stored, and then written to the corresponding dataset of the Kernel Datalake. This step is protected with asyncio.Lock() to prevent concurrent read and write operations on a DataFrame.
4. Once the files are imported, the local copy is moved to a folder inside the container (job variables: AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME/AURA_KPI_AVRO_PROCESSED_FOLDER_PATH) and kept there for a fixed time, for recovery purposes.
5. All the details of the process are recorded in a report that is stored at the path configured in the job variable AURA_KPI_AVRO_REPORTS_DESTINATION_PATH, with the file name aura-avro-kpis-report-{iso-date}.json.
6. Depending on the report mode configured in AURA_KPI_AVRO_REPORTS_MODE, the report is generated only when errors occur, always, or never.
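The first steps of this process can be sketched as follows. This is an illustration, not the job's implementation: `parse_schemas_not_to_upload` and `import_schema_type` are hypothetical names, the Spark read and write is replaced by a placeholder, and the lock is assumed to be a single asyncio.Lock shared by all coroutines.

```python
import asyncio


def parse_schemas_not_to_upload(raw: str) -> set:
    """Parse 'schema:dataset_id,schema:dataset_id' into a set of (schema, dataset_id) pairs."""
    pairs = set()
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        schema, _, dataset_id = item.partition(":")
        if not schema or not dataset_id:
            raise ValueError(f"Invalid entry: {item!r}")
        pairs.add((schema, dataset_id))
    return pairs


async def import_schema_type(schema, dataset_id, files, skipped, spark_lock) -> dict:
    """Skip, finish early, or run the lock-protected Spark step for one dataset type."""
    report = {"dataset_id": dataset_id, "schema": schema,
              "num_files_kernel_uploaded": 0, "num_files_skipped": 0}
    if (schema, dataset_id) in skipped:
        report["num_files_skipped"] = len(files)  # configured not to be loaded
        return report
    if not files:
        return report  # nothing to import: report without uploaded files
    async with spark_lock:
        # Placeholder for the Spark read and the write to the Kernel dataset.
        await asyncio.sleep(0)
        report["num_files_kernel_uploaded"] = len(files)
    return report


async def main() -> list:
    skipped = parse_schemas_not_to_upload("dimensional:D_Aura_Channel,entity:E_Aura_GROOT")
    spark_lock = asyncio.Lock()  # created inside the running event loop
    return await asyncio.gather(
        import_schema_type("dimensional", "D_Aura_Channel", ["a.avro"], skipped, spark_lock),
        import_schema_type("dimensional", "D_Aura_Skill", ["b.avro", "c.avro"], skipped, spark_lock),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Creating the lock inside the coroutine rather than at module level avoids binding it to a different event loop on older Python versions.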

Independently of when it runs, avro-to-dataset-job-cli always performs the same process: it gets all the Avro-formatted files in KPIs container (job variable: AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME) from the last upload executed by the aura-kpis-uploader component.

When running independently on the Databricks cluster, Prometheus alerts cannot be configured. Therefore, the process information will be obtained from the report generated along with the following generated files:

If the process has ended with errors:
- A file with the name set in the variable AURA_KPI_AVRO_PROCESS_ERROR_FILENAME will be generated containing the execution date.
- Additionally, if the report has been generated in Azure Storage, the link to it will be included, valid for the time configured in the variable AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION.
- If the report cannot be recorded, the error will appear in the file.
If the process terminates abruptly due to a timeout and the Databricks manager kills the process:
- A report will be generated, showing each dataset in its corresponding stage.
- The stages of each dataset can be completed: when the job is run again, it obtains the last report generated and, from it, identifies the stage from which to continue.
- If the process remained in the stage WRITING_DATASET_OK, the files from the last execution will be moved to the processed folder and deleted from the avro folder.
- If the process remained in the stage READING_BLOBS or WRITING_DATASET, the files will be loaded together with the rest of the files that have been generated without making distinctions.
- If the process remained in the stage MOVING_BLOBS_TO_PROCESSED, the files will be moved to the processed folder. If this second attempt fails again, the stage will be set to the value NOT_PROCESSED_PREVIOUS_ERRORS to indicate that it is not recoverable and that a manual review must be carried out in case there is a corrupt Avro file.
- If the process remained in the stage REMOVING_BLOBS, the files will be deleted from the avro folder. If this second deletion attempt fails again, the stage will be set to the value NOT_PROCESSED_PREVIOUS_ERRORS to indicate that it is not recoverable and that a manual review must be carried out in case there is a corrupt Avro file.
- If the process remained in the stage WRITING_DATASET_ERROR_NOT_RECOVERABLE, the files of the last execution and the possible ones that have been added since the last run will not be loaded, since there are unrecoverable errors that must be verified manually to be resolved. This involves writing datasets with malformed records or discarded records. So, for the dataset, the stage is recorded as NOT_PROCESSED_PREVIOUS_ERRORS to avoid loading this dataset.
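The recovery rules above amount to a lookup from the last recorded stage to the next action. The sketch below captures that mapping. The stage names come from this document, while the action labels, the function names, and the fallback for unknown stages are assumptions made for illustration.

```python
from typing import Optional

# Stage recorded in the last report -> action on the next run (labels are hypothetical).
RESUME_ACTIONS = {
    "WRITING_DATASET_OK": "move_to_processed_and_delete",
    "READING_BLOBS": "reload_with_new_files",
    "WRITING_DATASET": "reload_with_new_files",
    "MOVING_BLOBS_TO_PROCESSED": "retry_move_to_processed",
    "REMOVING_BLOBS": "retry_delete",
    "WRITING_DATASET_ERROR_NOT_RECOVERABLE": "skip_dataset",
}

# Actions that get a single second attempt before being declared unrecoverable.
RETRY_ONCE = {"retry_move_to_processed", "retry_delete"}


def next_action(last_stage: str) -> str:
    """Return the action for a stage; unknown stages start from scratch (assumption)."""
    return RESUME_ACTIONS.get(last_stage, "start_from_scratch")


def stage_to_record(last_stage: str, retry_failed: bool = False) -> Optional[str]:
    """Return NOT_PROCESSED_PREVIOUS_ERRORS when the rules above require it, else None."""
    action = next_action(last_stage)
    if action == "skip_dataset":
        return "NOT_PROCESSED_PREVIOUS_ERRORS"
    if action in RETRY_ONCE and retry_failed:
        return "NOT_PROCESSED_PREVIOUS_ERRORS"
    return None


if __name__ == "__main__":
    print(next_action("WRITING_DATASET"))
    print(stage_to_record("REMOVING_BLOBS", retry_failed=True))
```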

1 - Configuration

Aura Databricks Jobs configuration

This document describes the internal configuration of the aura-databricks-jobs component that will be enabled in every Aura release from the current one onwards.

⚠️ Users can modify this configuration to a certain extent, as described in the Aura Databricks Jobs user guide.

Prerequisites

Python version 3.9 or higher

# determine python version
python --version

aura-pytraces: Aura repository for Python traces functionalities.

Execution of the tool in Databricks cluster

1. Configuration of the Databricks cluster

Firstly, it is necessary to follow the steps defined in Kernel documentation for the correct installation of the cluster: Create a Databricks cluster.

In addition, to configure our environment and Python package in the Databricks cluster, it is necessary to configure a docker image that we will have previously registered: docker_image: auraregistry.azurecr.io/aura/tools/aura-databricks-jobs:$VERSION

Configuration example obtained by applying the steps in the Kernel documentation and configuring docker image URL:

{
    "spark_version": "12.2.x-scala2.12",
    "spark_conf": {
        "spark.driver.memory": "4g",
        "spark.jars.packages": "com.telefonica.baikal:spark-sdk_2.12:2.2.1,org.apache.spark:spark-avro_2.12:3.3.2",
        "spark.jars.repositories": "https://4p-public-artifacts.s3.amazonaws.com/baikal/releases/,https://repo.osgeo.org/repository/release/",
        "spark.debug.maxToStringFields": "100"
    },
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3",
        "JNAME": "zulu11-ca-amd64"
    },
    "init_scripts": [
        {
            "workspace": { "destination": "/InitScripts//init_script.sh"}
        }
    ],
    "docker_image": {
        "url": "auraregistry.azurecr.io/aura/tools/aura-databricks-jobs:{$VERSION}",
        "basic_auth": {
            "username": "$USERNAME",
            "password": "$PASSWORD"
        }
    }
}

Example of configuring the init script as indicated in the Kernel documentation:

#!/bin/bash
wget -O /databricks/jars/config-1.3.4.jar https://repo1.maven.org/maven2/com/typesafe/config/1.3.4/config-1.3.4.jar
rm -f /databricks/jars/*--com.typesafe__config__1.2.1.jar

2. Configuration of the job’s variables

The job will be configured with some input parameters that are included in the variable: config_dict.

You can review all variables in Job’s variables.

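import asyncio

# Assumption: import_avro_files_job is importable from avro_to_dataset_job.py;
# adjust the import to your package layout.
from avro_to_dataset_job import import_avro_files_job

config_dict = {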
    'AURA_ENVIRONMENT_NAME': 'DEV',
    'AURA_DATABRICKS_EXECUTION_PERIOD': 24,
    'AURA_FP_SPARK_BASE_URL': '',
    'AURA_FP_SPARK_CLIENT_ID': 'aura-bot-xxx',
    'AURA_FP_SPARK_CLIENT_SECRET': '',
    'AURA_FP_SPARK_PURPOSES': '',
    'AURA_FP_SPARK_SCOPES': '',
    'AURA_FP_SPARK_JARS_PACKAGES': 'com.telefonica.baikal:spark-sdk_2.12:2.2.1,org.apache.spark:spark-avro_2.12:3.3.2',
    'AURA_FP_SPARK_JARS_REPOSITORIES':
        'https://4p-public-artifacts.s3.amazonaws.com/baikal/releases/,https://repo.osgeo.org/repository/release/',
    'AURA_FP_SPARK_SUFFIX_DATASET_TEST': '',
    'AURA_KPI_AVRO_SOURCE_PATH': 'avro',
    'AURA_KPI_AVRO_REPORTS_DESTINATION_PATH': 'avro/reports',
    'AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT': '',
    'AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY': '',
    'AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME': 'aura-kpis',
    'AURA_KPI_AVRO_SCHEMAS_NOT_TO_UPLOAD': 'entity:E_Aura_GROOT',
    'AURA_KPI_AVRO_PROCESSED_FOLDER_PATH': 'processed'
}

if __name__ == "__main__":
    asyncio.run(import_avro_files_job(config_dict))

3. Configuration of job in Databricks cluster

To execute the job in Databricks, you should create a new job, following the guidelines Create and run Databricks Jobs and copying the template avro_to_dataset_job_cli.py without these unnecessary params:

AURA_FP_SPARK_JARS_PACKAGES
AURA_FP_SPARK_JARS_REPOSITORIES

Execution of the tool in local environment

To install Apache Spark on your local machine and run Python scripts, follow the steps below.

1. Install Java 11

Apache Spark requires Java to run. We recommend using Java 11, as indicated in the Kernel documentation Spark SDK.

You can install Java 11 using a package manager or downloading the installer: Download.

On Ubuntu/Debian:

sudo apt update
sudo apt install openjdk-11-jdk

On macOS (using Homebrew):

brew install openjdk@11

On Windows: Download the JRE installer from the Oracle website, run the installer and follow the on-screen instructions.

Finally, verify the installation with:

java -version

2. Install requirements via pip

pip install -r requirements.txt

These requirements include the PySpark library, which bundles a lightweight version of Spark, so you can run Spark jobs locally without installing Spark separately. PySpark can also be installed on its own:

pip install pyspark

3. Configure the Spark session

By default, the Databricks cluster is configured with the required jar files and packages. In local mode, however, you must provide this configuration when you create the Spark session, using the job variables AURA_FP_SPARK_JARS_PACKAGES and AURA_FP_SPARK_JARS_REPOSITORIES.

Example:

AURA_FP_SPARK_JARS_PACKAGES = 'com.telefonica.baikal:spark-sdk_2.12:2.2.1,org.apache.spark:spark-avro_2.12:3.3.2'
AURA_FP_SPARK_JARS_REPOSITORIES = 'https://4p-public-artifacts.s3.amazonaws.com/baikal/releases/,https://repo.osgeo.org/repository/release/'
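One possible way to apply these two variables to a local session is sketched below. It assumes they map to the Spark properties spark.jars.packages and spark.jars.repositories, as in the cluster configuration example; the function names are hypothetical, and the Kernel Spark SDK may create the session differently.

```python
def local_spark_conf(config_dict: dict) -> dict:
    """Map the two job variables to the corresponding Spark properties."""
    conf = {}
    packages = config_dict.get("AURA_FP_SPARK_JARS_PACKAGES")
    repositories = config_dict.get("AURA_FP_SPARK_JARS_REPOSITORIES")
    if packages:
        conf["spark.jars.packages"] = packages
    if repositories:
        conf["spark.jars.repositories"] = repositories
    return conf


def build_local_session(config_dict: dict, app_name: str = "avro-to-dataset-local"):
    """Create a local SparkSession with the extra jars (requires pyspark and Java)."""
    from pyspark.sql import SparkSession  # imported lazily so the mapping works without pyspark

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in local_spark_conf(config_dict).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Leaving both variables unset produces an empty mapping, which matches the Databricks case where the cluster already provides the jars.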

4. Execute job

You can execute the job with the configured variables:

python avro_to_dataset_job_cli.py

2 - Environment variables

Environment variables

List of environment variables handled by aura-databricks-jobs and avro-to-dataset-job-cli

Aura Databricks Jobs variables

List of environment variables handled by aura-databricks-jobs.

Properties marked in bold are mandatory
Properties marked in italics are optional

Property	Type	Description	Modifiable by OB?
AURA_LOGGING_FORMAT	string	Format to be used in monitoring logs: `console`, `json`, `string` or `simple`. By default: `simple`.	NO.
AURA_LOGGING_LEVEL	string	Level to be used in monitoring logs. Possible values: `DEBUG`, `INFO`, `WARN`, `ERROR`, `FATAL`, `OFF`, `NOTSET`, `CRITICAL`. By default: `INFO`.	YES. For development, set it to `DEBUG`. In pre/production, it should be `INFO` or `ERROR`. For the analysis of an issue in pre/production, it may be changed to `DEBUG`.
AURA_VERSION	string	Number of the Aura’s release being executed.	NO

Avro to Dataset job cli variables

List of job’s variables handled by avro-to-dataset-job-cli

Properties marked in bold are mandatory
Properties marked in italics are optional

Property	Type	Description	Modifiable by OB?
AURA_ENVIRONMENT_NAME	string	Name of the environment where aura-databricks-jobs is deployed. For example: `ap-next`, `es-dev`, `de-pre`	NO
AURA_FP_SPARK_BASE_URL	string	Base URL for Kernel Spark SDK.	NO
AURA_FP_SPARK_CLIENT_ID	string	Client ID for Kernel Spark SDK.	NO
AURA_FP_SPARK_CLIENT_SECRET	string	Client secret for Kernel Spark SDK.	NO
AURA_FP_SPARK_JARS_PACKAGES	string	The jar packages configured only for local run, because in Databricks cluster this configuration is set previously.	NO
AURA_FP_SPARK_JARS_REPOSITORIES	string	The repositories configured only for local run, because in Databricks cluster this configuration is set previously.	NO
AURA_FP_SPARK_SCOPES	string	Scopes for Kernel Spark SDK.	NO
AURA_FP_SPARK_PURPOSES	string	Purposes for Kernel Spark SDK.	NO
AURA_FP_SPARK_SUFFIX_DATASET_TEST	string	Suffix used in tests with Kernel Spark SDK. By default: ``.	NO. It is used for testing in the development environment.
AURA_KPI_AVRO_ADAPTER_CONFIG_PATH	string	File path for getting Aura Avro adapter configuration.	NO
AURA_KPI_AVRO_PROCESS_ERROR_FILENAME	string	File name that records an error in the last execution. By default: `databricks.ERROR`.	NO
AURA_KPI_AVRO_PROCESSED_FOLDER_PATH	string	Destination path for the processed KPIs Avro files.	NO
AURA_KPI_AVRO_SOURCE_PATH	string	Source path for the KPIs Avro data.	NO
AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH	string	The file path for getting size report. By default: `avro/sizeReport.json`.	NO
AURA_KPI_AVRO_REPORTS_MODE	string	Behavior of avro-to-dataset-job-cli regarding the generation of reports. Possible values: `all`: a report is generated for each processed file; `none`: it does not generate any report; `error`: it generates a report if an error has occurred. By default: `all`.	NO
AURA_KPI_AVRO_REPORTS_DESTINATION_PATH	string	Destination path for the KPIs Avro reports.	YES
AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION	integer	Time to expiration in minutes for the report SAS URL generated when an error occurs. Default: `43200` (30 days).	NO
AURA_KPI_AVRO_SCHEMAS_NOT_TO_UPLOAD	string	Schemas not to be uploaded in the KPIs Avro data, included in a list formatted as follows: `schema_1:dataset_id_1,schema_1:dataset_id_2,schema_2:dataset_id_3`. Example: `dimensional:D_Aura_Channel,entity:E_Aura_GROOT`.	NO
AURA_KPI_AVRO_SCHEMAS_PATH	string	Schema path where Avro schemas are stored. By default, `schemas`.	NO
AURA_MICROSOFT_AZURE_RETRY_TOTAL	integer	Total number of allowed retries. Default value: `3`.	NO
AURA_MICROSOFT_AZURE_RETRY_BACKOFF_FACTOR	float	Backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). In ’exponential’ mode, retry policy will sleep for: `{backoff factor} * (2 ** ({number of total retries} - 1))` seconds. If the backoff_factor is 0.1, then the retry will sleep for [0.0s, 0.2s, 0.4s, …] between retries. The default value is `0.3`.	NO
AURA_MICROSOFT_AZURE_RETRY_BACKOFF_MAX	integer	Maximum backoff time in seconds. Default value: `5`.	NO
AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT	string	Microsoft Storage account of the environment.	NO
AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY	string	Microsoft Storage password of the deployment.	NO
AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME	string	Name of the container where the KPIs are stored.	NO
SPARK_CONTEXT_LOG_LEVEL	string	Log level for the Spark context.	NO
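The exponential backoff described for the AURA_MICROSOFT_AZURE_RETRY_* variables in the table above can be worked through with a small sketch. It follows the description in the table (no delay before the second try, then the backoff factor doubled per retry, capped at the maximum) and is not the Azure SDK code; the function name `retry_sleep_seconds` is hypothetical.

```python
def retry_sleep_seconds(attempt: int, backoff_factor: float = 0.3, backoff_max: float = 5) -> float:
    """Seconds to sleep before retry number `attempt` (1-based), per the table's description."""
    if attempt <= 1:
        return 0.0  # the second try is made immediately
    # backoff_factor * 2 ** (retries - 1), never above backoff_max
    return min(backoff_max, backoff_factor * (2 ** (attempt - 1)))


if __name__ == "__main__":
    # With a factor of 0.1 this reproduces the 0.0s, 0.2s, 0.4s series from the table.
    print([retry_sleep_seconds(n, backoff_factor=0.1) for n in (1, 2, 3)])
```

With the default values (factor `0.3`, maximum `5`), the delay stops growing once the computed value exceeds 5 seconds.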

3 - User guide

Aura Databricks Jobs user guide

Guidelines including the ordered steps to use Aura Databricks Jobs.

Prerequisites

Python version 3.9 or higher.

# determine python version
python --version

Installed aura-pytraces: Aura repository for Python traces functionalities.
Prerequisites in Aura installer:
- Databricks must be enabled in Aura installer
- Databricks cluster node type must be configured
- Databricks job execution must be configured
Configure Kernel datasets. See more details in Kernel datasets configuration.

Flow

The flow that aura-databricks-jobs follows to validate if it is going to be executed is as follows:

flow

Generate Reports

By default, aura-databricks-jobs generates a report in the import process. This report is available in the Azure Storage defined in AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT, and path AURA_KPI_AVRO_REPORTS_DESTINATION_PATH with the file name: aura-avro-kpis-report-{iso-date}.json.

To change this behavior, set the environment variable AURA_KPI_AVRO_REPORTS_MODE. If it is set to all, a report is generated for each of the processed files; if it is set to none, no report is generated; if it is set to error, the report is generated only when there are errors in the process. The default value is all.
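The three report modes reduce to a small decision function. The sketch below is illustrative only; the name `should_write_report` is hypothetical, and rejecting unknown values is an assumption about how invalid modes would be handled.

```python
VALID_REPORT_MODES = ("all", "none", "error")


def should_write_report(mode: str = "all", num_errors: int = 0) -> bool:
    """Decide whether a report is written for the given mode and error count."""
    if mode not in VALID_REPORT_MODES:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"Unknown report mode: {mode!r}")
    if mode == "all":
        return True
    if mode == "none":
        return False
    return num_errors > 0  # mode == "error"


if __name__ == "__main__":
    print(should_write_report("error", num_errors=2))
```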

3.1 Report Model

A report follows this template, in JSON format.

{
    "num_files_kernel_uploaded": 30,
    "num_files_moved_to_processed": 30,
    "num_files_deleted": 30,
    "num_files_skipped": 0,
    "num_errors": 0,
    "summary": {
        "D_Aura_Channel": {
            "dataset_id": "D_Aura_Channel",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 4,
            "num_files_moved_to_processed": 4,
            "num_files_deleted": 4,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_Channel",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "8fb3e408-2ce0-42f4-8bbf-5b0974b44108",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 116,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 14640,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 116,
                    "local_spark_records_written_total": 116,
                    "total_not_informed_records_written": 0,
                    "records_written": 116,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 4796
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T100000Z.avro"
            ],
            "duration_seconds": 141.32
        },
        "D_Aura_Recognizer": {
            "dataset_id": "D_Aura_Recognizer",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 4,
            "num_files_moved_to_processed": 4,
            "num_files_deleted": 4,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_Recognizer",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "415fb219-6ef4-4b21-9e14-c10347f1d2fa",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 376,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 49744,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 376,
                    "local_spark_records_written_total": 376,
                    "total_not_informed_records_written": 0,
                    "records_written": 376,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 9055
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T100000Z.avro"
            ],
            "duration_seconds": 94.75
        },
        "D_Aura_Component": {
            "dataset_id": "D_Aura_Component",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 4,
            "num_files_moved_to_processed": 4,
            "num_files_deleted": 4,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_Component",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "340c90a8-00d5-4868-a746-5ec0f8342a90",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 28,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 2108,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 28,
                    "local_spark_records_written_total": 28,
                    "total_not_informed_records_written": 0,
                    "records_written": 28,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 1255
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T100000Z.avro"
            ],
            "duration_seconds": 105.14
        },
        "D_Aura_Skill": {
            "dataset_id": "D_Aura_Skill",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 4,
            "num_files_moved_to_processed": 4,
            "num_files_deleted": 4,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_Skill",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "60da9e25-0767-4097-ab9a-2bf388d8daa7",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 16,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 1280,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 16,
                    "local_spark_records_written_total": 16,
                    "total_not_informed_records_written": 0,
                    "records_written": 16,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 1246
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T100000Z.avro"
            ],
            "duration_seconds": 95.97
        },
        "D_Aura_Preset": {
            "dataset_id": "D_Aura_Preset",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 4,
            "num_files_moved_to_processed": 4,
            "num_files_deleted": 4,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_Preset",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "8b143625-9bf7-484a-8a05-671a6cff72fe",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 64,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 5020,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 64,
                    "local_spark_records_written_total": 64,
                    "total_not_informed_records_written": 0,
                    "records_written": 64,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 2001
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T100000Z.avro"
            ],
            "duration_seconds": 72.97
        },
        "D_Aura_App": {
            "dataset_id": "D_Aura_App",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 4,
            "num_files_moved_to_processed": 4,
            "num_files_deleted": 4,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_App",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "f99b5dac-47ce-4525-aa86-6d3bbb3b67f5",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 28,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 5192,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 28,
                    "local_spark_records_written_total": 28,
                    "total_not_informed_records_written": 0,
                    "records_written": 28,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 2742
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T100000Z.avro"
            ],
            "duration_seconds": 93.86
        },
        "Aura_Audit": {
            "dataset_id": "Aura_Audit",
            "schema": "entity",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 2,
            "num_files_moved_to_processed": 2,
            "num_files_deleted": 2,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "Aura_Audit",
                "version": 6,
                "correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
                "resource_id": "3013424c-4ef1-4bdb-b4fc-a02540f9b1f8",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 63,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 12452,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [
                        [
                            [
                                "DAY_DT",
                                "2024-10-04"
                            ]
                        ],
                        [
                            [
                                "DAY_DT",
                                "2024-10-07"
                            ]
                        ]
                    ],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [
                        [
                            "DAY_DT=2024-10-04",
                            53
                        ],
                        [
                            "DAY_DT=2024-10-07",
                            10
                        ]
                    ],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 63,
                    "local_spark_records_written_total": 63,
                    "total_not_informed_records_written": 0,
                    "records_written": 63,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 6854
                }
            },
            "files_uploaded": [
                "avro_test/entity/Aura_Audit/6.0.0/AURA_062a0ab0-d0bd-5347-98bf-d88977af622f_CR_AUDIT_20241007T090000Z.avro",
                "avro_test/entity/Aura_Audit/6.0.0/AURA_1d43887a-f368-51ce-abee-60f5b25387ad_CR_AUDIT_20241004T110000Z.avro"
            ],
            "duration_seconds": 100.70
        },
        "Aura_Gateway_Message": {
            "dataset_id": "Aura_Gateway_Message",
            "schema": "entity",
            "version": "6.0.0",
            "step": "NOT_PROCESSED",
            "num_files_kernel_uploaded": 0,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {},
            "files_uploaded": [],
            "duration_seconds": 0.07
        }
    },
    "start_time": "2024-10-23T15:18:30.098166Z",
    "end_time": "2024-10-23T15:36:57.161532Z",
    "duration_seconds": 1107.06,
    "step": "FINISH",
    "status": "successfully"
}

The parameters are defined as follows:

dataset_id: Kernel dataset id to load.
schema: Type of schema to load.
version: Dataset version to load.
step: Stage of the loading process. Possible values:
- INIT: In this stage, the necessary Azure and Spark connections are created and a report is created.
- CHECK_PREVIOUS_ERRORS: In this stage, the job checks whether the last execution left errors. Errors of datasets that cannot be recovered are marked as such, and datasets that can be recovered are executed again.
- WRITING_KERNEL_STAGE: Stage for reading files and writing data to the Kernel datasets.
- MOVING_PROCESSED_BLOBS_STAGE: Stage for moving files to the processed folder.
- FINISH: This stage indicates that the process has been completed.
num_files_kernel_uploaded: Number of files that have been verified as successfully uploaded in Kernel Datalake.
num_files_moved_to_processed: Number of files that have been moved to the processed folder.
num_files_deleted: Number of files that have been deleted from the main folder.
num_files_skipped: Number of files that have been skipped, that is, left unprocessed because they match the pattern defined in the job’s variable AURA_KPI_AVRO_SCHEMAS_NOT_TO_UPLOAD.
num_errors: Total number of errors reported. A single error may refer to a failure when loading the source files of a whole Avro-formatted folder, so this value does not correspond to the number of erroneous files.
start_time: Start time of the import process, in ISO format.
end_time: End time of the import process, in ISO format.
duration_seconds: Duration of the import process, in seconds.
status: Status of the process. The value is either failed or successfully.

summary: Information about each coroutine executed. Each coroutine is responsible for loading one folder, whose files share the same Avro schema and the same version. If a general error occurs before the coroutines start, it also appears in the summary, in the process_error field. For each dataset id, the summary contains:

num_files_kernel_uploaded: Number of files that have been verified as successfully uploaded in Kernel Datalake for this dataset id.
num_files_moved_to_processed: Number of files that have been moved to the processed folder for this dataset id.
num_files_deleted: Number of files that have been deleted from the main folder for this dataset id.
num_errors: Number of errors reported for this dataset id.
errors: Errors produced for this dataset id. Each item contains the elements error, corr and step.
- error: Description of the error or exception obtained.
- corr: Correlator used in the process.
- step: Phase of the process for each Kernel dataset. Possible values:
  - MOVING_BLOBS_TO_PROCESSED_WITH_PREVIOUS_ERRORS: In this stage, the processed files that were pending to move due to an error are now moved.
  - REMOVING_BLOBS_WITH_PREVIOUS_ERRORS: In this stage, the processed files that were pending to be deleted due to an error are now deleted.
  - NOT_PROCESSED_PREVIOUS_ERRORS: Errors from a previous execution that are not recoverable. For example, if the write produced malformed or discarded records, they must be reviewed manually and should not be written to the dataset. Likewise, if moving the files fails again when it is retried, it is necessary to check specifically what is happening with those files.
  - READING_BLOBS: In this stage, the files are read to create data to be written to the dataset.
  - WRITING_DATASET: This stage proceeds to write data to the dataset.
  - WRITING_DATASET_OK: At this stage, the data has already been correctly written to the dataset.
  - WRITING_DATASET_ERROR_NOT_RECOVERABLE: In the writing process, malformed or discarded records have been detected that must be checked manually.
  - MOVING_BLOBS_TO_PROCESSED: At this stage, the files are moved to the processed folder.
  - REMOVING_BLOBS: At this stage, the files are deleted from the processed folder.
  - NOT_PROCESSED: The dataset has no data and will not be processed.
  - FINISH: The dataset uploading has been completed correctly.
spark_executions: Spark report for this dataset id. It includes information such as the number of records read, written and discarded.
files_uploaded: List of files that have been uploaded in Kernel for this dataset id.
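
As a rough illustration of how the fields above can be read together, the following sketch lists the datasets of a report that did not finish cleanly. The function name and the sample report are hypothetical, and the handling of a general process_error entry is an assumption, since its exact shape is not described here.

```python
def datasets_needing_attention(report):
    """Return {dataset_id: reason} for summary entries that did not finish cleanly."""
    flagged = {}
    for dataset_id, entry in report.get("summary", {}).items():
        # Assumption: per-dataset entries are objects carrying "dataset_id";
        # anything else (such as a general process_error entry) is skipped.
        if not isinstance(entry, dict) or "dataset_id" not in entry:
            continue
        step = entry.get("step")
        num_errors = entry.get("num_errors", 0)
        if num_errors > 0:
            flagged[dataset_id] = f"{num_errors} error(s) at step {step}"
        elif step not in ("FINISH", "NOT_PROCESSED"):
            flagged[dataset_id] = f"stopped at step {step}"
    return flagged


# Illustrative report fragment, not taken from a real execution.
sample_report = {
    "status": "failed",
    "summary": {
        "D_Aura_Skill": {"dataset_id": "D_Aura_Skill", "step": "FINISH", "num_errors": 0},
        "Aura_Gateway_Message": {
            "dataset_id": "Aura_Gateway_Message", "step": "NOT_PROCESSED", "num_errors": 0,
        },
        "Aura_Suggestion": {
            "dataset_id": "Aura_Suggestion", "step": "WRITING_DATASET", "num_errors": 1,
        },
    },
}

print(datasets_needing_attention(sample_report))
```

Datasets in FINISH or NOT_PROCESSED with no errors are treated as healthy, following the step descriptions above.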

Example of one coroutine executed for the D_Aura_Channel dataset:

{
    "D_Aura_Channel": {
        "dataset_id": "D_Aura_Channel",
        "schema": "dimensional",
        "version": "6.0.0",
        "step": "FINISH",
        "num_files_kernel_uploaded": 156,
        "num_files_moved_to_processed": 156,
        "num_files_deleted": 156,
        "num_files_skipped": 0,
        "num_errors": 0,
        "errors": [],
        "spark_executions": {
            "dataset_id": "D_Aura_Channel",
            "version": 6,
            "correlator": "d558b080-f261-4e6b-9adc-a7503f3e51a9",
            "resource_id": "36417c66-a276-4107-bcb8-3792bccb076c",
            "request_type": "writes",
            "status": "finished",
            "metrics": {
                "total_records_written": 4967,
                "local_spark_write_discards": 0,
                "local_spark_write_discards_total": 0,
                "malformed_records_written": 0,
                "total_records_filtered_by_gdpr": 0,
                "local_spark_bytes_written_total": 4049495,
                "total_malformed_records_by_partition_written": [],
                "partitions_written": [],
                "total_malformed_records_written": 0,
                "total_records_by_partition_written": [],
                "total_not_informed_records_by_partition_written": [],
                "records_read": 4967,
                "local_spark_records_written_total": 4967,
                "total_not_informed_records_written": 0,
                "records_written": 4967,
                "total_malformed_records_discarded": 0,
                "records_discarded": 0,
                "data_access_audit": {
                    "partitions_num": 1,
                    "wasb_type": "avro_fp"
                },
                "total_executor_cpu_millis": 1,
                "total_executor_memory": 593913446,
                "total_bytes_written": 394038
            }
        },
        "duration_seconds": 112.05
    }
}
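
When reviewing a coroutine entry like the one above by hand, a quick consistency check of the Spark metrics can help. The sketch below assumes that a clean write means every record read was written and that none of the malformed or discard counters is above zero. The exact rule the job applies to flag WRITING_DATASET_ERROR_NOT_RECOVERABLE is not documented here, so treat the function (a hypothetical helper) as an approximation.

```python
def write_is_clean(metrics):
    """True when every record read was written and nothing was malformed or discarded."""
    # Counter names are taken from the "metrics" object of the report.
    problem_counters = (
        "malformed_records_written",
        "total_malformed_records_written",
        "total_malformed_records_discarded",
        "records_discarded",
        "local_spark_write_discards",
    )
    if any(metrics.get(name, 0) for name in problem_counters):
        return False
    return metrics.get("records_read") == metrics.get("records_written")


# Values copied from the D_Aura_Channel example.
channel_metrics = {
    "records_read": 4967,
    "records_written": 4967,
    "malformed_records_written": 0,
    "total_malformed_records_discarded": 0,
    "records_discarded": 0,
}
print(write_is_clean(channel_metrics))
```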

4 - Troubleshooting

Aura Databricks Jobs troubleshooting

Most common errors in Aura Databricks Jobs, along with the generated logs and recommendations for error fixing

Required environment variables

This error occurs when the mandatory environment variables are not configured.

If any of the mandatory environment variables is missing, an error message appears in the aura-databricks-jobs logs similar to the one shown below:

marshmallow.exceptions.ValidationError: {'AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT': ['AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT is required.'], 'AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY': ['AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY is required.'], 'AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME': ['AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME is required.']}
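
To catch this before launching the job, the variables can be checked up front. The following is a minimal sketch using only the standard library. The component itself reports the problem through marshmallow validation, as the log shows, and the list below contains only the three variables that appear in that log, not necessarily every mandatory variable. The function name is hypothetical.

```python
import os

# Only the variables named in the example error message above.
REQUIRED_VARIABLES = (
    "AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT",
    "AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY",
    "AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME",
)


def missing_variables(environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    for name in missing_variables():
        print(f"{name} is required.")
```

Empty values are reported together with unset ones, since an empty account or access key also leads to the failures described in the following sections.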

Error in the Azure Blob container that stores Avro-formatted files

The value of AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME in the job’s variable is not correct, as the container does not exist. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel. In the aura-databricks-jobs logs, an error message similar to this will appear:

azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:2dfad4cd-401e-0083-31cf-190020000000
Time:2024-10-08T22:11:23.1996799Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:2dfad4cd-401e-0083-31cf-190020000000
Time:2024-10-08T22:11:23.1996799Z</Message></Error>
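
When escalating an error like this one, the RequestId, Time and ErrorCode values are the useful parts of the message. The sketch below pulls them out of the log text with a regular expression. The helper name is hypothetical and the parsing relies only on the layout visible in the message above.

```python
import re


def parse_azure_error(text):
    """Extract RequestId, Time and ErrorCode from an Azure storage error message."""
    fields = {}
    for key in ("RequestId", "Time", "ErrorCode"):
        # Each value sits on its own line as "Key:value"; the first match wins.
        match = re.search(rf"^{key}:(\S+)", text, flags=re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    return fields


SAMPLE = """azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:2dfad4cd-401e-0083-31cf-190020000000
Time:2024-10-08T22:11:23.1996799Z
ErrorCode:ContainerNotFound"""

print(parse_azure_error(SAMPLE))
```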

Errors in the source Microsoft Storage account

The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT in the job’s variable is not correct. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel.
In the aura-databricks-jobs logs, an error message similar to this will appear:
```
azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x10276ebe0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
```
The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT in the job’s variable is empty. In the aura-databricks-jobs logs, an error message similar to this will appear:
```
azure.core.exceptions.ServiceRequestError: URL has an invalid label.
```
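
Both failures can be ruled out early with a format check on the account name. Azure storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, so the sketch below rejects empty or malformed values. It cannot tell whether a well-formed name belongs to an existing account. The function name is hypothetical.

```python
import re

# Azure naming rule: 3 to 24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def looks_like_storage_account(name):
    """Return True if the value is a syntactically valid storage account name."""
    return bool(_ACCOUNT_NAME.fullmatch(name or ""))


print(looks_like_storage_account(""))
print(looks_like_storage_account("commauradevstorage"))
```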

Error in the source Microsoft Storage password

The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY in the job’s variable is not correct. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel. In the aura-databricks-jobs logs, an error message similar to this will appear:
```
azure.storage.blob._shared.authentication.AzureSigningError: Invalid base64-encoded string: number of data characters (81) cannot be 1 more than a multiple of 4
```

The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY in the job’s variable is empty. In the aura-databricks-jobs logs, an error message similar to this will appear:

azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x10284bac0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
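
The AzureSigningError shown above comes from decoding the access key as base64, so a truncated or mistyped key can be detected locally before running the job. The following sketch, with a hypothetical function name, only verifies that the key is non-empty and decodes as base64. A key that passes can still be the wrong key for the account.

```python
import base64
import binascii


def access_key_problem(key):
    """Return a description of the problem with the key, or None if it decodes cleanly."""
    if not key:
        return "empty"
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(key, validate=True)
    except binascii.Error as exc:
        return f"not valid base64: {exc}"
    return None


# Illustrative value: 81 characters, one more than a multiple of 4.
print(access_key_problem("A" * 81))
```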

Errors in Spark configuration

Error in dataset id option

The value of dataset.id configured in the Kernel dataset write statement is not correct for the aura-bot Kernel app. To solve it, review the file configured in the job’s variable AURA_KPI_AVRO_ADAPTER_CONFIG_PATH, which contains the list of datasets to be imported. If this dataset is not included, contact the Kernel Operations team and ask them to add the dataset with a specific version and to include the new scope in the purpose configured for the corresponding application.
For more details, see Kernel datasets configuration.

In the aura-databricks-jobs logs, an error message similar to this will appear:

com.telefonica.baikal.spark.exceptions.InvalidDataSourceConfigException: An error occurred trying to recover dataset D_Aura_LivingApp_ERROR-6: ErrorResponse(NOT_FOUND,Dataset D_Aura_LivingApp_ERROR version 6 not found,None). Configured data source options Map(client.purposes -> aura-kpi-data-write-purpose, 4p.baseurl -> global-int-current.baikalplatform.com, writemode -> append, dataset.id -> D_Aura_LivingApp_ERROR, correlator -> df776bdc-a7d9-482e-8364-8c617afc75be, client.scopes -> , repartition.enabled -> true, client.id -> aura-bot, skipunpseudonymize -> true, repartition.compressedrecordsize -> 1403, client.secret -> ********, dataset.version -> 6)

Error in version of dataset option

The value of dataset.version configured in the Kernel dataset write statement is not correct for the aura-bot Kernel app. To solve it, review the configuration of the file configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH. This file contains the list of datasets, together with their versions, to be imported.
The value of dataset.version is not correct for the aura-bot Kernel app because it is not a number. In the aura-databricks-jobs logs, an error message similar to this will appear:
```
pyspark.sql.utils.IllegalArgumentException: For input string: "version_error"
```

The value of dataset.version is not correct for the aura-bot Kernel app because this version does not exist. In the aura-databricks-jobs logs, an error message similar to this will appear:

py4j.protocol.Py4JJavaError: An error occurred while calling o123.save.
: com.telefonica.baikal.spark.exceptions.InvalidDataSourceConfigException: An error occurred trying to recover dataset D_Aura_LivingApp_PRUEBAS_AURA-8: ErrorResponse(NOT_FOUND,Dataset D_Aura_LivingApp_PRUEBAS_AURA version 8 not found,None). Configured data source options Map(client.purposes -> aura-kpi-data-write-purpose, 4p.baseurl -> global-int-current.baikalplatform.com, writemode -> append, dataset.id -> D_Aura_LivingApp_PRUEBAS_AURA, correlator -> 09c988c5-4d45-4590-9c76-847b7f3d1579, client.scopes -> , repartition.enabled -> true, client.id -> aura-bot, skipunpseudonymize -> true, repartition.compressedrecordsize -> 1403, client.secret -> ********, dataset.version -> 8)
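
Both version errors can be anticipated by checking the configured version before the write. In the reports, a dataset with version "6.0.0" is written with dataset.version 6, so the sketch below assumes that the numeric value is the major component of the configured version. How the job actually derives it is not described here, and the function name is hypothetical. The check covers only the format. Whether the version exists in Kernel can only be confirmed against Kernel itself.

```python
def major_version(version):
    """Return the leading numeric component of a version such as "6.0.0"."""
    major = str(version).split(".", 1)[0]
    if not (major.isascii() and major.isdigit()):
        raise ValueError(f"dataset version {version!r} does not start with a number")
    return int(major)


print(major_version("6.0.0"))
```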

Error in base URL option

The value of AURA_FP_SPARK_BASE_URL in the job’s variable used to set 4p.baseurl in the Kernel dataset write statement is not correct for the aura-bot Kernel app.

To solve it, contact the Kernel Operations team to review the value of the variable. In the aura-databricks-jobs logs, an error message similar to this will appear:

[WARN] [10/09/2024 10:45:56.456] [spark-sdk-akka.actor.default-dispatcher-4] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 100 milliseconds.
[WARN] [10/09/2024 10:46:01.495] [spark-sdk-akka.actor.default-dispatcher-3] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 200 milliseconds.
[WARN] [10/09/2024 10:46:06.545] [spark-sdk-akka.actor.default-dispatcher-7] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 400 milliseconds.
[WARN] [10/09/2024 10:46:11.569] [spark-sdk-akka.actor.default-dispatcher-3] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 800 milliseconds.
[WARN] [10/09/2024 10:46:16.600] [spark-sdk-akka.actor.default-dispatcher-7] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 1600 milliseconds.
[WARN] [10/09/2024 10:46:21.633] [spark-sdk-akka.actor.default-dispatcher-3] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 3200 milliseconds.
[WARN] [10/09/2024 10:46:26.673] [spark-sdk-akka.actor.default-dispatcher-45] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 6400 milliseconds.
[WARN] [10/09/2024 10:46:39.154] [spark-sdk-akka.actor.default-dispatcher-48] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 12800 milliseconds.
[WARN] [10/09/2024 10:46:52.129] [spark-sdk-akka.actor.default-dispatcher-48] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 25600 milliseconds.
[WARN] [10/09/2024 10:47:19.988] [spark-sdk-akka.actor.default-dispatcher-48] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 51200 milliseconds.
24/10/09 10:47:19 ERROR DefaultOAuthService: An error occurred trying to connect with http service
akka.stream.StreamTcpException: Tcp command [Connect(auth.global-int-current.baikalplatform.com.error:443,None,List(),Some(10 seconds),true)] failed because of java.net.UnknownHostException: auth.global-int-current.baikalplatform.com.error
Caused by: java.net.UnknownHostException: auth.global-int-current.baikalplatform.com.error
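
Since the failure is an UnknownHostException, a DNS lookup of the authentication host is enough to confirm a wrong base URL without waiting for the retries. The sketch below builds the host name by prefixing the base URL with auth., as seen in the log above. That construction is inferred from the log, not from the SDK documentation, and the function name and the injectable resolver parameter are hypothetical.

```python
import socket


def auth_host_resolves(base_url, resolver=socket.getaddrinfo):
    """Return True if the auth host derived from the base URL resolves in DNS."""
    host = f"auth.{base_url}"  # assumption: same prefix as in the log above
    try:
        resolver(host, 443)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    # Requires network access; the value is the wrong one from the log above.
    print(auth_host_resolves("global-int-current.baikalplatform.com.error"))
```

The resolver argument exists only so that the check can be exercised without network access.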

Error in client id option

The value of AURA_FP_SPARK_CLIENT_ID in the job’s variable used to set client.id in the Kernel dataset write statement is not correct for the aura-bot Kernel app. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel. In the aura-databricks-jobs logs, an error message similar to this will appear, and the job will time out, since it keeps retrying the statement until the Databricks manager stops the job.

24/10/09 10:38:48 ERROR OAuthTokenActor: Invalid authentication: invalid_client, Bad credentials
24/10/09 10:38:48 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S

Error in client secret option

The value of AURA_FP_SPARK_CLIENT_SECRET in the job’s variable used to set client.secret in the Kernel dataset write statement is not correct for the aura-bot Kernel app.

To solve it, review the credentials for the aura-bot Kernel app with the Kernel Operations team.

In the aura-databricks-jobs logs, an error message similar to this will appear, and the job will time out, since it keeps retrying the statement until the Databricks manager stops the job.

24/10/09 10:58:51 ERROR OAuthTokenActor: Invalid authentication: invalid_client, Bad credentials
24/10/09 10:58:51 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S

Error in purposes option

The value of AURA_FP_SPARK_PURPOSES in the job’s variable used to set client.purposes in the Kernel dataset write statement is not correct for the aura-bot Kernel app.

To solve it, contact the Kernel Operations team and ask them to add the purpose for the corresponding application. If the purpose has not been created, follow this guide to create it: Kernel datasets configuration. In the aura-databricks-jobs logs, an error message similar to this will appear:

24/10/09 10:56:38 ERROR OAuthTokenActor: Invalid authentication: invalid_purpose, Invalid purpose: aura-kpi-data-write-purpose-error for client_credentials
24/10/09 10:56:38 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S

Token retrieval error: Kernel service not available

The configuration is correct but the Kernel service is not available at that time. The job retries several times and ends in a timeout, since the Spark session is not closed by Kernel.

In this case, contact the Kernel Operations team, wait for the service to be restored, and rerun the job.

Standard error: The job is waiting to connect to the Kernel client, as in the following example:

2024-10-26 06:05:35,846 INFO 1016 /databricks/python/lib/python3.9/site-packages/aura_pytraces/aura_logging/base_logger.py msg="Writing blobs of avro blob path: "avro/dimensional/D_Aura_Channel/6.0.0" to dataset_id: "D_Aura_Channel""

Log4j output file: Information about the error when trying to get the token to connect to Kernel, as in the following example:

24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]

Error in scopes option

The value of AURA_FP_SPARK_SCOPES in the job’s variable used to set client.scopes in the Kernel dataset write statement is not correct for the aura-bot Kernel app. Usually, a purpose is created with its list of scopes already added, so this variable does not need to be configured. If the variable is used and a scope is not defined, an error occurs. To solve it, review the configuration of the scopes described in: Kernel datasets configuration.
In the aura-databricks-jobs logs, an error message similar to this will appear, and the job will time out, since it keeps retrying the statement until the Databricks manager stops the job.

24/10/09 11:00:59 ERROR OAuthTokenActor: Invalid authentication: invalid_scope, Invalid scope 'scopes-error' requested for client 'aura-bot-six'
24/10/09 11:00:59 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S
com.telefonica.baikal.services.exceptions.InvalidOAuthAuthException: Invalid authentication: invalid_scope, Invalid scope 'scopes-error' requested for client 'aura-bot-six'
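
The client id, client secret, purposes and scopes errors described above all surface as an OAuthTokenActor line of the form "Invalid authentication: <code>, <detail>". The sketch below maps the code found in such a line to the job variable to review. The mapping is derived only from the examples in this section, and the function and constant names are hypothetical.

```python
import re

# Error code in the log line -> job variable(s) to review, per the sections above.
VARIABLE_BY_OAUTH_ERROR = {
    "invalid_client": "AURA_FP_SPARK_CLIENT_ID or AURA_FP_SPARK_CLIENT_SECRET",
    "invalid_purpose": "AURA_FP_SPARK_PURPOSES",
    "invalid_scope": "AURA_FP_SPARK_SCOPES",
}

_PATTERN = re.compile(r"Invalid authentication: (\w+),")


def variable_to_review(log_line):
    """Return the variable(s) to review for an OAuth error line, or None."""
    match = _PATTERN.search(log_line)
    return VARIABLE_BY_OAUTH_ERROR.get(match.group(1)) if match else None


line = ("24/10/09 11:00:59 ERROR OAuthTokenActor: Invalid authentication: "
        "invalid_scope, Invalid scope 'scopes-error' requested for client 'aura-bot-six'")
print(variable_to_review(line))
```

Note that invalid_client does not distinguish a wrong client id from a wrong client secret, so both variables need to be reviewed in that case.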

Errors in Spark execution

Error trying to import dataset with Avro files with schema error

This error is produced in the WRITING_DATASET step because some of the Avro files to import do not comply with the schema. To solve it, review the specific schema error indicated in the logs, and then check the schema configuration for the failing dataset:

First, get the path of the schema defined in the file configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH.
Afterwards, with the path, get the schema definition.
Depending on the indicated error, you must validate the data of files that do not follow the schema specification.
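
The last step above can be sketched as a simple check of a decoded record against the schema definition. This is a simplified illustration, not the validation performed by Spark or Kernel: the helper missing_required_fields is hypothetical, the schema is a heavily reduced example (the COMMENT field is invented for illustration), and the check only treats fields without a default value as required.

```python
import json


def missing_required_fields(schema, record):
    """Return the names of schema fields without a default that are absent from the record."""
    return [
        field["name"]
        for field in schema.get("fields", [])
        if "default" not in field and field["name"] not in record
    ]


# Hypothetical, reduced schema and record used only for illustration.
schema = json.loads("""
{
  "type": "record",
  "name": "Aura_Suggestion",
  "fields": [
    {"name": "AURA_MODEL_VERSION_ID", "type": "string"},
    {"name": "DAY_DT", "type": "string"},
    {"name": "COMMENT", "type": ["null", "string"], "default": null}
  ]
}
""")
record = {"DAY_DT": "2024-10-09"}
print(missing_required_fields(schema, record))
```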

In the aura-databricks-jobs logs, an error message similar to this will appear:

24/10/09 15:58:53 ERROR Executor: Exception in task 0.0 in stage 63.0 (TID 553)
org.apache.avro.AvroTypeException: Found com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, expecting com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, missing required field AURA_MODEL_VERSION_ID

A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:

{
    "time": "2024-10-09T15:47:41.507980Z",
    "report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
}

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate the error in Aura_Suggestion dataset and will be similar to:

{
    "num_files_kernel_uploaded": 182,
    "num_files_moved_to_processed": 182,
    "num_files_deleted": 182,
    "num_files_skipped": 0,
    "num_errors": 1,
    "summary": {
        "D_Aura_Channel": {
            "dataset_id": "D_Aura_Channel",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "FINISH",
            "num_files_kernel_uploaded": 25,
            "num_files_moved_to_processed": 25,
            "num_files_deleted": 25,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {
                "dataset_id": "D_Aura_Channel",
                "version": 6,
                "correlator": "5f19247e-40b2-4643-8ed1-b1e0f6c0d759",
                "resource_id": "1aabef7e-03f6-40f5-9812-263e49c1d4b0",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 775,
                    "local_spark_write_discards": 0,
                    "local_spark_write_discards_total": 0,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 697275,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 775,
                    "local_spark_records_written_total": 775,
                    "total_not_informed_records_written": 0,
                    "records_written": 775,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 68804
                }
            },
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T100000Z.avro"
            ],
            "duration_seconds": 141.32
        },
        "Aura_Suggestion": {
            "dataset_id": "Aura_Suggestion",
            "schema": "entity",
            "version": "6.0.0",
            "step": "WRITING_DATASET",
            "num_files_kernel_uploaded": 0,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 1,
            "errors": [
                {
                    "step": "WRITING_DATASET",
                    "description": "avro_test/entity/Aura_Suggestion/6.0.0",
                    "error": "An error occurred while calling o208.save.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 1 times, most recent failure: Lost task 0.0 in stage 63.0 (TID 553) (192.168.1.71 executor driver): org.apache.avro.AvroTypeException: Found com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, expecting com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, missing required field AURA_MODEL_VERSION_ID\n\tat org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:308)\n\tat org.apache.avro.io.parsing.Parser.advance(Parser.java:86)\n\tat org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)\n\tat org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:240)\n\tat org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)\n\tat org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)\n\tat org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)\n\tat org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)\n\tat org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)\n\tat org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:126)\n\tat org.apache.avro.mapreduce.AvroKeyRecordReader.nextKeyValue(AvroKeyRecordReader.java:55)\n\tat org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:251)\n\tat org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat 
scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:268)\n\tat scala.collection.Iterator.foreach(Iterator.scala:943)\n\tat scala.collection.Iterator.foreach$(Iterator.scala:943)\n\tat scala.collection.AbstractIterator.foreach(Iterator.scala:1431)\n\tat scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)\n\tat scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)\n\tat scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)\n\tat scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)\n\tat scala.collection.TraversableOnce.to(TraversableOnce.scala:366)\n\tat scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)\n\tat scala.collection.AbstractIterator.to(Iterator.scala:1431)\n\tat scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)\n\tat scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)\n\tat scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)\n\tat scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)\n\tat scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)\n\tat scala.collection.AbstractIterator.toArray(Iterator.scala:1431)\n\tat org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1470)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:136)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)\n\t... 1 more\n",
                    "corr": "5f19247e-40b2-4643-8ed1-b1e0f6c0d759"
                }
            ],
            "spark_executions": {}
        }
    },
    "start_time": "2024-10-09T15:47:41.507980Z",
    "end_time": "2024-10-09T16:01:34.247575Z",
    "duration_seconds": 832.73,
    "step": "FINISH",
    "status": "failed"
}
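
Since the stack traces make the report hard to read, a short script can list only the datasets with errors. This is a minimal sketch that assumes the report has already been downloaded and parsed into a dictionary with the structure shown above; the function name failed_datasets is hypothetical and the sample report is a reduced version of the example.

```python
def failed_datasets(report):
    """Return (dataset_id, step, first line of the error) for each error entry in the report."""
    failures = []
    for dataset_id, entry in report.get("summary", {}).items():
        if not isinstance(entry, dict):
            continue  # some reports hold a plain string here, e.g. "process_error"
        for err in entry.get("errors", []):
            lines = err.get("error", "").splitlines()
            first_line = lines[0] if lines else ""
            failures.append((dataset_id, err.get("step"), first_line))
    return failures


# Reduced version of the report above, for illustration only.
report = {
    "summary": {
        "D_Aura_Channel": {"step": "FINISH", "errors": []},
        "Aura_Suggestion": {
            "step": "WRITING_DATASET",
            "errors": [
                {
                    "step": "WRITING_DATASET",
                    "error": "An error occurred while calling o208.save.\n"
                             ": org.apache.spark.SparkException: Job aborted due to stage failure",
                }
            ],
        },
    },
    "status": "failed",
}
for dataset_id, step, message in failed_datasets(report):
    print(f"{dataset_id} [{step}]: {message}")
```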

Error trying to import Avro files with wrong schema in dataset and version configured in Kernel

This error is produced in the WRITING_DATASET step because the Avro dataset schema configured in Kernel is wrong. This can happen when the schema for an Avro dataset and a specific version has not been properly published in the Kernel environment.

For instance, the Aura_Audit dataset for v6.0.0 published in Kernel does not include the latest schema changes present in the 4p-datasets codebase repository (see Aura_Audit dataset for v6.0.0 in 4p-datasets).

In the aura-databricks-jobs logs, error messages similar to the ones below will appear in different files:

Standard error file: Information on the general import process.

2024-10-14 13:08:53,922 ERROR 1110 /databricks/python/lib/python3.9/site-packages/aura_pytraces/aura_logging/base_logger.py msg="Error writing DATASET_ID: "Aura_Audit", there are local spark write discards that must be reviewed."

Log4j output file: Information about Spark operations and detail of the records with errors that will be ignored, as in the following example:

24/10/14 13:05:50 ERROR WasbAvroProducer: Unable to transform [c3a5b3ef-c968-4cf5-8c65-41d62b1a1562,2024-10-14 07:57:37.577,null,92e76dd4-a5c2-4672-a6c5-ba613e229c19,CRI,ai,d18c3ad3-6c7b-5739-8bcd-02e6d49b28bb,aura-gateway-api-6ddc48797-pnvl9,9.4.0,2024-10-14,0401] to avro message at partition 0 (ignoring it)
org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot write "ai" since it's not defined in enum "rag", "generative", "message", "other", "nlpaas"
    at org.apache.spark.sql.avro.BaikalAvroSerializer.$anonfun$newConverter$12(BaikalAvroSerializer.scala:123)
    at org.apache.spark.sql.avro.BaikalAvroSerializer.$anonfun$newConverter$12$adapted(BaikalAvroSerializer.scala:120)
    at org.apache.spark.sql.avro.BaikalAvroSerializer.$anonfun$newStructConverter$2(BaikalAvroSerializer.scala:258)

A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:

{
    "time": "2024-10-09T15:47:41.507980Z",
    "report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
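}

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate the error in the Aura_Audit dataset and will be similar to:
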
{
    "num_files_kernel_uploaded": 20,
    "num_files_moved_to_processed": 20,
    "num_files_deleted": 20,
    "num_files_skipped": 0,
    "num_errors": 1,
    "summary": {
        "Aura_Audit": {
            "dataset_id": "Aura_Audit",
            "schema": "entity",
            "version": "6.0.0",
            "step": "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
            "num_files_kernel_uploaded": 9,
            "num_files_moved_to_processed": 9,
            "num_files_deleted": 9,
            "num_files_skipped": 0,
            "num_errors": 1,
            "errors": [
                {
                    "step": "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
                    "key": "WRITING_DATASET_DISCARDED_RECORDS",
                    "description": "Local spark discarded records",
                    "error": "Error writing DATASET_ID: \"Aura_Audit\", there are local spark write discards that must be reviewed.",
                    "corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
                }
            ],
            "spark_executions": {
                "dataset_id": "Aura_Audit",
                "version": 6,
                "correlator": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2",
                "resource_id": "e03a1c5b-cd69-4fef-92fb-d80d3f8dd92a",
                "request_type": "writes",
                "status": "finished",
                "metrics": {
                    "total_records_written": 1083,
                    "local_spark_write_discards": 9,
                    "local_spark_write_discards_total": 9,
                    "malformed_records_written": 0,
                    "total_records_filtered_by_gdpr": 0,
                    "local_spark_bytes_written_total": 208945,
                    "total_malformed_records_by_partition_written": [],
                    "partitions_written": [
                        [
                            [
                                "DAY_DT",
                                "2024-10-10"
                            ]
                        ],
                        [
                            [
                                "DAY_DT",
                                "2024-10-14"
                            ]
                        ],
                        [
                            [
                                "DAY_DT",
                                "2024-10-11"
                            ]
                        ]
                    ],
                    "total_malformed_records_written": 0,
                    "total_malformed_records_by_column_written": [],
                    "total_records_by_partition_written": [
                        [
                            "DAY_DT=2024-10-14",
                            981
                        ],
                        [
                            "DAY_DT=2024-10-10",
                            47
                        ],
                        [
                            "DAY_DT=2024-10-11",
                            55
                        ]
                    ],
                    "total_not_informed_records_by_partition_written": [],
                    "records_read": 1083,
                    "local_spark_records_written_total": 1083,
                    "total_not_informed_records_written": 0,
                    "records_written": 1083,
                    "total_malformed_records_discarded": 0,
                    "records_discarded": 0,
                    "data_access_audit": {
                        "partitions_num": 1,
                        "wasb_type": "avro_fp"
                    },
                    "total_executor_cpu_millis": 1,
                    "total_executor_memory": 593913446,
                    "total_bytes_written": 63165
                }
            },
            "files_uploaded": [
                "avro_test/entity/Aura_Audit/6.0.0/AURA_062a0ab0-d0bd-5347-98bf-d88977af622f_CR_AUDIT_20241007T090000Z.avro",
                "avro_test/entity/Aura_Audit/6.0.0/AURA_1d43887a-f368-51ce-abee-60f5b25387ad_CR_AUDIT_20241004T110000Z.avro"
            ]
        }
    },
    "start_time": "2024-10-14T12:55:38.427732Z",
    "end_time": "2024-10-14T13:08:41.567204Z",
    "duration_seconds": 783.13,
    "step": "WRITING_KERNEL_STAGE",
    "status": "failed"
}

To resolve these errors, perform the following steps:

Contact the Kernel Operations team and specify the dataset id and version that must be republished, so that the environment is updated.

Before running the job again, check whether the schema problem has prevented specific records from being loaded. The error report may contain these messages:

Local Spark discarded records:

    {
        "step": "WRITING_DATASET",
        "description": "Local spark discarded records",
        "error": "Error writing DATASET_ID: \"{DATASET_ID}\", there are local spark write discards that must be reviewed.",
        "corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
    }

Malformed records:

    {
        "step": "WRITING_DATASET",
        "description": "Malformed records",
        "error": "Error writing DATASET_ID: \"{DATASET_ID}\", there are malformed records written that must be reviewed.",
        "corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
    }

Records discarded:

    {
        "step": "WRITING_DATASET",
        "description": "Malformed records",
        "error": "Error writing DATASET_ID: \"{DATASET_ID}\", there are records discarded written that must be reviewed.",
        "corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
    }
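
The same information can be read from the metrics block of spark_executions in the report. The following is a minimal sketch: the helper metrics_to_review is hypothetical, the metric names are copied from the report examples above, and the assumption that each counter corresponds to one of the three messages is based only on their names.

```python
# Metric names taken from the "metrics" block of the report examples above.
# Their mapping to the three error messages is an assumption based on the names.
REVIEW_METRICS = (
    "local_spark_write_discards",
    "malformed_records_written",
    "records_discarded",
)


def metrics_to_review(spark_execution):
    """Return the review counters that are greater than zero in a spark_executions entry."""
    metrics = spark_execution.get("metrics", {})
    return {name: metrics[name] for name in REVIEW_METRICS if metrics.get(name, 0) > 0}


# Reduced spark_executions entry from the Aura_Audit example.
aura_audit_execution = {
    "dataset_id": "Aura_Audit",
    "metrics": {
        "total_records_written": 1083,
        "local_spark_write_discards": 9,
        "malformed_records_written": 0,
        "records_discarded": 0,
    },
}
print(metrics_to_review(aura_audit_execution))
```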

In these cases, the wrong records must be manually corrected and reloaded separately from the records that were loaded correctly, to avoid duplicated data in the Kernel datasets. The information needed to correct the schema errors can be obtained from the Databricks logs, as explained before.

Once these records have been corrected, remove the file that was created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME, so that the job can be run again normally.

Error trying to import dataset with missing schema

This error is produced in the READING_BLOBS step because an Avro schema is missing in the configuration. To solve it, review the schema path indicated in the logs and check whether that path is valid in the file configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH. If you know the correct path, update it in this file.
In the aura-databricks-jobs logs, an error message similar to this will appear:

py4j.protocol.Py4JJavaError: An error occurred while calling o39.load.
: java.io.FileNotFoundException: Could not read schema. You provided a path that does not exists: wasbs://aura-kpis-ap-six@commauradevstorage.blob.core.windows.net/avro_test/schemas/dimensional/6.0.0/aura-channel-asvc.json. Make sure that the filename and extension are in the path.
2024-10-09 11:13:15,924 ERROR 84269 .venv/../base_logger.py msg="Error processed avro_type_schema: "dimensional" and dataset_id: "D_Aura_Channel""
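
To compare the failing path with the adapter configuration, the path can be extracted from the log text. This is a minimal sketch: the function name missing_schema_paths is hypothetical, and the regular expression only assumes the wording of the message shown above.

```python
import re

# Pattern based only on the error message shown above (an assumption, not an official format).
MISSING_SCHEMA_RE = re.compile(
    r"You provided a path that does not exists: (?P<path>\S+)\. Make sure"
)


def missing_schema_paths(log_text):
    """Return the schema paths reported as missing, without duplicates."""
    paths = []
    for match in MISSING_SCHEMA_RE.finditer(log_text):
        if match.group("path") not in paths:
            paths.append(match.group("path"))
    return paths


sample_log = (
    ": java.io.FileNotFoundException: Could not read schema. "
    "You provided a path that does not exists: "
    "wasbs://aura-kpis-ap-six@commauradevstorage.blob.core.windows.net/"
    "avro_test/schemas/dimensional/6.0.0/aura-channel-asvc.json. "
    "Make sure that the filename and extension are in the path."
)
print(missing_schema_paths(sample_log))
```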

A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:

{
    "time": "2024-10-09T15:47:41.507980Z",
    "report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
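}

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate the error in the D_Aura_Channel dataset and will be similar to:
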
{
    "num_files_kernel_uploaded": 0,
    "num_files_moved_to_processed": 0,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 1,
    "summary": {
        "D_Aura_Channel": {
            "dataset_id": "D_Aura_Channel",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "READING_BLOBS",
            "num_files_kernel_uploaded": 0,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 1,
            "errors": [
                {
                    "step": "READING_BLOBS",
                    "description": "avro_test/dimensional/D_Aura_Channel/6.0.0",
                    "error": "An error occurred while calling o39.load.\n: java.io.FileNotFoundException: Could not read schema. You provided a path that does not exists: wasbs://aura-kpis-ap-six@commauradevstorage.blob.core.windows.net/avro_test/schemas/dimensional/6.0.0/aura-channel-asvc.json. Make sure that the filename and extension are in the path.\n\tat com.telefonica.baikal.spark.sources.telefonica.external.write.TelefonicaExternalSourceRelationProvider.readSchema(TelefonicaExternalSourceRelationProvider.scala:75)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.write.TelefonicaExternalSourceRelationProvider.readSchema$(TelefonicaExternalSourceRelationProvider.scala:66)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.TelefonicaExternalSource.readSchema(TelefonicaExternalSource.scala:33)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.TelefonicaExternalSource.$anonfun$getTable$2(TelefonicaExternalSource.scala:65)\n\tat scala.collection.MapLike.getOrElse(MapLike.scala:131)\n\tat scala.collection.MapLike.getOrElse$(MapLike.scala:129)\n\tat org.apache.spark.sql.catalyst.util.CaseInsensitiveMap.getOrElse(CaseInsensitiveMap.scala:30)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.TelefonicaExternalSource.getTable(TelefonicaExternalSource.scala:63)\n\tat org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:92)\n\tat org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:140)\n\tat org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)\n\tat scala.Option.flatMap(Option.scala:271)\n\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)\n\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n",
                    "corr": "4f4db627-1de8-4436-80c9-95ade4788559"
                }
            ],
            "spark_executions": {}
        }
    },
    "start_time": "2024-10-09T16:23:01.483043Z",
    "end_time": "2024-10-09T16:23:39.137639Z",
    "duration_seconds": 37.65,
    "step": "WRITING_KERNEL_STAGE",
    "status": "failed"
}

Error trying to init Spark session

This error occurs when the initialization of the Spark context fails. To solve it, re-execute the job to check whether this momentary connection problem with the cluster is resolved. If the error persists, contact the Kernel Operations team. In the aura-databricks-jobs logs, an error message similar to this will appear:

24/10/09 13:18:28 WARN TransportChannelHandler: Exception in connection from /192.168.1.71:59460
java.lang.IllegalArgumentException: Too large frame: 5785721462170058752
	at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
	at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
24/10/09 13:18:28 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /192.168.1.71:59460 is closed
24/10/09 13:18:28 ERROR SparkContext: Error initializing SparkContext.

A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:

{
    "time": "2024-10-09T13:18:08.119222Z",
    "report_link": "https://{account_name}.blob.core.windows.net/{container_name}/avro/reports/aura-avro-kpis-report-2024-10-09T13%3A18%3A28.761361Z.json?{signature}",
    "error": [
        "An error occurred in sparkSDKManager. An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.\n: java.lang.IllegalArgumentException: Too large frame: 5785721462170058752\n\tat org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)\n\tat org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)\n\tat org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n"
    ]
}

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:

{
    "num_files_kernel_uploaded": 0,
    "num_files_moved_to_processed": 0,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 1,
    "summary": {
        "process_error": "An error occurred in sparkSDKHandler. An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.\n: java.lang.IllegalArgumentException: Too large frame: 5785721462170058752\n\tat org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)\n\tat org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)\n\tat org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat 
java.base/java.lang.Thread.run(Thread.java:834)\n"
    },
    "start_time": "2024-10-09T13:18:08.119222Z",
    "end_time": "2024-10-09T13:18:28.761361Z",
    "duration_seconds": 20.64,
    "step": "INIT",
    "status": "failed"
}

Writing error in dataset due to out of memory error

In this scenario, a Spark stage is not executed because of a Java heap space (out of memory) error, so the files of that dataset are not imported.

To correct it, delete the error file configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME and run the job again, so that the data from the files that were not imported is loaded.

In the aura-databricks-jobs logs, an error message similar to this will appear in the Log4j output file:

An error occurred while calling o582.save.\n: com.telefonica.baikal.spark.exceptions.WriteStatusException: The writing process has failed with resourceId 10543db5-cb35-446e-8cc7-349a3c6cbffb and dataset (D_Aura_App, 6)
at com.telefonica.baikal.spark.sources.telefonica.config.DatasetServiceComponents.$anonfun$waitWriterStatus$2(DatasetServiceComponents.scala:344)

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:

{
    "num_files_kernel_uploaded": 0,
    "num_files_moved_to_processed": 0,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 1,
    "summary": {
        "D_Aura_App": {
            "errors": [
                {
                    "step": "WRITING_DATASET",
                    "description": "avro/dimensional/D_Aura_App/6.0.0",
                    "error": "An error occurred while calling o582.save.\n: com.telefonica.baikal.spark.exceptions.WriteStatusException: The writing process has failed with resourceId 10543db5-cb35-446e-8cc7-349a3c6cbffb and dataset (D_Aura_App, 6)\n\tat com.telefonica.baikal.spark.sources.telefonica.config.DatasetServiceComponents.$anonfun$waitWriterStatus$2(DatasetServiceComponents.scala:344)\n\tat com.telefonica.baikal.spark.sources.telefonica.config.DatasetServiceComponents.$anonfun$waitWriterStatus$2$adapted(DatasetServiceComponents.scala:341)\n\tat scala.util.Success.$anonfun$map$1(Try.scala:255)\n\tat scala.util.Success.map(Try.scala:213)\n\tat scala.concurrent.Future.$anonfun$map$1(Future.scala:292)\n\tat scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)\n\tat scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)\n\tat scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)\n\tat java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)\n\tat java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)\n\tat java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)\n\tat java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)\n\tat java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)\n\tat java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)\n",
                    "corr": "21fe54f6-6c34-499a-993c-9dfe30e9e717"
                }
            ],
            "spark_executions": {
                "dataset_id": "D_Aura_App",
                "version": 6,
                "correlator": "21fe54f6-6c34-499a-993c-9dfe30e9e717",
                "resource_id": "10543db5-cb35-446e-8cc7-349a3c6cbffb",
                "request_type": "writes",
                "status": "failed",
                "metrics": {
                    "local_spark_bytes_written_total": 44596,
                    "local_spark_records_written_total": 241,
                    "local_spark_write_discards_total": 0,
                    "local_spark_write_discards": 0
                }
            }
        }
    },
    "start_time": "2024-10-09T13:18:08.119222Z",
    "end_time": "2024-10-09T13:18:28.761361Z",
    "duration_seconds": 20.64,
    "step": "WRITING_KERNEL_STAGE",
    "status": "failed"
}
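
When a report contains long stack traces, it can help to flatten the errors into one short line per failure. The following is a minimal sketch, assuming the report has the fields shown in the examples on this page (`summary`, `errors`, `step`, `description`, `corr`). The function name `summarize_errors` and the shortened sample are illustrative and are not part of aura-databricks-jobs.

```python
def summarize_errors(report: dict) -> list:
    """Flatten the per-dataset errors of a report into a list of short rows."""
    rows = []
    for dataset_id, entry in report.get("summary", {}).items():
        for err in entry.get("errors", []):
            text = err.get("error") or ""
            # Keep only the first line of the (often multi-line) stack trace.
            first_line = text.splitlines()[0] if text else ""
            rows.append({
                "dataset": dataset_id,
                "step": err.get("step"),
                "path": err.get("description"),
                "corr": err.get("corr"),
                "error": first_line,
            })
    return rows


# Shortened copy of the report above, used only as sample input.
sample_report = {
    "num_errors": 1,
    "summary": {
        "D_Aura_App": {
            "errors": [
                {
                    "step": "WRITING_DATASET",
                    "description": "avro/dimensional/D_Aura_App/6.0.0",
                    "error": "An error occurred while calling o582.save.\n"
                             ": com.telefonica.baikal.spark.exceptions.WriteStatusException: "
                             "The writing process has failed",
                    "corr": "21fe54f6-6c34-499a-993c-9dfe30e9e717",
                }
            ]
        }
    },
    "step": "WRITING_KERNEL_STAGE",
    "status": "failed",
}

for row in summarize_errors(sample_report):
    print(row)
```

The `corr` value of each row can then be used to search the aura-databricks-jobs logs for the related traces.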

Error trying to import datasets with timeout in Spark execution

This error is produced in the WRITING_DATASET step because the configuration of the Spark partitions is not correct. The Spark process runs for two hours and then terminates without writing the data to the dataset.

To solve it, contact the Kernel Operations team to review the file configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH and modify the value of averageFileSize in each dataset.

In the aura-databricks-jobs logs, a message similar to this will appear. No further traces are logged afterwards, since the process ends with a timeout.

{"corr":"8be82aec-6559-4fc9-be74-74dfc56de615","msg":"Writing blobs of avro blob path: \"avro/entity/D_Aura_Audit/6.0.0\" to dataset_id: \"D_Aura_LivingApp\"","lvl":"INFO","time":"2024-12-18T12:17:51.056Z","app":"aura-databricks-jobs","version":"9.6.0","module":"avro-kpis-manager","host":"1218-120721-e3l79q40-192-168-64-10","pid":1278,"caller_info":"/databricks/python/lib/python3.9/site-packages/aura_databricks_jobs/avro_kpis/avro_kpis_manager.py:70"}
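
Because a timeout leaves no final trace, one way to locate the affected import is to find the last log record written for each correlator. The following is a minimal sketch, assuming the logs are available as JSON lines with the fields shown above (`corr`, `msg`, `lvl`, `time`). The function name `last_trace_by_correlator` is hypothetical.

```python
import json


def last_trace_by_correlator(log_lines):
    """Return the last JSON log record seen for each correlator."""
    last = {}
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not JSON-formatted traces.
        if isinstance(record, dict) and "corr" in record:
            last[record["corr"]] = record
    return last


# Shortened copy of the trace above, used only as sample input.
sample_lines = [
    r'{"corr":"8be82aec-6559-4fc9-be74-74dfc56de615","msg":"Writing blobs of avro blob path: \"avro/entity/D_Aura_Audit/6.0.0\" to dataset_id: \"D_Aura_LivingApp\"","lvl":"INFO","time":"2024-12-18T12:17:51.056Z"}',
    "this line is not JSON and is ignored",
]

for corr, record in last_trace_by_correlator(sample_lines).items():
    print(corr, record["time"], record["msg"])
```

If the last record for a correlator is a "Writing blobs" message with no later trace, the run probably ended in the timeout described here.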

A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:

{
    "time": "2024-10-09T15:47:41.507980Z",
    "report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
}

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate that the process did not reach the FINISH stage but stopped in the WRITING_DATASET_STAGE stage. The next execution will try to load the files again.

{
    "num_files_kernel_uploaded": 0,
    "num_files_moved_to_processed": 0,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 0,
    "summary": {
        "D_Aura_Channel": {
            "dataset_id": "D_Aura_Audit",
            "schema": "entity",
            "version": "6.0.0",
            "step": "WRITING_DATASET",
            "num_files_kernel_uploaded": 0,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "files_uploaded": [
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T070000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T080000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T090000Z.avro",
                "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T100000Z.avro"
            ],
            "duration_seconds": 1411.32
        }
       
    },
    "start_time": "2024-10-09T15:47:41.507980Z",
    "end_time": "2024-10-09T16:01:34.247575Z",
    "duration_seconds": 832.73,
    "step": "WRITING_DATASET_STAGE",
    "status": "successfully"
}
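
Whether a run completed can be checked from the top-level `step` field of the report. The following is a minimal sketch, assuming that `FINISH` is the final stage, as described above, and that each entry under `summary` carries its own `step`. The function name `run_outcome` is hypothetical.

```python
def run_outcome(report: dict) -> dict:
    """Describe how far a run got, based on the 'step' fields of its report."""
    return {
        # FINISH is the stage a complete run is expected to reach.
        "finished": report.get("step") == "FINISH",
        "last_stage": report.get("step"),
        # Step reached by each dataset entry in the summary.
        "dataset_steps": {
            name: entry.get("step")
            for name, entry in report.get("summary", {}).items()
        },
    }


# Shortened copy of the report above, used only as sample input.
sample_report = {
    "summary": {"D_Aura_Channel": {"step": "WRITING_DATASET", "errors": []}},
    "step": "WRITING_DATASET_STAGE",
}

print(run_outcome(sample_report))
```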

Reports SAS Expiration configuration

This error occurs when the value of AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION has an incorrect format. To solve it, set the variable to an integer that indicates the expiration time in minutes.
In the aura-databricks-jobs logs, an error message similar to this will appear:

2024-10-09 11:04:29,495 ERROR 83383 .venv/../base_logger.py msg="Error in configuration: {'AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION': ['Not a valid integer.']}"
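
The value can be checked before it is set in the job. The following is a minimal sketch of such a check, not the validation that aura-databricks-jobs performs internally. The helper names `parse_sas_expiration` and `expiry_time` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_sas_expiration(raw) -> int:
    """Return the expiration in minutes, or raise ValueError if it is not an integer."""
    try:
        return int(str(raw).strip())
    except ValueError:
        raise ValueError(
            "AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION must be an integer "
            "number of minutes, got %r" % (raw,)
        )


def expiry_time(minutes: int, now=None) -> datetime:
    """Return the moment at which a link created at 'now' would expire."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(minutes=minutes)


minutes = parse_sas_expiration("43200")
print(expiry_time(minutes))
```

A value such as `30m` or `1.5` raises `ValueError` in this sketch, which corresponds to the "Not a valid integer." case shown in the log.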

Error copying files to processed folder

This error is produced in the MOVING_BLOBS_TO_PROCESSED step due to, for example, a connection error with Azure or permission problems when copying to the destination folder.

To resolve it, manually move the files from the path with the error to the processed folder configured in the job’s variable: AURA_KPI_AVRO_PROCESSED_FOLDER_PATH.

In the aura-databricks-jobs logs, an error message similar to this will appear:

2024-10-09 11:23:15,924 ERROR 84269 .venv/../base_logger.py msg="Detected 2 errors when trying copying files in "avro/processed/avro/dimensional/D_Aura_Channel/6.0.0". Review generated report for more detail.

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:

{
    "num_files_kernel_uploaded": 2,
    "num_files_moved_to_processed": 0,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 2,
    "summary": {
        "D_Aura_Channel": {
            "dataset_id": "D_Aura_Channel",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "MOVING_BLOBS_TO_PROCESSED",
            "num_files_kernel_uploaded": 2,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 2,
            "errors": [
                {
                    "step": "MOVING_BLOBS_TO_PROCESSED",
                    "description": "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro",
                    "error": "Error copy blob: \"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro\" to \"avro_test/processed/avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro\" and container: \"aura-kpis-ap-six\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z</Message></Error>",
                    "corr": "no-correlator"
                },
                {
                    "step": "MOVING_BLOBS_TO_PROCESSED",
                    "description": "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro",
                    "error": "Error copy blob: \"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro\" to \"avro_test/processed/avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro\" and container: \"aura-kpis-ap-six\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5fb-501e-009f-0262-195240000000\nTime:2024-10-08T09:11:13.8156074Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5fb-501e-009f-0262-195240000000\nTime:2024-10-08T09:11:13.8156074Z</Message></Error>",
                    "corr": "no-correlator"
                }
              ]
        }
    },
    "start_time": "2024-09-03T17:56:26.464890Z",
    "end_time": "2024-09-03T18:21:17.115379Z",
    "duration_seconds": 1490.65,
    "step": "MOVING_PROCESSED_BLOBS_STAGE",
    "status": "failed"
}
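
The `error` text of each entry embeds the Azure response, including the error code and the request identifier. The following is a minimal sketch that extracts those fields with a regular expression, assuming the `RequestId:`, `Time:` and `ErrorCode:` lines appear as in the report above. The function name `azure_error_details` is hypothetical.

```python
import re

# Matches the "Key:value" lines that Azure appends to its error message.
AZURE_FIELDS = re.compile(r"^(RequestId|Time|ErrorCode):(.+)$", re.MULTILINE)


def azure_error_details(error_text: str) -> dict:
    """Extract RequestId, Time and ErrorCode from a report 'error' string."""
    details = {}
    for key, value in AZURE_FIELDS.findall(error_text):
        # Keep the first occurrence; the XML payload repeats some fields.
        details.setdefault(key, value.strip())
    return details


# Shortened copy of an 'error' value from the report above.
sample_error = (
    "Error: Server failed to authenticate the request. Make sure the value of "
    "Authorization header is formed correctly including the signature.\n"
    "RequestId:ae99a5cf-501e-009f-5b62-195240000000\n"
    "Time:2024-10-08T09:11:13.7634974Z\n"
    "ErrorCode:CannotVerifyCopySource\n"
)

print(azure_error_details(sample_error))
```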

Error deleting processed files

This error is produced in the REMOVING_BLOBS step due to, for example, a connection error with Azure or permission problems when deleting the files. To resolve it, manually delete the files from the path with the error.
In the aura-databricks-jobs logs, an error message similar to this will appear:

2024-10-09 12:13:15,924 ERROR 84269 .venv/../base_logger.py msg="Detected 2 errors when trying remove files in "avro/dimensional/D_Aura_Channel/6.0.0". Review generated report for more detail.

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:

{
    "num_files_kernel_uploaded": 2,
    "num_files_moved_to_processed": 2,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 2,
    "summary": {
        "D_Aura_Channel": {
           "dataset_id": "D_Aura_Channel",
           "schema": "dimensional",
           "version": "6.0.0",
           "step": "REMOVING_BLOBS",
           "num_files_kernel_uploaded": 2,
           "num_files_moved_to_processed": 2,
           "num_files_deleted": 0,
           "num_files_skipped": 0,
           "num_errors": 2,
           "errors": [
                {
                    "step": "REMOVING_BLOBS",
                    "description": "avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro",
                    "error": "Error deleting the blob: \"avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z</Message></Error>",
                    "corr": "no-correlator"
                },
                {
                    "step": "REMOVING_BLOBS",
                    "description": "avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro",
                    "error": "Error deleting the blob: \"avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z</Message></Error>",
                    "corr": "no-correlator"
                }
              ]
        }
    },
    "start_time": "2024-09-03T17:56:26.464890Z",
    "end_time": "2024-09-03T18:21:17.115379Z",
    "duration_seconds": 1490.65,
    "step": "MOVING_PROCESSED_BLOBS_STAGE",
    "status": "failed"
}

Error in adapter configuration

This error occurs when the adapter information cannot be obtained from the file configured in the variable AURA_KPI_AVRO_ADAPTER_CONFIG_PATH.

To correct it, check that the file is generated by aura-kpis-uploader in this path.

In the aura-databricks-jobs logs, an error message similar to this will appear:

2024-10-09 16:19:39,994 ERROR 52315 msg="It could not obtain the configuration of the schemas to import in schemas/aura-avro-adapter.json"

Message indicating no Avro files are configured in the adapter

Some elements configured in AURA_KPI_AVRO_ADAPTER_CONFIG_PATH are not defined as Avro schemas to import into Kernel datasets.

In the aura-databricks-jobs logs, a warning message similar to this will appear:

2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_BOT" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_CLF" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_GROOT" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_NLP" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_SERVICES" is not avro format and is not imported"
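
To get the list of skipped schema types from a log file, the warning text can be matched with a regular expression. The following is a minimal sketch, assuming the message wording shown above. The function name `skipped_schema_types` is hypothetical.

```python
import re

# Matches the warning text and captures the schema type between the quotes.
SKIPPED = re.compile(
    r'The schema type "([^"]+)" is not avro format and is not imported'
)


def skipped_schema_types(log_lines):
    """Return the schema types reported as skipped, in order, without duplicates."""
    found = []
    for line in log_lines:
        match = SKIPPED.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Two of the log lines above, used only as sample input.
sample_lines = [
    '2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py '
    'msg="The schema type "E_Aura_BOT" is not avro format and is not imported"',
    '2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py '
    'msg="The schema type "E_Aura_CLF" is not avro format and is not imported"',
]

print(skipped_schema_types(sample_lines))
```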

Error in size report configuration

This error occurs when obtaining the adapter information of the file configured in the variable AURA_KPI_AVRO_ADAPTER_CONFIG_PATH. To correct it, check that the file is generated by aura-kpis-uploader in this path.

In the aura-databricks-jobs logs, an error message similar to this will appear:

2024-10-09 18:29:39,023 ERROR 52395 msg="It could not obtain the configuration of the size report to import in "avro/sizeReport.json""

Message indicating no Avro files to load in dataset

Some elements are configured in AURA_KPI_AVRO_ADAPTER_CONFIG_PATH as Avro schemas, but there are no Avro files for them to import into Kernel datasets. In the aura-databricks-jobs logs, an info message similar to this will appear:

2024-10-09 16:23:37,972 INFO 12400 .venv/../base_logger.py msg="Import files from directory "avro_test/dimensional/D_Aura_Recognizer/6.0.0""
2024-10-09 16:23:38,115 INFO 12400 .venv/../base_logger.py msg="There are no avro files to load for the path: "avro_test/dimensional/D_Aura_Recognizer/6.0.0""

A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:

{
    "num_files_kernel_uploaded": 0,
    "num_files_moved_to_processed": 0,
    "num_files_deleted": 0,
    "num_files_skipped": 0,
    "num_errors": 0,
    "summary": {
        "D_Aura_Channel": {
            "dataset_id": "D_Aura_Channel",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "NOT_PROCESSED",
            "num_files_kernel_uploaded": 0,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {}
        },
        "D_Aura_Recognizer": {
            "dataset_id": "D_Aura_Recognizer",
            "schema": "dimensional",
            "version": "6.0.0",
            "step": "NOT_PROCESSED",
            "num_files_kernel_uploaded": 0,
            "num_files_moved_to_processed": 0,
            "num_files_deleted": 0,
            "num_files_skipped": 0,
            "num_errors": 0,
            "errors": [],
            "spark_executions": {}
        }
    },
    "start_time": "2024-09-03T17:56:26.464890Z",
    "end_time": "2024-09-03T18:21:17.115379Z",
    "duration_seconds": 1490.65,
    "step": "FINISH",
    "status": "successfully"
}
