3 - User guide
Aura Databricks Jobs user guide
Guidelines including the orderly steps to use Aura Databricks Jobs
Prerequisites
- Python version 3.9 or higher.
# determine python version
python --version
- Installed aura-pytraces: Aura repository for Python traces functionalities.
- Prerequisites in Aura installer:
  - Databricks must be enabled in Aura installer
  - Databricks cluster node type must be configured
  - Databricks job execution must be configured
- Configure Kernel datasets. See more details in Kernel datasets configuration.
Flow
The flow that aura-databricks-jobs follows to validate if it is going to be executed is as follows:

Generate Reports
By default, aura-databricks-jobs generates a report during the import process. The report is stored in the Azure Storage account defined in AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT, under the path defined in AURA_KPI_AVRO_REPORTS_DESTINATION_PATH, with the file name aura-avro-kpis-report-{iso-date}.json.
To generate reports for all uploaded files, generate them only on error, or disable them, set the environment variable AURA_KPIS_REPORTS_MODE to one of the following values:
- all: A report is generated for each of the processed files. This is the default value.
- error: A report is generated only when there are errors in the process.
- none: No report is generated.
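The report mode rules can be sketched as follows. This is a minimal illustration, not the actual aura-databricks-jobs implementation: the function names should_generate_report and report_file_name are hypothetical, and the exact format of {iso-date} is assumed here to be the output of Python's isoformat().

```python
from datetime import datetime, timezone

# Values documented for AURA_KPIS_REPORTS_MODE.
VALID_MODES = {"all", "error", "none"}


def should_generate_report(mode: str, num_errors: int) -> bool:
    """Decide whether a report is written, given the mode and the error count."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported AURA_KPIS_REPORTS_MODE: {mode!r}")
    if mode == "all":
        return True
    if mode == "error":
        return num_errors > 0
    return False  # mode == "none"


def report_file_name(when: datetime) -> str:
    """Build the report file name (the {iso-date} format is an assumption)."""
    return f"aura-avro-kpis-report-{when.isoformat()}.json"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    if should_generate_report("error", num_errors=2):
        print(report_file_name(now))
```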
3.1 Report Model
A report has the following structure, in JSON format:
{
"num_files_kernel_uploaded": 30,
"num_files_moved_to_processed": 30,
"num_files_deleted": 30,
"num_files_skipped": 0,
"num_errors": 0,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 4,
"num_files_moved_to_processed": 4,
"num_files_deleted": 4,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Channel",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "8fb3e408-2ce0-42f4-8bbf-5b0974b44108",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 116,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 14640,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 116,
"local_spark_records_written_total": 116,
"total_not_informed_records_written": 0,
"records_written": 116,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 4796
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T100000Z.avro"
],
"duration_seconds": 141.32
},
"D_Aura_Recognizer": {
"dataset_id": "D_Aura_Recognizer",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 4,
"num_files_moved_to_processed": 4,
"num_files_deleted": 4,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Recognizer",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "415fb219-6ef4-4b21-9e14-c10347f1d2fa",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 376,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 49744,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 376,
"local_spark_records_written_total": 376,
"total_not_informed_records_written": 0,
"records_written": 376,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 9055
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Recognizer/6.0.0/CR_DIM_RECOGNIZER_20241017T100000Z.avro"
],
"duration_seconds": 94.75
},
"D_Aura_Component": {
"dataset_id": "D_Aura_Component",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 4,
"num_files_moved_to_processed": 4,
"num_files_deleted": 4,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Component",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "340c90a8-00d5-4868-a746-5ec0f8342a90",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 28,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 2108,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 28,
"local_spark_records_written_total": 28,
"total_not_informed_records_written": 0,
"records_written": 28,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 1255
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Component/6.0.0/CR_DIM_COMPONENT_20241017T100000Z.avro"
],
"duration_seconds": 105.14
},
"D_Aura_Skill": {
"dataset_id": "D_Aura_Skill",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 4,
"num_files_moved_to_processed": 4,
"num_files_deleted": 4,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Skill",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "60da9e25-0767-4097-ab9a-2bf388d8daa7",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 16,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 1280,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 16,
"local_spark_records_written_total": 16,
"total_not_informed_records_written": 0,
"records_written": 16,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 1246
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Skill/6.0.0/CR_DIM_SKILL_20241017T100000Z.avro"
],
"duration_seconds": 95.97
},
"D_Aura_Preset": {
"dataset_id": "D_Aura_Preset",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 4,
"num_files_moved_to_processed": 4,
"num_files_deleted": 4,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Preset",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "8b143625-9bf7-484a-8a05-671a6cff72fe",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 64,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 5020,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 64,
"local_spark_records_written_total": 64,
"total_not_informed_records_written": 0,
"records_written": 64,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 2001
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Preset/6.0.0/CR_DIM_PRESETS_20241017T100000Z.avro"
],
"duration_seconds": 72.97
},
"D_Aura_App": {
"dataset_id": "D_Aura_App",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 4,
"num_files_moved_to_processed": 4,
"num_files_deleted": 4,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_App",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "f99b5dac-47ce-4525-aa86-6d3bbb3b67f5",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 28,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 5192,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 28,
"local_spark_records_written_total": 28,
"total_not_informed_records_written": 0,
"records_written": 28,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 2742
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_App/6.0.0/CR_DIM_APP_20241017T100000Z.avro"
],
"duration_seconds": 93.86
},
"Aura_Audit": {
"dataset_id": "Aura_Audit",
"schema": "entity",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 2,
"num_files_moved_to_processed": 2,
"num_files_deleted": 2,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "Aura_Audit",
"version": 6,
"correlator": "55fc318d-b9cd-4070-ae6e-0407ef4b871e",
"resource_id": "3013424c-4ef1-4bdb-b4fc-a02540f9b1f8",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 63,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 12452,
"total_malformed_records_by_partition_written": [],
"partitions_written": [
[
[
"DAY_DT",
"2024-10-04"
]
],
[
[
"DAY_DT",
"2024-10-07"
]
]
],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [
[
"DAY_DT=2024-10-04",
53
],
[
"DAY_DT=2024-10-07",
10
]
],
"total_not_informed_records_by_partition_written": [],
"records_read": 63,
"local_spark_records_written_total": 63,
"total_not_informed_records_written": 0,
"records_written": 63,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 6854
}
},
"files_uploaded": [
"avro_test/entity/Aura_Audit/6.0.0/AURA_062a0ab0-d0bd-5347-98bf-d88977af622f_CR_AUDIT_20241007T090000Z.avro",
"avro_test/entity/Aura_Audit/6.0.0/AURA_1d43887a-f368-51ce-abee-60f5b25387ad_CR_AUDIT_20241004T110000Z.avro"
],
"duration_seconds": 100.70
},
"Aura_Gateway_Message": {
"dataset_id": "Aura_Gateway_Message",
"schema": "entity",
"version": "6.0.0",
"step": "NOT_PROCESSED",
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {},
"files_uploaded": [],
"duration_seconds": 0.07
}
},
"start_time": "2024-10-23T15:18:30.098166Z",
"end_time": "2024-10-23T15:36:57.161532Z",
"duration_seconds": 1107.06,
"step": "FINISH",
"status": "successfully"
}
The parameters are defined as follows:
- dataset_id: Kernel dataset id to load.
- schema: Type of schema to load.
- version: Dataset version to load.
- step: Stage of the loading process. It can be:
  - INIT: In this stage, the necessary Azure and Spark connections are created and a report is created.
  - CHECK_PREVIOUS_ERRORS: In this stage, the job checks whether there were errors in the last execution. Datasets whose errors cannot be recovered are marked, and those that can be recovered are executed again.
  - WRITING_KERNEL_STAGE: Stage for reading files and writing data to the Kernel datasets.
  - MOVING_PROCESSED_BLOBS_STAGE: Stage for moving files to the processed folder.
  - FINISH: This stage indicates that the process has been completed.
- num_files_kernel_uploaded: Number of files that have been verified as successfully uploaded to the Kernel Datalake.
- num_files_moved_to_processed: Number of files that have been moved to the processed folder.
- num_files_deleted: Number of files that have been deleted from the main folder.
- num_files_skipped: Number of files that have been skipped. Files are skipped, and remain unprocessed, when they match the pattern defined in the job’s variable AURA_KPI_AVRO_SCHEMAS_NOT_TO_UPLOAD.
- num_errors: Total number of errors reported. A single error may correspond to a failure when loading the source files contained in one of the Avro-formatted folders, so this value does not correspond to the number of erroneous files.
- start_time: Start time of the import process, in ISO format.
- end_time: End time of the import process, in ISO format.
- duration_seconds: Duration of the import process, in seconds.
- status: Status of the process. The value is either failed or successfully.
- summary: Information about each processed coroutine. Each coroutine is responsible for loading a folder with files that have the same Avro schema and the same version. If there is a general error prior to the coroutines, it also appears in the summary, in the process_error field.
  For each dataset id, it contains:
  - num_files_kernel_uploaded: Number of files that have been verified as successfully uploaded to the Kernel Datalake for this dataset id.
  - num_files_moved_to_processed: Number of files that have been moved to the processed folder for this dataset id.
  - num_files_deleted: Number of files that have been deleted from the main folder for this dataset id.
  - num_errors: Number of errors reported for this dataset id.
  - errors: Errors produced for this dataset id. Each error has the elements error, corr and step:
    - error: Description or exception of the error obtained.
    - corr: Correlator used in the process.
    - step: Phase of the process for each Kernel dataset. It can be:
      - MOVING_BLOBS_TO_PROCESSED_WITH_PREVIOUS_ERRORS: In this stage, the processed files that were pending to be moved due to an error are now moved.
      - REMOVING_BLOBS_WITH_PREVIOUS_ERRORS: In this stage, the processed files that were pending to be deleted due to an error are now deleted.
      - NOT_PROCESSED_PREVIOUS_ERRORS: Errors that occurred in a previous process and are not recoverable. For example, if the writing produced malformed or discarded records, they must be reviewed manually and should not be written to the dataset. Likewise, if moving the files to the processed folder fails again on retry, those files must be checked specifically.
      - READING_BLOBS: In this stage, the files are read to create the data to be written to the dataset.
      - WRITING_DATASET: In this stage, the data is written to the dataset.
      - WRITING_DATASET_OK: At this stage, the data has already been correctly written to the dataset.
      - WRITING_DATASET_ERROR_NOT_RECOVERABLE: In the writing process, malformed or discarded records have been detected that must be checked manually.
      - MOVING_BLOBS_TO_PROCESSED: At this stage, the files are moved to the processed folder.
      - REMOVING_BLOBS: At this stage, the files are deleted from the processed folder.
      - NOT_PROCESSED: The dataset has no data and will not be processed.
      - FINISH: The dataset uploading has been completed correctly.
  - spark_executions: Spark report for this dataset id. It includes information such as records read, written and discarded.
  - files_uploaded: List of files that have been uploaded to Kernel for this dataset id.
Example of one coroutine executed for the D_Aura_Channel dataset:
{
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 156,
"num_files_moved_to_processed": 156,
"num_files_deleted": 156,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Channel",
"version": 6,
"correlator": "d558b080-f261-4e6b-9adc-a7503f3e51a9",
"resource_id": "36417c66-a276-4107-bcb8-3792bccb076c",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 4967,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 4049495,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 4967,
"local_spark_records_written_total": 4967,
"total_not_informed_records_written": 0,
"records_written": 4967,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 394038
}
},
"duration_seconds": 112.05
}
}
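When reviewing a report, it can help to list only the datasets that did not end cleanly. The following is a minimal sketch based on the fields described above. The function name datasets_needing_attention and the sample report are hypothetical, and skipping the process_error entry is an assumption about how that field appears in the summary.

```python
import json

# Steps that the guide describes as requiring a manual check.
NOT_RECOVERABLE_STEPS = {
    "NOT_PROCESSED_PREVIOUS_ERRORS",
    "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
}
# Steps that need no action: upload completed, or nothing to load.
CLEAN_STEPS = {"FINISH", "NOT_PROCESSED"}


def datasets_needing_attention(report: dict) -> dict:
    """Return {dataset_id: reason} for datasets that did not end cleanly."""
    findings = {}
    for dataset_id, entry in report.get("summary", {}).items():
        # Assumption: process_error is not a dataset entry, so it is skipped.
        if dataset_id == "process_error" or not isinstance(entry, dict):
            continue
        step = entry.get("step")
        num_errors = entry.get("num_errors", 0)
        if step in NOT_RECOVERABLE_STEPS:
            findings[dataset_id] = f"manual review required (step={step})"
        elif num_errors:
            findings[dataset_id] = f"{num_errors} error(s) reported (step={step})"
        elif step not in CLEAN_STEPS:
            findings[dataset_id] = f"stopped before completion (step={step})"
    return findings


# Abridged, made-up report used only to exercise the function.
sample_report = json.loads("""
{
  "summary": {
    "D_Aura_Channel": {"step": "FINISH", "num_errors": 0, "errors": []},
    "Aura_Gateway_Message": {"step": "NOT_PROCESSED", "num_errors": 0, "errors": []},
    "Aura_Audit": {
      "step": "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
      "num_errors": 1,
      "errors": [{"error": "example", "corr": "example", "step": "WRITING_DATASET"}]
    }
  },
  "status": "failed"
}
""")

for dataset_id, reason in datasets_needing_attention(sample_report).items():
    print(f"{dataset_id}: {reason}")
```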
4 - Troubleshooting
Aura Databricks Jobs troubleshooting
Most common errors in Aura Databricks Jobs, along with the generated logs and recommendations for error fixing
Required environment variables
This error occurs when the mandatory environment variables are not configured.
If any mandatory environment variable is missing, an error message similar to the following appears in the aura-databricks-jobs logs:
marshmallow.exceptions.ValidationError: {'AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT': ['AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT is required.'], 'AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY': ['AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY is required.'], 'AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME': ['AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME is required.']}
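A quick pre-check of the environment can catch this before the job starts. The sketch below is illustrative only: the function name missing_variables is hypothetical, and the list contains only the three variables named in the sample error above, which may not be the complete set of mandatory variables.

```python
import os

# Only the variables that appear in the sample error; the full list may be longer.
REQUIRED_VARIABLES = (
    "AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT",
    "AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY",
    "AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME",
)


def missing_variables(environ=None, required=REQUIRED_VARIABLES):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    for name in missing_variables():
        print(f"{name} is required.")
```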
The value of AURA_MICROSOFT_AZURE_STORAGE_KPIS_CONTAINER_NAME in the job’s variable is not correct: the container does not exist. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel.
In the aura-databricks-jobs logs, an error message similar to this will appear:
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:2dfad4cd-401e-0083-31cf-190020000000
Time:2024-10-08T22:11:23.1996799Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:2dfad4cd-401e-0083-31cf-190020000000
Time:2024-10-08T22:11:23.1996799Z</Message></Error>
Errors in the source Microsoft Storage account
- The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT in the job’s variable is not correct. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel.
In the aura-databricks-jobs logs, an error message similar to this will appear:
azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x10276ebe0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
- The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCOUNT in the job’s variable is empty. In the aura-databricks-jobs logs, an error message similar to this will appear:
azure.core.exceptions.ServiceRequestError: URL has an invalid label.
Error in the source Microsoft Storage password
- The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY in the job’s variable is not correct. To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel.
In the aura-databricks-jobs logs, an error message similar to this will appear:
azure.storage.blob._shared.authentication.AzureSigningError: Invalid base64-encoded string: number of data characters (81) cannot be 1 more than a multiple of 4
- The value of AURA_MICROSOFT_AZURE_STORAGE_COMMON_ACCESS_KEY in the job’s variable is empty. In the aura-databricks-jobs logs, an error message similar to this will appear:
azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x10284bac0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
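The signing error shown above is raised while decoding the access key as base64, so the key can be checked locally before running the job. This is a minimal sketch using only the Python standard library; the function name access_key_problem is hypothetical, and passing this check does not prove that the key is the right one for the account.

```python
import base64


def access_key_problem(key: str):
    """Return a description of the problem, or None if the key decodes as base64."""
    if not key:
        return "access key is empty"
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(key, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        return f"access key is not valid base64: {exc}"
    return None


if __name__ == "__main__":
    # 81 data characters, as in the sample error: one more than a multiple of 4.
    print(access_key_problem("A" * 81))
```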
Errors in Spark configuration
Error in dataset id option
The value of dataset.id configured in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
To solve it, review the file configured in the job’s variable AURA_KPI_AVRO_ADAPTER_CONFIG_PATH. This file contains the list of datasets to be imported. If this dataset is not included, contact the Kernel Operations team and ask them to add the dataset with a specific version and to include the new scope in the purpose configured for the corresponding application.
For more detail: Kernel datasets configuration
In the aura-databricks-jobs logs, an error message similar to this will appear:
com.telefonica.baikal.spark.exceptions.InvalidDataSourceConfigException: An error occurred trying to recover dataset D_Aura_LivingApp_ERROR-6: ErrorResponse(NOT_FOUND,Dataset D_Aura_LivingApp_ERROR version 6 not found,None). Configured data source options Map(client.purposes -> aura-kpi-data-write-purpose, 4p.baseurl -> global-int-current.baikalplatform.com, writemode -> append, dataset.id -> D_Aura_LivingApp_ERROR, correlator -> df776bdc-a7d9-482e-8364-8c617afc75be, client.scopes -> , repartition.enabled -> true, client.id -> aura-bot, skipunpseudonymize -> true, repartition.compressedrecordsize -> 1403, client.secret -> ********, dataset.version -> 6)
Error in version of dataset option
- The value of dataset.version configured in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
To solve it, review the configuration of the file configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH. This file contains the list of datasets, together with their versions, to be imported.
- The value of dataset.version is not correct for the aura-bot Kernel app because it is not a number. In the aura-databricks-jobs logs, an error message similar to this will appear:
pyspark.sql.utils.IllegalArgumentException: For input string: "version_error"
- The value of dataset.version is not correct for the aura-bot Kernel app because this version does not exist. In the aura-databricks-jobs logs, an error message similar to this will appear:
py4j.protocol.Py4JJavaError: An error occurred while calling o123.save.
: com.telefonica.baikal.spark.exceptions.InvalidDataSourceConfigException: An error occurred trying to recover dataset D_Aura_LivingApp_PRUEBAS_AURA-8: ErrorResponse(NOT_FOUND,Dataset D_Aura_LivingApp_PRUEBAS_AURA version 8 not found,None). Configured data source options Map(client.purposes -> aura-kpi-data-write-purpose, 4p.baseurl -> global-int-current.baikalplatform.com, writemode -> append, dataset.id -> D_Aura_LivingApp_PRUEBAS_AURA, correlator -> 09c988c5-4d45-4590-9c76-847b7f3d1579, client.scopes -> , repartition.enabled -> true, client.id -> aura-bot, skipunpseudonymize -> true, repartition.compressedrecordsize -> 1403, client.secret -> ********, dataset.version -> 8)
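The non-numeric case can be detected before the write statement runs. The sketch below assumes that the integer sent as dataset.version is the major component of the configured version (for example, 6 for 6.0.0, as the sample report suggests). The function name spark_dataset_version is hypothetical, and the check cannot tell whether the version actually exists in Kernel.

```python
def spark_dataset_version(version: str) -> int:
    """Return the integer assumed to be used as dataset.version, e.g. '6.0.0' -> 6."""
    major = version.split(".", 1)[0]
    # isascii() guards against non-ASCII digits that int() would reject.
    if not (major.isascii() and major.isdigit()):
        raise ValueError(f"dataset version must be numeric, got {version!r}")
    return int(major)


if __name__ == "__main__":
    for candidate in ("6.0.0", "8", "version_error"):
        try:
            print(candidate, "->", spark_dataset_version(candidate))
        except ValueError as exc:
            print(candidate, "->", exc)
```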
Error in base URL option
The value of AURA_FP_SPARK_BASE_URL in the job’s variable used to set 4p.baseurl in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
To solve it, contact the Kernel Operations team to review the value of the variable. In the aura-databricks-jobs logs, an error message similar to this will appear:
[WARN] [10/09/2024 10:45:56.456] [spark-sdk-akka.actor.default-dispatcher-4] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 100 milliseconds.
[WARN] [10/09/2024 10:46:01.495] [spark-sdk-akka.actor.default-dispatcher-3] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 200 milliseconds.
[WARN] [10/09/2024 10:46:06.545] [spark-sdk-akka.actor.default-dispatcher-7] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 400 milliseconds.
[WARN] [10/09/2024 10:46:11.569] [spark-sdk-akka.actor.default-dispatcher-3] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 800 milliseconds.
[WARN] [10/09/2024 10:46:16.600] [spark-sdk-akka.actor.default-dispatcher-7] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 1600 milliseconds.
[WARN] [10/09/2024 10:46:21.633] [spark-sdk-akka.actor.default-dispatcher-3] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 3200 milliseconds.
[WARN] [10/09/2024 10:46:26.673] [spark-sdk-akka.actor.default-dispatcher-45] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 6400 milliseconds.
[WARN] [10/09/2024 10:46:39.154] [spark-sdk-akka.actor.default-dispatcher-48] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 12800 milliseconds.
[WARN] [10/09/2024 10:46:52.129] [spark-sdk-akka.actor.default-dispatcher-48] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 25600 milliseconds.
[WARN] [10/09/2024 10:47:19.988] [spark-sdk-akka.actor.default-dispatcher-48] [spark-sdk/Pool(shared->https://auth.global-int-current.baikalplatform.com.error:443)] Connection attempt failed. Backing off new connection attempts for at least 51200 milliseconds.
24/10/09 10:47:19 ERROR DefaultOAuthService: An error occurred trying to connect with http service
akka.stream.StreamTcpException: Tcp command [Connect(auth.global-int-current.baikalplatform.com.error:443,None,List(),Some(10 seconds),true)] failed because of java.net.UnknownHostException: auth.global-int-current.baikalplatform.com.error
Caused by: java.net.UnknownHostException: auth.global-int-current.baikalplatform.com.error
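Since the failure above is an unknown host, a DNS lookup of the authentication host is a quick way to confirm a wrong base URL. This is a minimal sketch: the function name auth_host_resolves is hypothetical, and the auth. prefix and port 443 are assumptions taken from the log sample rather than from the SDK documentation.

```python
import socket


def auth_host_resolves(base_url: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if auth.<base_url> can be resolved by DNS."""
    host = f"auth.{base_url}"  # assumption: host pattern seen in the logs
    try:
        resolver(host, 443)
    except (socket.gaierror, UnicodeError):
        # gaierror: unknown host; UnicodeError: malformed host label.
        return False
    return True


if __name__ == "__main__":
    import os

    base_url = os.environ.get("AURA_FP_SPARK_BASE_URL", "")
    print(f"auth.{base_url} resolves: {auth_host_resolves(base_url)}")
```

The resolver parameter exists only so that the lookup can be replaced in tests without network access.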
Error in client id option
The value of AURA_FP_SPARK_CLIENT_ID in the job’s variable used to set client.id in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
To solve it, review the credentials in the aura-conversations bucket/blob container in Kernel.
In the aura-databricks-jobs logs, an error message similar to the following will appear. The job will eventually time out, because it keeps retrying the statement until it is stopped by the Databricks manager.
24/10/09 10:38:48 ERROR OAuthTokenActor: Invalid authentication: invalid_client, Bad credentials
24/10/09 10:38:48 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S
Error in client secret option
The value of AURA_FP_SPARK_CLIENT_SECRET in the job’s variable used to set client.secret in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
To solve it, review the credentials with the Kernel Operations team for the aura-bot Kernel app.
In the aura-databricks-jobs logs, an error message similar to the following will appear. The job will eventually time out, because it keeps retrying the statement until it is stopped by the Databricks manager.
24/10/09 10:58:51 ERROR OAuthTokenActor: Invalid authentication: invalid_client, Bad credentials
24/10/09 10:58:51 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S
Error in purposes option
The value of AURA_FP_SPARK_PURPOSES in the job’s variable used to set client.purposes in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
To solve it, contact the Kernel Operations team and ask them to add the purpose for the corresponding application. If the purpose has not been created, follow this guide to create it: Kernel datasets configuration.
In the aura-databricks-jobs logs, an error message similar to the following will appear. The job will eventually time out, because it keeps retrying the statement until it is stopped by the Databricks manager.
24/10/09 10:56:38 ERROR OAuthTokenActor: Invalid authentication: invalid_purpose, Invalid purpose: aura-kpi-data-write-purpose-error for client_credentials
24/10/09 10:56:38 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S
Token retrieval error: Kernel service not available
The configuration is correct, but the Kernel service is not available at that time. The job times out after several retries, because the Spark session is not closed by Kernel.
In this case, contact the Kernel Operations team, wait for the service to be restored, and rerun the job.
In the aura-databricks-jobs logs, messages similar to the following will appear. The job will eventually time out, because it keeps retrying the statement until it is stopped by the Databricks manager.
- Standard error: The job is waiting to connect to the Kernel client.
2024-10-26 06:05:35,846 INFO 1016 /databricks/python/lib/python3.9/site-packages/aura_pytraces/aura_logging/base_logger.py msg="Writing blobs of avro blob path: "avro/dimensional/D_Aura_Channel/6.0.0" to dataset_id: "D_Aura_Channel""
- Log4j output file: repeated messages showing that the token needed to connect to Kernel has not been obtained, as in the following example:
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
24/10/26 06:59:33 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]
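The pattern above (many "Token not set yet" messages and no explicit authentication error) can be checked mechanically. The following is a minimal sketch, assuming the Log4j output has been downloaded as text; the function name and the `min_repeats` threshold are illustrative assumptions, not part of aura-databricks-jobs.

```python
import re

# Matches: 24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [<uuid>]
TOKEN_PENDING = re.compile(r"INFO OAuthTokenActor: Token not set yet \[([0-9a-f-]+)\]")

def looks_like_kernel_unavailable(log_text, min_repeats=10):
    """Heuristic: many 'Token not set yet' lines and no explicit authentication error.

    min_repeats is an arbitrary threshold chosen for this sketch.
    """
    pending = TOKEN_PENDING.findall(log_text)
    has_auth_error = "Invalid authentication" in log_text
    return len(pending) >= min_repeats and not has_auth_error

line = "24/10/26 06:59:32 INFO OAuthTokenActor: Token not set yet [52698445-1943-4d62-9425-8451637736b7]"
log_text = "\n".join([line] * 13)
print(looks_like_kernel_unavailable(log_text))
```

If the check is negative but the job still times out, the credential, purpose, or scope errors described in this section are the more likely cause.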
Error in scopes option
The value of AURA_FP_SPARK_SCOPES in the job’s variable used to set client.scopes in the Kernel dataset write statement is not correct for the aura-bot Kernel app.
Usually a purpose is created with its list of scopes already attached, so this variable does not need to be configured. If the variable is used and it names a scope that is not defined, an error is produced. To solve it, review the scope configuration described in Kernel datasets configuration.
In the aura-databricks-jobs logs, an error message similar to the following appears. The job then times out, because it keeps retrying the statement until the Databricks manager stops it.
24/10/09 11:00:59 ERROR OAuthTokenActor: Invalid authentication: invalid_scope, Invalid scope 'scopes-error' requested for client 'aura-bot-six'
24/10/09 11:00:59 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S
com.telefonica.baikal.services.exceptions.InvalidOAuthAuthException: Invalid authentication: invalid_scope, Invalid scope 'scopes-error' requested for client 'aura-bot-six'
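The credential, purpose, and scope failures share the same log shape, so they can be told apart by the OAuth error code. The following is a minimal sketch, assuming the Log4j output is available as text; `classify_oauth_errors` and the lookup table are illustrative helpers built from the sections above, not an existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
# 24/10/09 10:58:51 ERROR OAuthTokenActor: Invalid authentication: invalid_client, Bad credentials
OAUTH_ERROR = re.compile(r"ERROR OAuthTokenActor: Invalid authentication: (\w+), (.*)")

# Job variable to review for each OAuth error code (taken from this guide).
VARIABLE_BY_CODE = {
    "invalid_client": "AURA_FP_SPARK_CLIENT_SECRET",
    "invalid_purpose": "AURA_FP_SPARK_PURPOSES",
    "invalid_scope": "AURA_FP_SPARK_SCOPES",
}

def classify_oauth_errors(log_text):
    """Count OAuth error codes in a Log4j output and map each to a job variable."""
    counts = Counter(m.group(1) for m in OAUTH_ERROR.finditer(log_text))
    return {code: (VARIABLE_BY_CODE.get(code, "unknown"), n) for code, n in counts.items()}

sample = """24/10/09 10:58:51 ERROR OAuthTokenActor: Invalid authentication: invalid_client, Bad credentials
24/10/09 10:58:51 ERROR OAuthTokenActor: Could not update token, rescheduling in PT5S
24/10/09 11:00:59 ERROR OAuthTokenActor: Invalid authentication: invalid_scope, Invalid scope 'scopes-error' requested for client 'aura-bot-six'"""

for code, (variable, n) in classify_oauth_errors(sample).items():
    print(f"{code}: seen {n} time(s), review {variable}")
```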
Errors in Spark execution
Error trying to import dataset with Avro files with schema error
This error is produced in the WRITING_DATASET step because some of the Avro files to import have a schema error.
To solve it, review the specific schema error indicated in the logs.
To check the problem, review the schema configuration for the failing dataset:
- First, get the path of the schema defined in the file configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH.
- Afterwards, with the path, get the schema definition.
Depending on the error reported, validate the data in the files that do not follow the schema specification.
In the aura-databricks-jobs logs, an error message similar to this will appear:
24/10/09 15:58:53 ERROR Executor: Exception in task 0.0 in stage 63.0 (TID 553)
org.apache.avro.AvroTypeException: Found com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, expecting com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, missing required field AURA_MODEL_VERSION_ID
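The "missing required field" message follows from Avro schema resolution: a field that the expected (reader) schema declares without a default must be present in the schema the file was written with. The following is a minimal sketch of that check on two schema definitions loaded as Python dictionaries. Only AURA_MODEL_VERSION_ID comes from the log above; the other field names are hypothetical, and the sketch ignores field aliases and nested records.

```python
def missing_required_fields(writer_schema, reader_schema):
    """Fields the reader requires (no default) that the writer schema does not provide."""
    writer_names = {field["name"] for field in writer_schema["fields"]}
    return [
        field["name"]
        for field in reader_schema["fields"]
        if field["name"] not in writer_names and "default" not in field
    ]

# Expected schema (hypothetical excerpt; SUGGESTION_ID and COMMENT_TX are made up).
reader_schema = {
    "type": "record",
    "name": "Aura_Suggestion",
    "fields": [
        {"name": "SUGGESTION_ID", "type": "string"},
        {"name": "AURA_MODEL_VERSION_ID", "type": "string"},
        {"name": "COMMENT_TX", "type": ["null", "string"], "default": None},
    ],
}

# Schema embedded in the failing Avro file (hypothetical excerpt).
writer_schema = {
    "type": "record",
    "name": "Aura_Suggestion",
    "fields": [{"name": "SUGGESTION_ID", "type": "string"}],
}

print(missing_required_fields(writer_schema, reader_schema))
```

COMMENT_TX is not reported because it has a default, so a reader can fill it in when the file does not carry it.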
A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:
{
"time": "2024-10-09T15:47:41.507980Z",
"report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
}
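The report_link in the error file is an Azure Storage URL with a shared access signature, so it stops working after the time given in its `se` (signed expiry) parameter. The following is a minimal sketch that reads the error file content and tells whether the link is still usable. The function name is illustrative, the signature in the sample is a placeholder, and the timestamp format is assumed to match the example above.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs, unquote

def describe_error_file(content, now=None):
    """Extract the report name and the link expiry from the error file content."""
    data = json.loads(content)
    url = urlparse(data["report_link"])
    query = parse_qs(url.query)  # parse_qs also decodes %3A back to ':'
    report_name = unquote(url.path.rsplit("/", 1)[-1])
    expiry = None
    if "se" in query:
        # Assumes the second-precision UTC format shown in the example link.
        expiry = datetime.strptime(query["se"][0], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return {
        "report_name": report_name,
        "expires": expiry,
        "expired": expiry is not None and now > expiry,
    }

content = json.dumps({
    "time": "2024-10-09T15:47:41.507980Z",
    "report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/"
                   "aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json"
                   "?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=PLACEHOLDER",
})

info = describe_error_file(content, now=datetime(2024, 10, 10, tzinfo=timezone.utc))
print(info["report_name"], info["expires"], info["expired"])
```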
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate the error in the Aura_Suggestion dataset and will be similar to:
{
"num_files_kernel_uploaded": 182,
"num_files_moved_to_processed": 182,
"num_files_deleted": 182,
"num_files_skipped": 0,
"num_errors": 1,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "FINISH",
"num_files_kernel_uploaded": 25,
"num_files_moved_to_processed": 25,
"num_files_deleted": 25,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {
"dataset_id": "D_Aura_Channel",
"version": 6,
"correlator": "5f19247e-40b2-4643-8ed1-b1e0f6c0d759",
"resource_id": "1aabef7e-03f6-40f5-9812-263e49c1d4b0",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 775,
"local_spark_write_discards": 0,
"local_spark_write_discards_total": 0,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 697275,
"total_malformed_records_by_partition_written": [],
"partitions_written": [],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [],
"total_not_informed_records_by_partition_written": [],
"records_read": 775,
"local_spark_records_written_total": 775,
"total_not_informed_records_written": 0,
"records_written": 775,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 68804
}
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T100000Z.avro"
],
"duration_seconds": 141.32
},
"Aura_Suggestion": {
"dataset_id": "Aura_Suggestion",
"schema": "entity",
"version": "6.0.0",
"step": "WRITING_DATASET",
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 1,
"errors": [
{
"step": "WRITING_DATASET",
"description": "avro_test/entity/Aura_Suggestion/6.0.0",
"error": "An error occurred while calling o208.save.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 63.0 failed 1 times, most recent failure: Lost task 0.0 in stage 63.0 (TID 553) (192.168.1.71 executor driver): org.apache.avro.AvroTypeException: Found com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, expecting com.telefonica.urm.Digital_Products.Aura.Aura_Suggestion, missing required field AURA_MODEL_VERSION_ID\n\tat org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:308)\n\tat org.apache.avro.io.parsing.Parser.advance(Parser.java:86)\n\tat org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)\n\tat org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:240)\n\tat org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)\n\tat org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)\n\tat org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)\n\tat org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)\n\tat org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)\n\tat org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:126)\n\tat org.apache.avro.mapreduce.AvroKeyRecordReader.nextKeyValue(AvroKeyRecordReader.java:55)\n\tat org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:251)\n\tat org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)\n\tat 
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)\n\tat scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:268)\n\tat scala.collection.Iterator.foreach(Iterator.scala:943)\n\tat scala.collection.Iterator.foreach$(Iterator.scala:943)\n\tat scala.collection.AbstractIterator.foreach(Iterator.scala:1431)\n\tat scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)\n\tat scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)\n\tat scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)\n\tat scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)\n\tat scala.collection.TraversableOnce.to(TraversableOnce.scala:366)\n\tat scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)\n\tat scala.collection.AbstractIterator.to(Iterator.scala:1431)\n\tat scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)\n\tat scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)\n\tat scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)\n\tat scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)\n\tat scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)\n\tat scala.collection.AbstractIterator.toArray(Iterator.scala:1431)\n\tat org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1470)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:136)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)\n\t... 1 more\n",
"corr": "5f19247e-40b2-4643-8ed1-b1e0f6c0d759"
}
],
"spark_executions": {}
}
},
"start_time": "2024-10-09T15:47:41.507980Z",
"end_time": "2024-10-09T16:01:34.247575Z",
"duration_seconds": 832.73,
"step": "FINISH",
"status": "failed"
}
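Because the report is plain JSON, the failing datasets can be listed without reading the whole file by hand. The following is a minimal sketch that walks the `summary` object and returns the dataset, step, and first line of each error. The function name is illustrative and the sample is a trimmed version of the report above.

```python
def first_line(text):
    """First line of a possibly multi-line error message ('' if empty)."""
    return (text.splitlines() or [""])[0]

def failed_datasets(report):
    """Return (dataset_id, step, first error line) for every error in the report."""
    failures = []
    for dataset_id, entry in report.get("summary", {}).items():
        if not isinstance(entry, dict):
            # When the job fails at INIT, "summary" holds a plain "process_error" string.
            failures.append((dataset_id, report.get("step"), first_line(str(entry))))
            continue
        for err in entry.get("errors", []):
            failures.append((dataset_id, err.get("step"), first_line(err.get("error", ""))))
    return failures

# Trimmed version of the report shown above.
report = {
    "num_errors": 1,
    "summary": {
        "D_Aura_Channel": {"step": "FINISH", "num_errors": 0, "errors": []},
        "Aura_Suggestion": {
            "step": "WRITING_DATASET",
            "num_errors": 1,
            "errors": [{
                "step": "WRITING_DATASET",
                "description": "avro_test/entity/Aura_Suggestion/6.0.0",
                "error": "An error occurred while calling o208.save.\n: org.apache.spark.SparkException: Job aborted",
            }],
        },
    },
    "step": "FINISH",
    "status": "failed",
}

for dataset_id, step, message in failed_datasets(report):
    print(f"{dataset_id} failed at {step}: {message}")
```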
This error is produced in the WRITING_DATASET step because the Avro dataset schema configured in Kernel is wrong. This can happen if the schema configured for an Avro dataset and its specific version has not been properly published in the Kernel environment.
For instance, the Aura_Audit dataset for v6.0.0 in Kernel may not include the latest schema changes present in the 4p-datasets codebase repository (see Aura_Audit dataset for v6.0.0 in 4p-datasets).
In the aura-databricks-jobs logs, error messages similar to the ones below will appear in different files:
-
Standard error file: Information on the general import process.
2024-10-14 13:08:53,922 ERROR 1110 /databricks/python/lib/python3.9/site-packages/aura_pytraces/aura_logging/base_logger.py msg="Error writing DATASET_ID: "Aura_Audit", there are local spark write discards that must be reviewed."
-
Log4j output file: Information about Spark operations and detail of the records with errors that will be ignored, as in the following example:
24/10/14 13:05:50 ERROR WasbAvroProducer: Unable to transform [c3a5b3ef-c968-4cf5-8c65-41d62b1a1562,2024-10-14 07:57:37.577,null,92e76dd4-a5c2-4672-a6c5-ba613e229c19,CRI,ai,d18c3ad3-6c7b-5739-8bcd-02e6d49b28bb,aura-gateway-api-6ddc48797-pnvl9,9.4.0,2024-10-14,0401] to avro message at partition 0 (ignoring it)
org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot write "ai" since it's not defined in enum "rag", "generative", "message", "other", "nlpaas"
at org.apache.spark.sql.avro.BaikalAvroSerializer.$anonfun$newConverter$12(BaikalAvroSerializer.scala:123)
at org.apache.spark.sql.avro.BaikalAvroSerializer.$anonfun$newConverter$12$adapted(BaikalAvroSerializer.scala:120)
at org.apache.spark.sql.avro.BaikalAvroSerializer.$anonfun$newStructConverter$2(BaikalAvroSerializer.scala:258)
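When the schema published in Kernel is outdated, the IncompatibleSchemaException line states both the rejected value and the enum symbols Kernel currently accepts. The following is a minimal sketch that extracts both from such a line, which helps when reporting the mismatch to the Kernel Operations team. The function name is illustrative.

```python
import re

# Matches: Cannot write "ai" since it's not defined in enum "rag", "generative", ...
ENUM_ERROR = re.compile(r'Cannot write "([^"]*)" since it\'s not defined in enum (.*)')

def parse_enum_error(line):
    """Return (rejected_value, allowed_symbols) or None if the line does not match."""
    match = ENUM_ERROR.search(line)
    if match is None:
        return None
    allowed = re.findall(r'"([^"]*)"', match.group(2))
    return match.group(1), allowed

line = ('org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot write "ai" since it\'s '
        'not defined in enum "rag", "generative", "message", "other", "nlpaas"')

value, allowed = parse_enum_error(line)
print(f"rejected value: {value}; symbols accepted by the published schema: {allowed}")
```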
A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:
{
"time": "2024-10-09T15:47:41.507980Z",
"report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
}
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate the error in the Aura_Audit dataset and will be similar to:
{
"num_files_kernel_uploaded": 20,
"num_files_moved_to_processed": 20,
"num_files_deleted": 20,
"num_files_skipped": 0,
"num_errors": 1,
"summary": {
"Aura_Audit": {
"dataset_id": "Aura_Audit",
"schema": "entity",
"version": "6.0.0",
"step": "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
"num_files_kernel_uploaded": 9,
"num_files_moved_to_processed": 9,
"num_files_deleted": 9,
"num_files_skipped": 0,
"num_errors": 1,
"errors": [
{
"step": "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
"key": "WRITING_DATASET_DISCARDED_RECORDS",
"description": "Local spark discarded records",
"error": "Error writing DATASET_ID: \"Aura_Audit\", there are local spark write discards that must be reviewed.",
"corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
}
],
"spark_executions": {
"dataset_id": "Aura_Audit",
"version": 6,
"correlator": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2",
"resource_id": "e03a1c5b-cd69-4fef-92fb-d80d3f8dd92a",
"request_type": "writes",
"status": "finished",
"metrics": {
"total_records_written": 1083,
"local_spark_write_discards": 9,
"local_spark_write_discards_total": 9,
"malformed_records_written": 0,
"total_records_filtered_by_gdpr": 0,
"local_spark_bytes_written_total": 208945,
"total_malformed_records_by_partition_written": [],
"partitions_written": [
[
[
"DAY_DT",
"2024-10-10"
]
],
[
[
"DAY_DT",
"2024-10-14"
]
],
[
[
"DAY_DT",
"2024-10-11"
]
]
],
"total_malformed_records_written": 0,
"total_malformed_records_by_column_written": [],
"total_records_by_partition_written": [
[
"DAY_DT=2024-10-14",
981
],
[
"DAY_DT=2024-10-10",
47
],
[
"DAY_DT=2024-10-11",
55
]
],
"total_not_informed_records_by_partition_written": [],
"records_read": 1083,
"local_spark_records_written_total": 1083,
"total_not_informed_records_written": 0,
"records_written": 1083,
"total_malformed_records_discarded": 0,
"records_discarded": 0,
"data_access_audit": {
"partitions_num": 1,
"wasb_type": "avro_fp"
},
"total_executor_cpu_millis": 1,
"total_executor_memory": 593913446,
"total_bytes_written": 63165
}
},
"files_uploaded": [
"avro_test/entity/Aura_Audit/6.0.0/AURA_062a0ab0-d0bd-5347-98bf-d88977af622f_CR_AUDIT_20241007T090000Z.avro",
"avro_test/entity/Aura_Audit/6.0.0/AURA_1d43887a-f368-51ce-abee-60f5b25387ad_CR_AUDIT_20241004T110000Z.avro"
]
}
},
"start_time": "2024-10-14T12:55:38.427732Z",
"end_time": "2024-10-14T13:08:41.567204Z",
"duration_seconds": 783.13,
"step": "WRITING_KERNEL_STAGE",
"status": "failed"
}
To resolve these errors, several steps must be performed:
-
Contact Kernel Operations team and specify the dataset id and version that must be republished, so that the environment is updated.
-
Before running the job again, check whether the schema problem has caused errors in specific records that have not been loaded. Such records appear in the error report with messages like these:
- Local Spark discarded records:
{
"step": "WRITING_DATASET",
"description": "Local spark discarded records",
"error": "Error writing DATASET_ID: \"{DATASET_ID}\", there are local spark write discards that must be reviewed.",
"corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
}
- Malformed records written:
{
"step": "WRITING_DATASET",
"description": "Malformed records",
"error": "Error writing DATASET_ID: \"{DATASET_ID}\", there are malformed records written that must be reviewed.",
"corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
}
- Discarded records:
{
"step": "WRITING_DATASET",
"description": "Malformed records",
"error": "Error writing DATASET_ID: \"{DATASET_ID}\", there are records discarded written that must be reviewed.",
"corr": "3ef2ac10-726f-4f07-a6ae-5407c2b02fb2"
}
In these cases, the wrong records must be corrected manually and reloaded separately from the records that were loaded correctly, to avoid duplicated data in the Kernel datasets. The information needed to correct the schema errors can be obtained from the Databricks logs, as explained before.
-
Once these records have been corrected, remove the file that was created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME, so that the job can be run again normally.
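The check for records that were not loaded can also be made against the metrics in the report. The following is a minimal sketch that lists, per dataset, the discard counters with a non-zero value. Which counters to inspect is an assumption of this sketch; the three names used all appear in the `metrics` object of the reports above.

```python
# Counters inspected here are an assumption about which metrics matter;
# all three names appear in the report's "metrics" object.
DISCARD_KEYS = ("local_spark_write_discards", "malformed_records_written", "records_discarded")

def datasets_with_discards(report):
    """Map dataset_id -> non-zero discard counters found in spark_executions.metrics."""
    result = {}
    for dataset_id, entry in report.get("summary", {}).items():
        if not isinstance(entry, dict):
            continue  # e.g. a plain "process_error" string
        metrics = entry.get("spark_executions", {}).get("metrics", {})
        nonzero = {key: metrics[key] for key in DISCARD_KEYS if metrics.get(key, 0) > 0}
        if nonzero:
            result[dataset_id] = nonzero
    return result

# Trimmed version of the Aura_Audit report shown above.
report = {
    "summary": {
        "Aura_Audit": {
            "step": "WRITING_DATASET_ERROR_NOT_RECOVERABLE",
            "spark_executions": {
                "metrics": {
                    "total_records_written": 1083,
                    "local_spark_write_discards": 9,
                    "malformed_records_written": 0,
                    "records_discarded": 0,
                }
            },
        }
    }
}

print(datasets_with_discards(report))
```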
Error trying to import dataset with missing schema
This error is produced in the READING_BLOBS step due to a missing Avro schema in configuration.
To solve it, review the schema path reported in the logs and check whether that path is valid in the file configured in the job’s variable: AURA_KPI_AVRO_ADAPTER_CONFIG_PATH. If you know the correct path, update it in this file.
In the aura-databricks-jobs logs, an error message similar to this will appear:
py4j.protocol.Py4JJavaError: An error occurred while calling o39.load.
: java.io.FileNotFoundException: Could not read schema. You provided a path that does not exists: wasbs://aura-kpis-ap-six@commauradevstorage.blob.core.windows.net/avro_test/schemas/dimensional/6.0.0/aura-channel-asvc.json. Make sure that the filename and extension are in the path.
2024-10-09 11:13:15,924 ERROR 84269 .venv/../base_logger.py msg="Error processed avro_type_schema: "dimensional" and dataset_id: "D_Aura_Channel""
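The path in the FileNotFoundException uses the wasbs://container@account-host/path form, so it can be split into the container and blob path to look up in Azure Storage. The following is a minimal sketch, assuming the log text is available as a string; the function name is illustrative.

```python
import re
from urllib.parse import urlparse

# Captures the path between "does not exists: " and ". Make sure" in the Kernel message.
SCHEMA_PATH = re.compile(r"path that does not exists: (\S+)\. Make sure")

def missing_schema_location(log_text):
    """Return container, account host and blob path of the missing schema, or None."""
    match = SCHEMA_PATH.search(log_text)
    if match is None:
        return None
    url = urlparse(match.group(1))
    return {
        "container": url.username,       # part before '@'
        "account_host": url.hostname,    # storage account host
        "blob_path": url.path.lstrip("/"),
    }

log_text = (
    ": java.io.FileNotFoundException: Could not read schema. You provided a path that does not exists: "
    "wasbs://aura-kpis-ap-six@commauradevstorage.blob.core.windows.net/avro_test/schemas/dimensional/6.0.0/"
    "aura-channel-asvc.json. Make sure that the filename and extension are in the path."
)

print(missing_schema_location(log_text))
```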
A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:
{
"time": "2024-10-09T15:47:41.507980Z",
"report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
}
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will indicate the error in the D_Aura_Channel dataset and will be similar to:
{
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 1,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "READING_BLOBS",
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 1,
"errors": [
{
"step": "READING_BLOBS",
"description": "avro_test/dimensional/D_Aura_Channel/6.0.0",
"error": "An error occurred while calling o39.load.\n: java.io.FileNotFoundException: Could not read schema. You provided a path that does not exists: wasbs://aura-kpis-ap-six@commauradevstorage.blob.core.windows.net/avro_test/schemas/dimensional/6.0.0/aura-channel-asvc.json. Make sure that the filename and extension are in the path.\n\tat com.telefonica.baikal.spark.sources.telefonica.external.write.TelefonicaExternalSourceRelationProvider.readSchema(TelefonicaExternalSourceRelationProvider.scala:75)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.write.TelefonicaExternalSourceRelationProvider.readSchema$(TelefonicaExternalSourceRelationProvider.scala:66)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.TelefonicaExternalSource.readSchema(TelefonicaExternalSource.scala:33)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.TelefonicaExternalSource.$anonfun$getTable$2(TelefonicaExternalSource.scala:65)\n\tat scala.collection.MapLike.getOrElse(MapLike.scala:131)\n\tat scala.collection.MapLike.getOrElse$(MapLike.scala:129)\n\tat org.apache.spark.sql.catalyst.util.CaseInsensitiveMap.getOrElse(CaseInsensitiveMap.scala:30)\n\tat com.telefonica.baikal.spark.sources.telefonica.external.TelefonicaExternalSource.getTable(TelefonicaExternalSource.scala:63)\n\tat org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:92)\n\tat org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:140)\n\tat org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)\n\tat scala.Option.flatMap(Option.scala:271)\n\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)\n\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n",
"corr": "4f4db627-1de8-4436-80c9-95ade4788559"
}
],
"spark_executions": {}
}
},
"start_time": "2024-10-09T16:23:01.483043Z",
"end_time": "2024-10-09T16:23:39.137639Z",
"duration_seconds": 37.65,
"step": "WRITING_KERNEL_STAGE",
"status": "failed"
}
Error trying to init Spark session
This error occurs when the Spark context cannot be initialized. To solve it, run the job again to check whether this temporary connection problem with the cluster has been resolved. If the error persists, contact the Kernel operations team.
In the aura-databricks-jobs logs, an error message similar to this will appear:
24/10/09 13:18:28 WARN TransportChannelHandler: Exception in connection from /192.168.1.71:59460
java.lang.IllegalArgumentException: Too large frame: 5785721462170058752
at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
24/10/09 13:18:28 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /192.168.1.71:59460 is closed
24/10/09 13:18:28 ERROR SparkContext: Error initializing SparkContext.
A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:
{
"time": "2024-10-09T13:18:08.119222Z",
"report_link": "https://{account_name}.blob.core.windows.net/{container_name}/avro/reports/aura-avro-kpis-report-2024-10-09T13%3A18%3A28.761361Z.json?{signature}",
"error": [
"An error occurred in sparkSDKManager. An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.\n: java.lang.IllegalArgumentException: Too large frame: 5785721462170058752\n\tat org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)\n\tat org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)\n\tat org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n"
]
}
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:
{
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 1,
"summary": {
"process_error": "An error occurred in sparkSDKHandler. An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.\n: java.lang.IllegalArgumentException: Too large frame: 5785721462170058752\n\tat org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)\n\tat org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)\n\tat org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat 
java.base/java.lang.Thread.run(Thread.java:834)\n"
},
"start_time": "2024-10-09T13:18:08.119222Z",
"end_time": "2024-10-09T13:18:28.761361Z",
"duration_seconds": 20.64,
"step": "INIT",
"status": "failed"
}
Writing error in dataset due to out of memory error
In this scenario, a Spark stage is not executed because of a Java heap space (out-of-memory) error, so the files of that dataset are not imported.
To correct it, delete the error file configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME and run the job again, so that the data from the files that were not imported is loaded.
In the aura-databricks-jobs logs, an error message similar to this will appear in the Log4j output file:
An error occurred while calling o582.save.\n: com.telefonica.baikal.spark.exceptions.WriteStatusException: The writing process has failed with resourceId 10543db5-cb35-446e-8cc7-349a3c6cbffb and dataset (D_Aura_App, 6)
at com.telefonica.baikal.spark.sources.telefonica.config.DatasetServiceComponents.$anonfun$waitWriterStatus$2(DatasetServiceComponents.scala:344)
A new report is generated and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:
{
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 1,
"summary": {
"D_Aura_App": {
"errors": [
{
"step": "WRITING_DATASET",
"description": "avro/dimensional/D_Aura_App/6.0.0",
"error": "An error occurred while calling o582.save.\n: com.telefonica.baikal.spark.exceptions.WriteStatusException: The writing process has failed with resourceId 10543db5-cb35-446e-8cc7-349a3c6cbffb and dataset (D_Aura_App, 6)\n\tat com.telefonica.baikal.spark.sources.telefonica.config.DatasetServiceComponents.$anonfun$waitWriterStatus$2(DatasetServiceComponents.scala:344)\n\tat com.telefonica.baikal.spark.sources.telefonica.config.DatasetServiceComponents.$anonfun$waitWriterStatus$2$adapted(DatasetServiceComponents.scala:341)\n\tat scala.util.Success.$anonfun$map$1(Try.scala:255)\n\tat scala.util.Success.map(Try.scala:213)\n\tat scala.concurrent.Future.$anonfun$map$1(Future.scala:292)\n\tat scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)\n\tat scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)\n\tat scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)\n\tat java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)\n\tat java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)\n\tat java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)\n\tat java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)\n\tat java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)\n\tat java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)\n",
"corr": "21fe54f6-6c34-499a-993c-9dfe30e9e717"
}
],
"spark_executions": {
"dataset_id": "D_Aura_App",
"version": 6,
"correlator": "21fe54f6-6c34-499a-993c-9dfe30e9e717",
"resource_id": "10543db5-cb35-446e-8cc7-349a3c6cbffb",
"request_type": "writes",
"status": "failed",
"metrics": {
"local_spark_bytes_written_total": 44596,
"local_spark_records_written_total": 241,
"local_spark_write_discards_total": 0,
"local_spark_write_discards": 0
}
}
}
},
"start_time": "2024-10-09T13:18:08.119222Z",
"end_time": "2024-10-09T13:18:28.761361Z",
"duration_seconds": 20.64,
"step": "WRITING_KERNEL_STAGE",
"status": "failed"
}
Error trying to import datasets with timeout in Spark execution
This error is produced in the WRITING_DATASET step when the Spark partition configuration is incorrect.
The Spark process runs for two hours and then terminates without writing the data to the dataset.
To solve it, contact the Kernel Operations team to review the file configured in the job’s variable AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH and adjust the value of averageFileSize for each dataset.
In the aura-databricks-jobs logs, a message similar to the following will appear. No further traces follow it, because the process ends with a timeout.
{"corr":"8be82aec-6559-4fc9-be74-74dfc56de615","msg":"Writing blobs of avro blob path: \"avro/entity/D_Aura_Audit/6.0.0\" to dataset_id: \"D_Aura_LivingApp\"","lvl":"INFO","time":"2024-12-18T12:17:51.056Z","app":"aura-databricks-jobs","version":"9.6.0","module":"avro-kpis-manager","host":"1218-120721-e3l79q40-192-168-64-10","pid":1278,"caller_info":"/databricks/python/lib/python3.9/site-packages/aura_databricks_jobs/avro_kpis/avro_kpis_manager.py:70"}
A new file will be created with the name configured in the job’s variable: AURA_KPI_AVRO_PROCESS_ERROR_FILENAME. The file content will be similar to:
{
"time": "2024-10-09T15:47:41.507980Z",
"report_link": "https://commauradevstorage.blob.core.windows.net/aura-kpis-ap-six/avro_test/reports/aura-avro-kpis-report-2024-10-09T16%3A01%3A34.247575Z.json?se=2024-11-08T14%3A01%3A46Z&sp=r&sv=2021-08-06&sr=b&sig=GmHLQ/F5rk4Bob5OrbAZBpBs6z/CXiUjI4KLyticGzg%3D"
}
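The report_link in this file is a SAS URL, and its `se` query parameter holds the expiry time of the link. A minimal sketch for checking whether a link is still valid, assuming the `se` value uses the timestamp format shown in the sample above; the function names are hypothetical:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def sas_expiry(report_link: str) -> datetime:
    """Return the expiry time stored in the `se` parameter of a SAS URL."""
    query = parse_qs(urlsplit(report_link).query)
    # parse_qs URL-decodes the value, so "%3A" becomes ":".
    raw = query["se"][0]
    # Assumption: the value looks like 2024-11-08T14:01:46Z, as in the sample.
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def link_is_expired(report_link: str, now: datetime) -> bool:
    """True when `now` (timezone-aware) is at or past the SAS expiry."""
    return now >= sas_expiry(report_link)
```

For example, `link_is_expired(link, datetime.now(timezone.utc))` tells whether the report can still be downloaded through that link.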
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will show that the process did not reach the FINISH stage and stopped in the WRITING_DATASET_STAGE stage instead. The next execution will try to load the files again.
{
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 0,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Audit",
"schema": "entity",
"version": "6.0.0",
"step": "WRITING_DATASET",
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
},
"files_uploaded": [
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T070000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T080000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T090000Z.avro",
"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20241017T100000Z.avro"
],
"duration_seconds": 1411.32
}
},
"start_time": "2024-10-09T15:47:41.507980Z",
"end_time": "2024-10-09T16:01:34.247575Z",
"duration_seconds": 832.73,
"step": "WRITING_DATASET_STAGE",
"status": "successfully"
}
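A report like this one can be detected automatically by looking at the `step` fields. The following is a minimal sketch, assuming the report has already been parsed into a dictionary with the layout shown in this guide; both function names are hypothetical:

```python
def run_finished(report: dict) -> bool:
    """True when the top-level step of the report is FINISH."""
    return report.get("step") == "FINISH"


def unfinished_datasets(report: dict) -> dict[str, str]:
    """Map each summary entry to its last step, skipping entries at FINISH."""
    return {
        name: info.get("step", "UNKNOWN")
        for name, info in report.get("summary", {}).items()
        if info.get("step") != "FINISH"
    }
```

Note that entries with the step NOT_PROCESSED are also returned; they indicate datasets without files to load rather than a failure.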
Reports SAS Expiration configuration
The value of AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION has an incorrect format. To solve it, set the variable to an integer that indicates the expiration time in minutes.
In the aura-databricks-jobs logs, an error message similar to this will appear:
2024-10-09 11:04:29,495 ERROR 83383 .venv/../base_logger.py msg="Error in configuration: {'AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION': ['Not a valid integer.']}"
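The value can be checked before launching the job. This is a minimal sketch of such a check, not the validation that aura-databricks-jobs performs internally; the function name is hypothetical, and rejecting zero or negative values is an assumption of this example:

```python
import os
from typing import Mapping

VAR_NAME = "AURA_KPI_AVRO_REPORTS_SAS_EXPIRATION"


def read_sas_expiration_minutes(env: Mapping[str, str] = os.environ) -> int:
    """Read the SAS expiration (minutes) and fail early if it is not an integer."""
    raw = env.get(VAR_NAME)
    if raw is None:
        raise KeyError(f"{VAR_NAME} is not set")
    try:
        minutes = int(raw.strip())
    except ValueError:
        raise ValueError(f"{VAR_NAME}: not a valid integer: {raw!r}") from None
    # Assumption of this sketch: an expiration must be a positive number.
    if minutes <= 0:
        raise ValueError(f"{VAR_NAME}: must be greater than zero, got {minutes}")
    return minutes
```

Values such as `1h` or `1.5` are rejected, while `60` is accepted.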
Error copying files to processed folder
This error is produced in the MOVING_BLOBS_TO_PROCESSED step due to, for example, a connection error with Azure or a permissions problem when copying to the destination folder.
To resolve it, manually move the files from the path with the error to the processed folder configured in the job’s variable: AURA_KPI_AVRO_PROCESSED_FOLDER_PATH.
In the aura-databricks-jobs logs, an error message similar to this will appear:
2024-10-09 11:23:15,924 ERROR 84269 .venv/../base_logger.py msg="Detected 2 errors when trying copying files in "avro/processed/avro/dimensional/D_Aura_Channel/6.0.0". Review generated report for more detail.
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:
{
"num_files_kernel_uploaded": 2,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 2,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "MOVING_BLOBS_TO_PROCESSED",
"num_files_kernel_uploaded": 2,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 2,
"errors": [
{
"step": "MOVING_BLOBS_TO_PROCESSED",
"description": "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro",
"error": "Error copy blob: \"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro\" to \"avro_test/processed/avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro\" and container: \"aura-kpis-ap-six\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z</Message></Error>",
"corr": "no-correlator"
},
{
"step": "MOVING_BLOBS_TO_PROCESSED",
"description": "avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro",
"error": "Error copy blob: \"avro_test/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro\" to \"avro_test/processed/avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro\" and container: \"aura-kpis-ap-six\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5fb-501e-009f-0262-195240000000\nTime:2024-10-08T09:11:13.8156074Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5fb-501e-009f-0262-195240000000\nTime:2024-10-08T09:11:13.8156074Z</Message></Error>",
"corr": "no-correlator"
}
]
}
},
"start_time": "2024-09-03T17:56:26.464890Z",
"end_time": "2024-09-03T18:21:17.115379Z",
"duration_seconds": 1490.65,
"step": "MOVING_PROCESSED_BLOBS_STAGE",
"status": "failed"
}
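To know which files must be moved by hand, the affected paths can be read from the `description` field of each error. A minimal sketch, assuming the report has been parsed into a dictionary with the layout shown above; the function name `failed_blobs` is hypothetical:

```python
def failed_blobs(report: dict, step: str) -> list[str]:
    """Return the blob paths (`description`) of the errors recorded at `step`."""
    paths = []
    for info in report.get("summary", {}).values():
        for err in info.get("errors", []):
            if err.get("step") == step:
                paths.append(err["description"])
    return paths
```

Calling `failed_blobs(report, "MOVING_BLOBS_TO_PROCESSED")` returns the two `.avro` paths of the sample report. The same call with another step name, such as REMOVING_BLOBS, filters the errors of that step.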
Error deleting processed files
This error is produced in the REMOVING_BLOBS step due to, for example, a connection error with Azure or a permissions problem when deleting the files. To resolve it, manually delete the files from the path with the error.
In the aura-databricks-jobs logs, an error message similar to this will appear:
2024-10-09 12:13:15,924 ERROR 84269 .venv/../base_logger.py msg="Detected 2 errors when trying remove files in "avro/dimensional/D_Aura_Channel/6.0.0". Review generated report for more detail.
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:
{
"num_files_kernel_uploaded": 2,
"num_files_moved_to_processed": 2,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 2,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "REMOVING_BLOBS",
"num_files_kernel_uploaded": 2,
"num_files_moved_to_processed": 2,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 2,
"errors": [
{
"step": "REMOVING_BLOBS",
"description": "avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro",
"error": "Error deleting the blob: \"avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T120000Z.avro\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z</Message></Error>",
"corr": "no-correlator"
},
{
"step": "REMOVING_BLOBS",
"description": "avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro",
"error": "Error deleting the blob: \"avro/dimensional/D_Aura_Channel/6.0.0/CR_DIM_CHANNEL_20240920T130000Z.avro\". Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z\nErrorCode:CannotVerifyCopySource\nContent: <?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:ae99a5cf-501e-009f-5b62-195240000000\nTime:2024-10-08T09:11:13.7634974Z</Message></Error>",
"corr": "no-correlator"
}
]
}
},
"start_time": "2024-09-03T17:56:26.464890Z",
"end_time": "2024-09-03T18:21:17.115379Z",
"duration_seconds": 1490.65,
"step": "MOVING_PROCESSED_BLOBS_STAGE",
"status": "failed"
}
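The counters of the report hint at how much manual work is pending. The sketch below assumes that every uploaded file is expected to be moved to the processed folder and then deleted, which matches the sample reports in this guide but is not confirmed elsewhere; the function name `pending_work` is hypothetical:

```python
def pending_work(report: dict) -> dict[str, dict[str, int]]:
    """Per dataset, count files uploaded but not moved, and moved but not deleted."""
    result = {}
    for name, info in report.get("summary", {}).items():
        uploaded = info.get("num_files_kernel_uploaded", 0)
        moved = info.get("num_files_moved_to_processed", 0)
        deleted = info.get("num_files_deleted", 0)
        # Assumption: the counters only make sense as uploaded >= moved >= deleted.
        not_moved = max(uploaded - moved, 0)
        not_deleted = max(moved - deleted, 0)
        if not_moved or not_deleted:
            result[name] = {"not_moved": not_moved, "not_deleted": not_deleted}
    return result
```

For the sample report above, the result indicates that two files of D_Aura_Channel were moved but not deleted.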
Error in adapter configuration
There is an error when obtaining the adapter information from the file configured in the variable AURA_KPI_AVRO_ADAPTER_CONFIG_PATH.
To correct it, check that the file is generated by aura-kpis-uploader in this path.
In the aura-databricks-jobs logs, an error message similar to this will appear:
2024-10-09 16:19:39,994 ERROR 52315 msg="It could not obtain the configuration of the schemas to import in schemas/aura-avro-adapter.json"
Some elements configured in AURA_KPI_AVRO_ADAPTER_CONFIG_PATH are not defined as Avro schemas, so they are not imported into Kernel datasets.
In the aura-databricks-jobs logs, warning messages similar to these will appear:
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_BOT" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_CLF" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_GROOT" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_NLP" is not avro format and is not imported"
2024-10-09 16:23:36,914 WARN 12400 .venv/../base_logger.py msg="The schema type "E_Aura_SERVICES" is not avro format and is not imported"
Error in size report configuration
There is an error when obtaining the adapter information from the file configured in the variable AURA_KPI_AVRO_ADAPTER_CONFIG_PATH.
To correct it, check that the file is generated by aura-kpis-uploader in this path.
In the aura-databricks-jobs logs, an error message similar to this will appear:
2024-10-09 18:29:39,023 ERROR 52395 msg="It could not obtain the configuration of the size report to import in "avro/sizeReport.json""
Message indicating no Avro files to load in dataset
Some elements configured as Avro schemas in AURA_KPI_AVRO_ADAPTER_CONFIG_PATH have no Avro files to import into Kernel datasets.
In the aura-databricks-jobs logs, an info message similar to this will appear:
2024-10-09 16:23:37,972 INFO 12400 .venv/../base_logger.py msg="Import files from directory "avro_test/dimensional/D_Aura_Recognizer/6.0.0""
2024-10-09 16:23:38,115 INFO 12400 .venv/../base_logger.py msg="There are no avro files to load for the path: "avro_test/dimensional/D_Aura_Recognizer/6.0.0""
A new report will be created and stored in the path configured in the job’s variable: AURA_KPI_AVRO_SOURCE_SIZE_REPORT_PATH. The report will be similar to:
{
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 0,
"summary": {
"D_Aura_Channel": {
"dataset_id": "D_Aura_Channel",
"schema": "dimensional",
"version": "6.0.0",
"step": "NOT_PROCESSED",
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {}
},
"D_Aura_Recognizer": {
"dataset_id": "D_Aura_Recognizer",
"schema": "dimensional",
"version": "6.0.0",
"step": "NOT_PROCESSED",
"num_files_kernel_uploaded": 0,
"num_files_moved_to_processed": 0,
"num_files_deleted": 0,
"num_files_skipped": 0,
"num_errors": 0,
"errors": [],
"spark_executions": {}
}
},
"start_time": "2024-09-03T17:56:26.464890Z",
"end_time": "2024-09-03T18:21:17.115379Z",
"duration_seconds": 1490.65,
"step": "FINISH",
"status": "successfully"
}