<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Aura – </title>
    <link>/docs/deployment/infraestructure/</link>
    <description>Recent content on Aura</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    
	  <atom:link href="/docs/deployment/infraestructure/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Docs: </title>
      <link>/docs/deployment/infraestructure/kubernetes-cluster/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/docs/deployment/infraestructure/kubernetes-cluster/</guid>
      <description>
        
        
        &lt;h1 id=&#34;kubernetes-cluster&#34;&gt;Kubernetes cluster&lt;/h1&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;Description of the Kubernetes cluster: components, connection, automation, storage and more&lt;/p&gt;

&lt;/div&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The Kubernetes cluster is created in the cloud using &lt;a href=&#34;https://github.com/Azure/aks-engine&#34;&gt;aks-engine&lt;/a&gt; (Azure), the tool provided by Microsoft to generate the Azure Resource Manager templates that deploy Kubernetes clusters. This tool simplifies cluster installation, management and upgrades.&lt;/p&gt;
&lt;p&gt;Kubernetes clusters are used internally by the Aura Platform installer, but &lt;strong&gt;they are not intended to be used directly by the Operations and Support Teams&lt;/strong&gt;. Changes made to a Kubernetes cluster that are not reflected in the Aura Platform deployment profile could be lost in a platform upgrade. For example, if you changed the instance type of a specific agent pool directly on the cluster, the change would be lost in a new deployment.&lt;/p&gt;
&lt;h2 id=&#34;cluster-metadata&#34;&gt;Cluster metadata&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;cluster metadata&lt;/strong&gt; is saved in the object storage (Azure Blob Storage).&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; Remember that if you lose this metadata, you will need to rebuild the cluster manually. It also contains critical information to access the cluster (certificates, private keys), so it must not be shared.&lt;/p&gt;
&lt;h2 id=&#34;connecting-to-kubernetes&#34;&gt;Connecting to Kubernetes&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;kubectl&lt;/strong&gt; command-line tool is the basic tool to operate the Kubernetes cluster. It uses &lt;strong&gt;kubeconfig files&lt;/strong&gt; that contain all the information required (endpoints, certificates, etc.) to connect to the Kubernetes API and manage the cluster securely.&lt;/p&gt;
&lt;p&gt;A default (root) &lt;code&gt;kubeconfig&lt;/code&gt; file with full access to the cluster is created the first time you install the Aura Platform and is stored in Azure Blob Storage as part of the cluster metadata.&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f;&lt;strong&gt;Do NOT use the default &lt;code&gt;kubeconfig&lt;/code&gt; file to manage the cluster&lt;/strong&gt;. Use it to &lt;a href=&#34;../../../docs/deployment/security/&#34;&gt;create new users with limited permissions&lt;/a&gt; instead.&lt;/p&gt;
&lt;!-- Where exactly should this link point within the security section? --&gt;
&lt;p&gt;By default, kubectl looks for a file named &lt;code&gt;config&lt;/code&gt; in the &lt;code&gt;$HOME/.kube&lt;/code&gt; directory. However, you can specify other kubeconfig files by setting the &lt;code&gt;KUBECONFIG&lt;/code&gt; environment variable or passing the flag &lt;code&gt;--kubeconfig&lt;/code&gt; to kubectl.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#204a87&#34;&gt;export&lt;/span&gt; &lt;span style=&#34;color:#000&#34;&gt;KUBECONFIG&lt;/span&gt;&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;/path/to/kubeconfig.json  &lt;span style=&#34;color:#8f5902;font-style:italic&#34;&gt;# Azure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; For security reasons, &lt;code&gt;kubeconfig&lt;/code&gt; files are personal and must not be shared. Each action a user executes on the cluster is logged. If your &lt;code&gt;kubeconfig&lt;/code&gt; file is compromised, you must report it.&lt;/p&gt;
&lt;p&gt;&amp;#x1f4c4; You can get more information about &lt;code&gt;kubeconfig&lt;/code&gt; files at the
&lt;a href=&#34;https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/&#34;&gt;official Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;
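&lt;p&gt;The lookup order can be sketched as a small shell function. &lt;code&gt;resolve_kubeconfig&lt;/code&gt; is a hypothetical helper for illustration, not part of kubectl or the Aura Platform: it only prints which kubeconfig source kubectl would read, following the precedence given in the Kubernetes documentation (the &lt;code&gt;--kubeconfig&lt;/code&gt; flag first, then &lt;code&gt;KUBECONFIG&lt;/code&gt;, then &lt;code&gt;$HOME/.kube/config&lt;/code&gt;).&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kubeconfig source(s) kubectl would read.
# Precedence (per the Kubernetes documentation):
#   1. --kubeconfig flag: only that single file is used, no merging.
#   2. KUBECONFIG: a colon-separated list of files that kubectl merges.
#   3. The default file, $HOME/.kube/config.
resolve_kubeconfig() {
  local flag_value="${1:-}"   # value you would pass to --kubeconfig, if any
  if [ -n "$flag_value" ]; then
    echo "$flag_value"
  elif [ -n "${KUBECONFIG:-}" ]; then
    echo "$KUBECONFIG"
  else
    echo "$HOME/.kube/config"
  fi
}

resolve_kubeconfig /path/to/kubeconfig.json              # the flag wins
KUBECONFIG=/path/to/kubeconfig.json resolve_kubeconfig   # falls back to KUBECONFIG
```

&lt;p&gt;Against a real cluster, the one-off equivalent of the flag is &lt;code&gt;kubectl --kubeconfig /path/to/kubeconfig.json get nodes&lt;/code&gt;.&lt;/p&gt;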
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; The Kubernetes Dashboard is not deployed in the Aura cluster because we promote the use of &lt;strong&gt;kubectl&lt;/strong&gt;. If you still want to use it, you can run the dashboard on your machine and connect to the cluster with your kubeconfig as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker run -p 9090:9090 -v &lt;span style=&#34;color:#4e9a06&#34;&gt;${&lt;/span&gt;&lt;span style=&#34;color:#000&#34;&gt;PATH_TO_YOUR_KUBECONFIG&lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;}&lt;/span&gt;/kubeconfig.json:/opt/kubeconfig.json -v /tmp:/tmp kubernetesui/dashboard:v2.2.0 --kubeconfig /opt/kubeconfig.json
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;kubernetes-automation-with-the-cloud&#34;&gt;Kubernetes automation with the Cloud&lt;/h2&gt;
&lt;p&gt;Kubernetes automates certain tasks such as creating a Load Balancer for a service or mounting a disk on a node as a persistent volume for a pod.&lt;/p&gt;
&lt;p&gt;Authentication against the cloud provider APIs is done using credentials configured automatically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt;: by using a Service Principal with the Contributor role, scoped to the infrastructure Resource Group.&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; Please &lt;strong&gt;do not change the password, delete the Service Principal or remove its Contributor role&lt;/strong&gt;. If you do, the credentials will not be updated automatically due to a known limitation of aks-engine.&lt;br&gt;
See the issue &lt;a href=&#34;https://github.com/Azure/aks-engine/issues/724&#34;&gt;Azure/aks-engine/724&lt;/a&gt; in the aks-engine repository for more information.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;Security&amp;rdquo; section describes an unofficial procedure to &lt;a href=&#34;../../../docs/deployment/security/&#34;&gt;change the Service Principal credentials in Azure&lt;/a&gt; that involves a period of service disruption.&lt;/p&gt;
&lt;!-- Where exactly should this link point within the security section? --&gt;
&lt;/li&gt;
&lt;/ul&gt;
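&lt;p&gt;To check that the Service Principal still holds its Contributor role on the infrastructure Resource Group, you could query the Azure CLI. This is a sketch under assumptions, not an Aura Platform tool: &lt;code&gt;sp_has_contributor&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;APP_ID&lt;/code&gt; and &lt;code&gt;RESOURCE_GROUP&lt;/code&gt; are placeholders for your Service Principal application ID and infrastructure Resource Group. It only reads role assignments and changes nothing.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the Service Principal has the
# Contributor role assigned on the given Resource Group, fail otherwise.
# Read-only: it lists role assignments and never modifies them.
sp_has_contributor() {
  local app_id="${1:?usage: sp_has_contributor APP_ID RESOURCE_GROUP}"
  local resource_group="${2:?usage: sp_has_contributor APP_ID RESOURCE_GROUP}"
  az role assignment list \
    --assignee "$app_id" \
    --resource-group "$resource_group" \
    --query "[].roleDefinitionName" \
    --output tsv \
    | grep -qx "Contributor"   # exact, whole-line match on the role name
}

# Example (placeholders; requires a prior "az login"):
#   sp_has_contributor APP_ID RESOURCE_GROUP || echo "Contributor role missing"
```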
&lt;h2 id=&#34;kubernetes-namespaces&#34;&gt;Kubernetes Namespaces&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/&#34;&gt;Kubernetes Namespaces&lt;/a&gt; are a way to divide and organize cluster resources.&lt;/p&gt;
&lt;p&gt;You can list the existing namespaces by running &lt;code&gt;kubectl get namespaces&lt;/code&gt;. In Aura Platform, you will find the following ones:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Namespaces used by the Aura Platform services:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;aura-$ENV&lt;/strong&gt;: Aura Platform core services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;aura-system&lt;/strong&gt;: Aura Platform system services (prometheus, alertmanager, node-exporter,
fluentd, elasticsearch, kube-state-metrics).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Namespaces used by Kubernetes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;kube-system&lt;/strong&gt;: for objects created by the Kubernetes system. Aura Platform also deploys into this namespace some objects that are closely tied to the infrastructure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;kube-public&lt;/strong&gt;: readable by all users. It should be empty.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Default namespace (&lt;strong&gt;default&lt;/strong&gt;): for objects with no other namespace.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In most situations, you will need to use &lt;strong&gt;aura-$ENV&lt;/strong&gt;. You can use the &lt;code&gt;--namespace&lt;/code&gt; flag in &lt;code&gt;kubectl&lt;/code&gt; to specify which namespace you are referring to. For example, to get the pods in the Aura Platform core:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get pods --namespace aura-&lt;span style=&#34;color:#000&#34;&gt;$ENV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Remember that some low-level resources, such as nodes and persistent volumes, are not in a namespace.&lt;/p&gt;
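&lt;p&gt;If you work mostly in &lt;strong&gt;aura-$ENV&lt;/strong&gt;, you can make it the default namespace of your current kubectl context instead of passing &lt;code&gt;--namespace&lt;/code&gt; every time. A minimal sketch, assuming &lt;code&gt;ENV&lt;/code&gt; holds your environment name: &lt;code&gt;aura_namespace&lt;/code&gt; and &lt;code&gt;use_aura_namespace&lt;/code&gt; are hypothetical helpers, while &lt;code&gt;kubectl config set-context --current --namespace&lt;/code&gt; is a standard kubectl command that only edits your local kubeconfig.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the core namespace name from $ENV.
# ${ENV:?...} aborts with an error instead of silently returning "aura-".
aura_namespace() {
  echo "aura-${ENV:?ENV must be set to your environment name}"
}

# Hypothetical helper: set aura-$ENV as the default namespace of the current
# kubectl context. This edits your local kubeconfig only, not the cluster.
use_aura_namespace() {
  local ns
  ns="$(aura_namespace)" || return 1   # stop if ENV is unset
  kubectl config set-context --current --namespace="$ns"
}

# Usage ("prod" is only an example environment name):
#   ENV=prod use_aura_namespace
#   kubectl get pods    # now equivalent to: kubectl get pods --namespace aura-prod
```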
&lt;h2 id=&#34;kubernetes-objects&#34;&gt;Kubernetes objects&lt;/h2&gt;
&lt;h3 id=&#34;working-with-pods&#34;&gt;Working with pods&lt;/h3&gt;
&lt;p&gt;You can list all pods in a given namespace along with their status, number of restarts and age. Adding &lt;code&gt;-o wide&lt;/code&gt; also shows additional metadata, such as the node where each pod is scheduled:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get pods -n aura-&lt;span style=&#34;color:#000&#34;&gt;$ENV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                                 READY   STATUS      RESTARTS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aog-bridge-744bbb9595-94g7n          1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aog-bridge-744bbb9595-pzg2l          1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;api-gw-5c584b4c8d-hdk25              1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;api-gw-5c584b4c8d-knm27              1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aura-bot-84bd44dc6d-5jdzk            1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aura-bot-84bd44dc6d-ktz74            1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aura-bot-makeup-4qrbl                0/1     Completed   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;authentication-api-b849b6ff9-5t96c   1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;authentication-api-b849b6ff9-rwcvm   1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nginx-5fd94584d8-tcpwd               2/2     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nginx-5fd94584d8-z72tf               2/2     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nlp-85b4b446cc-df2zw                 1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nlp-85b4b446cc-s6z5k                 1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nlp-provisioning-zxkk5               0/1     Completed   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;user-helper-67b75cb8fc-6vkt9         1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;user-helper-67b75cb8fc-td42l         1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h15m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;web-sdk-5f7654b797-9npzn             1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;web-sdk-5f7654b797-mlcrj             1/1     Running     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;          4h16m
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pods can have different statuses:&lt;/p&gt;
&lt;ul&gt;
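&lt;li&gt;Running: the pod is scheduled on a node and its containers are running.&lt;/li&gt;
&lt;li&gt;Completed: some pods (e.g., those created by Jobs) have a limited lifespan. Their status changes to Completed when they finish.
Kubernetes eventually removes them from the list of pods.&lt;/li&gt;
&lt;li&gt;Others, such as Pending, ContainerCreating, CrashLoopBackOff or Error, which usually mean that a pod is still starting or has a problem.&lt;/li&gt;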
&lt;/ul&gt;
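&lt;p&gt;To get a quick count of pods per status (for example, to spot pods that are neither Running nor Completed), you can post-process the &lt;code&gt;kubectl get pods&lt;/code&gt; output. A minimal sketch: &lt;code&gt;pod_status_summary&lt;/code&gt; is a hypothetical helper that assumes the default column layout shown above, where STATUS is the third column.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pods per STATUS from "kubectl get pods" output
# on stdin. Assumes the default columns (NAME READY STATUS RESTARTS AGE),
# so STATUS is field 3; the header line (NR == 1) is skipped.
pod_status_summary() {
  awk 'NR > 1 { count[$3]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods -n "aura-$ENV" | pod_status_summary
```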
&lt;p&gt;&amp;#x1f4c4; You can get more information about working with pods in the
&lt;a href=&#34;https://kubernetes.io/docs/tasks/access-application-cluster/list-all-running-container-images/&#34;&gt;Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Each pod is configured using environment variables. Environment variables are an OS-agnostic standard that makes it easy to change the configuration between deployments without changing any code.
Sensitive information (e.g., passwords) is configured using Kubernetes secrets.&lt;/p&gt;
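&lt;p&gt;To inspect this configuration, &lt;code&gt;kubectl set env --list&lt;/code&gt; prints the environment variables defined in a deployment, and Secret values are stored base64-encoded, so they must be decoded to be read. A minimal sketch: &lt;code&gt;decode_secret_value&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;DEPLOYMENT&lt;/code&gt;, &lt;code&gt;SECRET_NAME&lt;/code&gt; and &lt;code&gt;KEY&lt;/code&gt; are placeholders. Treat decoded values as sensitive.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one base64-encoded Secret value read on stdin.
# Kubernetes stores the values under a Secret's .data field base64-encoded.
decode_secret_value() {
  base64 --decode
}

# Usage against a live cluster (DEPLOYMENT, SECRET_NAME and KEY are placeholders):
#   kubectl set env deployment/DEPLOYMENT --list -n "aura-$ENV"
#   kubectl get secret SECRET_NAME -n "aura-$ENV" -o jsonpath='{.data.KEY}' | decode_secret_value
```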
&lt;h4 id=&#34;working-with-deployments&#34;&gt;Working with deployments&lt;/h4&gt;
&lt;p&gt;A &lt;strong&gt;deployment controller&lt;/strong&gt; provides declarative updates for Pods and ReplicaSets, according to the desired state described in a deployment object.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get deployments -n aura-&lt;span style=&#34;color:#000&#34;&gt;$ENV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aog-bridge           2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;api-gw               2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aura-bot             2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h17m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;authentication-api   2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nginx                2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nlp                  2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;user-helper          2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;web-sdk              2/2     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;            &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;           4h16m
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&amp;#x2139;&amp;#xfe0f; You can get more information about working with deployments in the
&lt;a href=&#34;https://kubernetes.io/docs/concepts/workloads/controllers/deployment/&#34;&gt;Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;
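&lt;p&gt;When a deployment shows fewer ready replicas than desired, &lt;code&gt;kubectl rollout status&lt;/code&gt; waits for and reports on its rollout. A minimal sketch to list such deployments from the output above: &lt;code&gt;unready_deployments&lt;/code&gt; is a hypothetical helper that assumes the default columns, where READY is the second column in the form &lt;code&gt;ready/desired&lt;/code&gt;.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of deployments whose READY column
# ("ready/desired", field 2) shows fewer ready replicas than desired.
# Reads "kubectl get deployments" output on stdin and skips the header.
unready_deployments() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage against a live cluster (NAME is a placeholder):
#   kubectl get deployments -n "aura-$ENV" | unready_deployments
#   kubectl rollout status deployment/NAME -n "aura-$ENV"
```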
&lt;h3 id=&#34;working-with-nodes&#34;&gt;Working with nodes&lt;/h3&gt;
&lt;p&gt;Nodes are the virtual machines that run Aura Platform.&lt;/p&gt;
&lt;p&gt;There are two types of nodes in any Kubernetes cluster:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Master nodes: they host the control plane aspects of the cluster. Typically, these nodes are not used to schedule application workloads.&lt;/li&gt;
&lt;li&gt;Compute nodes (with the role &amp;ldquo;agent&amp;rdquo;): they run the workloads of the platform services.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If there is an issue in the cluster, the first thing you should review is the status of the nodes.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get nodes
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                                 STATUS   ROLES    AGE     VERSION
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-common-26582301-vmss000000       Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-common-26582301-vmss000001       Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-database-26582301-vmss000000     Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-database-26582301-vmss000001     Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-database-26582301-vmss000002     Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-management-26582301-vmss000000   Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-management-26582301-vmss000001   Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-management-26582301-vmss000002   Ready    agent    7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-master-26582301-0                Ready    master   7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-master-26582301-1                Ready    master   7h33m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-master-26582301-2                Ready    master   7h32m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You should see three nodes with the role &amp;ldquo;master&amp;rdquo;, which is the default and recommended number. An odd number of master nodes is recommended because &lt;code&gt;etcd&lt;/code&gt;, the service that stores the cluster state, needs a majority (quorum) of its members to agree on every update, and adding a member to an odd-sized cluster does not increase the number of failures it can tolerate (see &lt;a href=&#34;https://etcd.io/docs/v3.3/faq/#why-an-odd-number-of-cluster-membersk&#34;&gt;&amp;ldquo;Why an odd number of cluster members&amp;rdquo; in the etcd FAQ&lt;/a&gt;).&lt;/p&gt;
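&lt;p&gt;The quorum arithmetic behind the odd-number rule can be checked quickly: etcd needs floor(n/2) + 1 members to agree on updates, so a cluster of n members tolerates n - quorum failures. &lt;code&gt;etcd_quorum&lt;/code&gt; is a hypothetical illustration, not an etcd tool.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the quorum size and the number of tolerated
# failures for an etcd cluster of N members. Bash integer division rounds
# down, so $(( n / 2 + 1 )) is floor(n/2) + 1.
etcd_quorum() {
  local n="${1:?usage: etcd_quorum MEMBERS}"
  local quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
}

for n in 1 2 3 4 5; do
  etcd_quorum "$n"
done
```

&lt;p&gt;Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, which is why odd cluster sizes are preferred.&lt;/p&gt;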
&lt;p&gt;The normal status for a node is &amp;ldquo;Ready&amp;rdquo;, which means that the node is up and is a healthy member of the Kubernetes cluster.&lt;/p&gt;
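&lt;p&gt;In a larger cluster, it helps to list only the nodes whose status is not Ready. A minimal sketch: &lt;code&gt;not_ready_nodes&lt;/code&gt; is a hypothetical helper that reads &lt;code&gt;kubectl get nodes&lt;/code&gt; output and assumes STATUS is the second column. Cordoned nodes (&lt;code&gt;Ready,SchedulingDisabled&lt;/code&gt;) are still treated as Ready.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name and status of nodes whose STATUS
# (field 2) does not start with "Ready", e.g. NotReady.
# "Ready,SchedulingDisabled" (cordoned) starts with "Ready" and is skipped.
not_ready_nodes() {
  awk 'NR > 1 { if ($2 !~ /^Ready/) print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes | not_ready_nodes
```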
&lt;p&gt;Nodes can become &amp;ldquo;NotReady&amp;rdquo; for different reasons. In this case, the first step is to describe the affected node and check its &amp;ldquo;Conditions&amp;rdquo; and &amp;ldquo;Events&amp;rdquo; to determine what is wrong:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl describe node k8s-common-26582301-vmss000000
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Conditions:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ----                 ------  -----------------                 ------------------                ------                       -------
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  NetworkUnavailable   False   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 07:08:50 +0200   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 07:08:50 +0200   RouteCreated                 RouteController created a route
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  MemoryPressure       False   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 14:41:10 +0200   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 07:07:06 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  DiskPressure         False   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 14:41:10 +0200   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 07:07:06 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  PIDPressure          False   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 14:41:10 +0200   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 07:07:06 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  Ready                True    Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 14:41:10 +0200   Thu, &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;04&lt;/span&gt; Jul &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2019&lt;/span&gt; 07:07:20 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With &lt;code&gt;kubectl top nodes&lt;/code&gt;, you can get a quick overview of the CPU and memory usage of each node. If you need more details about resource usage, see the &lt;a href=&#34;../../../docs/developers-workspace/monitoring/dashboards/&#34;&gt;Grafana dashboards&lt;/a&gt; section in the &lt;strong&gt;Monitor Aura&lt;/strong&gt; documentation.&lt;/p&gt;
&lt;p&gt;If a node is misbehaving, the recommended steps are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&#34;https://qna.baikalplatform.com/t/how-to-drain-a-kubernetes-node/60&#34;&gt;Drain the Kubernetes node&lt;/a&gt;. This way, Kubernetes gives the pods running on that node a chance to stop gracefully and stops scheduling new pods on it. This step is not mandatory, but it is recommended.&lt;/li&gt;
&lt;li&gt;Terminate the node. It is always safe to terminate one node at a time, waiting until it has rejoined the cluster before terminating the next one. Terminating several nodes at once can break the quorum of services that need to form a cluster,
so avoid it if the nodes you want to terminate run pods of such services.&lt;/li&gt;
&lt;/ol&gt;
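&lt;p&gt;Step 1 can be sketched with the standard &lt;code&gt;kubectl drain&lt;/code&gt; command. &lt;code&gt;drain_node&lt;/code&gt; is a hypothetical wrapper and the 300-second timeout is an arbitrary example value. &lt;code&gt;--ignore-daemonsets&lt;/code&gt; is needed because DaemonSet pods cannot be evicted; pods using local (emptyDir) storage may also block the drain, and the flag that allows deleting them depends on your kubectl version.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "kubectl drain": cordons the node (marks it
# SchedulingDisabled) and evicts its pods so they can stop gracefully.
# DaemonSet-managed pods cannot be evicted, hence --ignore-daemonsets.
# --timeout=300s is an arbitrary example; 0s would wait indefinitely.
drain_node() {
  local node="${1:?usage: drain_node NODE_NAME}"
  kubectl drain "$node" --ignore-daemonsets --timeout=300s
}

# Usage against a live cluster:
#   drain_node k8s-common-26582301-vmss000000
# If you decide not to terminate the node after all, re-enable scheduling:
#   kubectl uncordon k8s-common-26582301-vmss000000
```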
&lt;p&gt;Cordoned nodes, such as nodes being drained, show &amp;ldquo;SchedulingDisabled&amp;rdquo; in their status:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;k8s-common-26582301-vmss000000      Ready,SchedulingDisabled   agent    ...
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can filter nodes by label with the &lt;code&gt;-l&lt;/code&gt; flag of &lt;code&gt;kubectl&lt;/code&gt;. For example, to get only the nodes in a specific agent pool:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get nodes -l &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#39;agentpool in (common)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                             STATUS   ROLES   AGE     VERSION
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-common-26582301-vmss000000   Ready    agent   7h36m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-common-26582301-vmss000001   Ready    agent   7h36m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Or to get only those in a specific availability zone:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get nodes -l &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#39;failure-domain.beta.kubernetes.io/zone in (0)&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                                 STATUS   ROLES    AGE     VERSION
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-common-26582301-vmss000000       Ready    agent    4h14m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-database-26582301-vmss000000     Ready    agent    4h14m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-management-26582301-vmss000000   Ready    agent    4h14m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-master-26582301-0                Ready    master   4h14m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;k8s-master-26582301-1                Ready    master   4h14m   v1.13.4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can find information about node usage in Grafana. Describing a node also shows how Kubernetes has allocated resources on it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl describe node k8s-common-26582301-vmss000001
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Non-terminated Pods:         &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;13&lt;/span&gt; in total&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  Namespace                  Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ---------                  ----                                     ------------  ----------  ---------------  -------------  ---
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               aog-bridge-744bbb9595-ffbzz              100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     35m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               api-gw-5c584b4c8d-qg7bz                  100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     35m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               aura-bot-84bd44dc6d-7cc46                100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     36m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               authentication-api-b849b6ff9-pjdk6       100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     35m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               nginx-5fd94584d8-fwmcd                   600m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;31%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;    &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;103%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;    768Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;14%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;      1Gi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;19%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;      34m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               nlp-85b4b446cc-2dtj8                     100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     35m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               user-helper-67b75cb8fc-szmv8             100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     35m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-es-test               web-sdk-5f7654b797-jbrfg                 100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;51%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     35m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-system                fluentd-st7jb                            50m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;2%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;      100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;   256Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;4%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       512Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;9%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     51m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  aura-system                node-exporter-4mlqs                      10m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;      50m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;2%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;    24Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;        32Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;      52m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  kube-system                kube-proxy-fnb2d                         100m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;5%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;      &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;           &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;         4h14m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  kube-system                kubernetes-dashboard-7947fffdf5-pdrf2    300m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;15%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;    300m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;15%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;  150Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;2%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;       150Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;2%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;     4h14m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Allocated resources:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;Total limits may be over &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;100&lt;/span&gt; percent, i.e., overcommitted.&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  Resource                       Requests      Limits
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  --------                       --------      ------
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  cpu                            1860m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;96%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;   10450m &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;541%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  memory                         3246Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;62%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;  5814Mi &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;111%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ephemeral-storage              &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;        &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;(&lt;/span&gt;0%&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  attachable-volumes-azure-disk  &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;             &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The total CPU and memory limits on a node can exceed 100% (overcommitment), but the total CPU and memory &lt;strong&gt;requests&lt;/strong&gt; cannot: the scheduler only places a pod on a node if its requests fit in the remaining allocatable capacity of that node. This means that the resources requested by a pod are reserved for it even if the pod does not use them.&lt;br&gt;
If all nodes in an agent pool are full, new pods remain in the &amp;ldquo;Pending&amp;rdquo; state until the cluster autoscaler adds a new node to the agent pool.&lt;/p&gt;
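&lt;p&gt;As a rough illustration of the arithmetic behind the Allocated resources table, the following bash sketch computes a percentage of allocatable capacity and a simplified version of the scheduler fit check for CPU. The function names are hypothetical, and the allocatable CPU of 1930m is an assumption inferred from the percentages above (the excerpt does not show it).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;#!/usr/bin/env bash
# Percentage of a node's allocatable capacity used by a total of
# requests or limits, truncated to an integer (as kubectl displays it).
percent_of_allocatable() {
  local used=$1 allocatable=$2
  echo $(( used * 100 / allocatable ))
}

# Simplified scheduler fit check: a new pod fits only if the requests
# already on the node plus its own request do not exceed allocatable.
# Limits are not considered, which is why limits can be overcommitted.
pod_fits() {
  local requested=$1 new_request=$2 allocatable=$3
  if [ $(( requested + new_request )) -le $allocatable ]; then
    echo yes
  else
    echo no
  fi
}

# Assumption: about 1930m of allocatable CPU on this node.
percent_of_allocatable 1860 1930    # CPU requests: prints 96
percent_of_allocatable 10450 1930   # CPU limits: prints 541
pod_fits 1860 100 1930              # new 100m pod: prints no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With these numbers, a new pod requesting 100m of CPU does not fit on this node, so it would stay pending until there is room on another node.&lt;/p&gt;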
&lt;h4 id=&#34;autoscaling-groups&#34;&gt;Autoscaling groups&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Compute nodes&lt;/strong&gt;
In Azure, each agent pool corresponds to one virtual machine scale set (VMSS). This means that, when a node is terminated for any reason, another one is created automatically.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Master nodes&lt;/strong&gt;
Master nodes in Azure are individual virtual machines that do not belong to any VMSS.
This means that, if you remove a master node in Azure, you need to run the installer again to recreate it.
For this reason, when a master node has problems, restart it first and wait a few minutes to check whether it is able to rejoin the Kubernetes cluster.
If it does, you do not need to terminate it. Remember to uncordon the node if you cordoned it previously.&lt;/p&gt;
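&lt;p&gt;The restart procedure for a master node could look like the following bash sketch. It assumes that the Azure CLI (&lt;code&gt;az&lt;/code&gt;) and &lt;code&gt;kubectl&lt;/code&gt; are configured for the cluster and that the Azure VM name matches the Kubernetes node name; the node and resource group names are hypothetical. With &lt;code&gt;DRY_RUN=1&lt;/code&gt; it only prints the commands.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;#!/usr/bin/env bash
# Sketch: restart a misbehaving master VM instead of terminating it.
# Assumptions: az and kubectl are configured for this cluster and the
# Azure VM name matches the Kubernetes node name.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ &#34;${DRY_RUN:-0}&#34; = 1 ]; then
    echo &#34;+ $*&#34;
  else
    &#34;$@&#34;
  fi
}

restart_master() {
  local node=$1 resource_group=$2
  # az waits for the restart operation to complete by default.
  run az vm restart --resource-group &#34;$resource_group&#34; --name &#34;$node&#34;
  # Check that the node is listed as Ready again (repeat after a few minutes if needed).
  run kubectl get node &#34;$node&#34;
  # Only needed if the node was cordoned before the restart.
  run kubectl uncordon &#34;$node&#34;
}

# Hypothetical node and resource group names; prints the commands only.
DRY_RUN=1 restart_master k8s-master-12345678-0 my-aura-rg
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;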
&lt;h3 id=&#34;horizontal-scaling-of-a-component&#34;&gt;Horizontal scaling of a component&lt;/h3&gt;
&lt;p&gt;Use the &lt;code&gt;kubectl scale&lt;/code&gt; command to scale one or more deployments up or down to the desired number of replicas.&lt;/p&gt;
&lt;p&gt;For example, to scale the aura-bot deployment to 6 replicas:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl scale deployment aura-bot --replicas&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;6&lt;/span&gt; -n aura-&lt;span style=&#34;color:#000&#34;&gt;$ENV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Deployments run stateless workloads, so they can safely be scaled up and down. The only restrictions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run only one insights-loader replica, to avoid race conditions when loading insight files from the object storage.&lt;/li&gt;
&lt;li&gt;Run only one kube-state-metrics replica, to avoid duplicated metrics in Prometheus.&lt;/li&gt;
&lt;/ul&gt;
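&lt;p&gt;The platform supports &lt;a href=&#34;https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/&#34;&gt;Horizontal Pod Autoscalers&lt;/a&gt; (HPAs).&lt;/p&gt;
&lt;p&gt;HPAs are configured for several key services to autoscale their number of replicas based on the CPU usage of their pods. For a deployment managed by an HPA, the HPA keeps adjusting the replica count between its MINPODS and MAXPODS values, so a manual change made with &lt;code&gt;kubectl scale&lt;/code&gt; may be overridden.&lt;/p&gt;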
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get hpa -n aura-&lt;span style=&#34;color:#000&#34;&gt;$ENV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                 REFERENCE                       TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aog-bridge           Deployment/aog-bridge           0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          65m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;api-gw               Deployment/api-gw               0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          65m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aura-bot             Deployment/aura-bot             0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          66m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;authentication-api   Deployment/authentication-api   0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          66m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nginx                Deployment/nginx                0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          65m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nlp                  Deployment/nlp                  0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          65m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;user-helper          Deployment/user-helper          0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          65m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;web-sdk              Deployment/web-sdk              0%/150%   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;4&lt;/span&gt;         &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;          66m
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MINPODS&lt;/strong&gt; values are set according to your deployment profile, using the &lt;code&gt;service_replicas&lt;/code&gt; value of each service.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MAXPODS&lt;/strong&gt; values are set to three times the MINPODS value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TARGETS&lt;/strong&gt; shows the current and the target average CPU utilization, measured as a percentage of the CPU requested by the pods. The target is fixed at 80% for all platform services, and each service scales up or down independently based on its own CPU usage. Support for additional custom metrics (e.g., latencies) will be added in future releases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that the sample output above shows a MAXPODS of 4 and a 150% target, so check the actual values in your environment with &lt;code&gt;kubectl get hpa&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This feature is very powerful in combination with the &lt;a href=&#34;#kubernetes-cluster-autoscaler&#34;&gt;cluster autoscaler&lt;/a&gt;: when the pods created by an HPA no longer fit on the available compute nodes, the cluster autoscaler automatically adds new nodes to the cluster.&lt;/p&gt;
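&lt;p&gt;The Kubernetes HPA computes the desired number of replicas as &lt;code&gt;ceil(currentReplicas * currentUtilization / targetUtilization)&lt;/code&gt; and keeps the result between MINPODS and MAXPODS. The following bash sketch reproduces that calculation with integer arithmetic, using MINPODS 2 and MAXPODS 6 (three times MINPODS) as example values. It is a simplification: the real HPA also applies a tolerance (10% by default) and a scale-down stabilization window, which are left out here, and the function name is hypothetical.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;#!/usr/bin/env bash
# desired = ceil(current * current_util / target_util), then clamped
# to [min, max]. Utilization values are percentages of the CPU request.
hpa_desired_replicas() {
  local current=$1 current_util=$2 target_util=$3 min=$4 max=$5
  # Integer ceiling division: (a + b - 1) / b
  local desired=$(( (current * current_util + target_util - 1) / target_util ))
  if [ $desired -lt $min ]; then desired=$min; fi
  if [ $desired -gt $max ]; then desired=$max; fi
  echo $desired
}

# Example values: MINPODS 2, MAXPODS 6, target 80%.
hpa_desired_replicas 2 120 80 2 6   # busy pods: prints 3
hpa_desired_replicas 4 300 80 2 6   # capped at MAXPODS: prints 6
hpa_desired_replicas 2 0 80 2 6     # idle pods: prints 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because utilization is relative to the CPU request, a pod that requests 100m with a 150% target (as in the sample output above) triggers a scale-up when the average usage of the pods goes above roughly 150m.&lt;/p&gt;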
&lt;p&gt;Statefulsets need more care: not all of them scale cleanly, so make sure you understand how each one behaves before scaling it.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get statefulsets --namespace aura-system
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                    READY   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;alertmanager            2/2     82m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;elasticsearch           3/3     83m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;fluent-bit-aggregator   3/3     83m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;mongodb                 3/3     84m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;prometheus              3/3     83m
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&amp;#x2139;&amp;#xfe0f; Keep in mind that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;elasticsearch&lt;/strong&gt; can be safely scaled up or down.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fluent-bit-aggregator&lt;/strong&gt; can be safely scaled up or down.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;prometheus&lt;/strong&gt; stores the same information in every replica, so keep at least 2 replicas for high availability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once you are sure that a statefulset can be scaled safely, use &lt;code&gt;kubectl&lt;/code&gt; to scale it, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl scale statefulsets elasticsearch --replicas&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;7&lt;/span&gt; -n aura-system
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;jobs&#34;&gt;Jobs&lt;/h2&gt;
&lt;p&gt;Aura Platform runs some Kubernetes jobs. You can list them with &lt;code&gt;kubectl get jobs&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get &lt;span style=&#34;color:#204a87&#34;&gt;jobs&lt;/span&gt; --namespace aura-es-test
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                      COMPLETIONS   DURATION   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;aura-bot-makeup           1/1           92s        69m
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nlp-provisioning          1/1           10m        68m
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Most of them are provisioning jobs that create all the required entities (applications, APIs, etc.) during the installation.&lt;/p&gt;
&lt;h2 id=&#34;horizontal-scaling-the-infrastructure&#34;&gt;Horizontal scaling the infrastructure&lt;/h2&gt;
&lt;p&gt;You can add nodes to and remove nodes from the different agent pools in the Kubernetes cluster.&lt;/p&gt;
&lt;p&gt;Adding nodes to a running cluster is a safe operation. However, removing nodes can cause statefulsets to stop working properly, because some agent pools are dedicated to stateful services that need to form a cluster.&lt;/p&gt;
&lt;p&gt;This applies to the following agent pools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;master&lt;/strong&gt;: Kubernetes master nodes use a quorum protocol that needs an odd number of nodes. Three nodes is the minimum for high availability (HA), so 3 nodes in preproduction and 5 in production environments is a safe choice. The number of master nodes in an existing environment cannot currently be modified.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;database&lt;/strong&gt;: It must have 3 nodes, for a PostgreSQL cluster with one master and two followers.&lt;/li&gt;
&lt;/ul&gt;
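&lt;p&gt;The odd-number rule for masters follows from majority quorum: a cluster of &lt;code&gt;n&lt;/code&gt; members needs &lt;code&gt;floor(n/2) + 1&lt;/code&gt; members to agree, so it tolerates &lt;code&gt;floor((n - 1)/2)&lt;/code&gt; failures. The bash sketch below assumes that the master quorum works like the Raft majority used by etcd; the function names are hypothetical.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;#!/usr/bin/env bash
# Majority quorum for a cluster of n members (integer division rounds down).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while a majority is still available.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
  echo &#34;$n nodes: quorum $(quorum_size $n), tolerates $(tolerated_failures $n) failure(s)&#34;
done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Three nodes tolerate one failure and five tolerate two, while a fourth node raises the quorum to 3 without tolerating any additional failure.&lt;/p&gt;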
&lt;h3 id=&#34;kubernetes-cluster-autoscaler&#34;&gt;Cluster autoscaler&lt;/h3&gt;
&lt;p&gt;The Aura Platform Kubernetes cluster deploys the official Kubernetes cluster-autoscaler. It is a deployment with one pod that runs on one of the master nodes.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl get po -l &lt;span style=&#34;color:#000&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;cluster-autoscaler -n kube-system
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;NAME                                  READY   STATUS    RESTARTS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cluster-autoscaler-6fb6b8dcdc-xr59x   1/1     Running   &lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;1&lt;/span&gt;          4h48m
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The cluster autoscaler automatically adjusts the size of the Kubernetes cluster when one of these conditions is met:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Scale up&lt;/strong&gt;: there are pending pods that do not fit in the cluster due to insufficient available resources, but could fit if new compute nodes are added.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scale down&lt;/strong&gt;: there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.&lt;/li&gt;
&lt;/ul&gt;
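&lt;p&gt;For scale-down, the upstream cluster autoscaler considers a node unneeded when the requests of its pods stay below a utilization threshold (&lt;code&gt;--scale-down-utilization-threshold&lt;/code&gt;, 0.5 by default) for longer than &lt;code&gt;--scale-down-unneeded-time&lt;/code&gt; (10 minutes by default), and its pods can be moved to other nodes. The following bash sketch checks only the first two conditions, and only for CPU, assuming that Aura Platform keeps these upstream defaults (not confirmed here); the function name is hypothetical.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;#!/usr/bin/env bash
# Assumed upstream defaults: --scale-down-utilization-threshold=0.5
# and --scale-down-unneeded-time=10m.
THRESHOLD_PCT=50
UNNEEDED_MINUTES=10

# Utilization = sum of pod CPU requests / allocatable CPU (CPU only here).
scale_down_candidate() {
  local requested=$1 allocatable=$2 minutes_underutilized=$3
  local util=$(( requested * 100 / allocatable ))
  if [ $util -lt $THRESHOLD_PCT ]; then
    if [ $minutes_underutilized -ge $UNNEEDED_MINUTES ]; then
      echo yes
      return
    fi
  fi
  echo no
}

scale_down_candidate 800 1930 15    # 41% for 15 minutes: prints yes
scale_down_candidate 1860 1930 60   # 96%, busy node: prints no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;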
&lt;p&gt;The cluster autoscaler is able to scale the agent pools down to zero if needed.
This is why you do not need to configure a fixed number of nodes for these agent pools in your deployment profile:
the cluster autoscaler adjusts the number of nodes to the workload, keeping your cloud costs to a minimum.&lt;/p&gt;
&lt;p&gt;&amp;#x2139;&amp;#xfe0f; You can find more information about the &lt;a href=&#34;https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler&#34;&gt;cluster autoscaler in the official GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;vertical-scaling-the-infrastructure&#34;&gt;Vertical scaling the infrastructure&lt;/h2&gt;
&lt;p&gt;Vertical scaling works in a similar way: tune the &lt;code&gt;type&lt;/code&gt; properties (the Azure VM sizes) in the infrastructure section of your profile configuration file.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#8f5902;font-style:italic&#34;&gt;# Infrastructure&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;infrastructure&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;region&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;westeurope&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;compute&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;masters&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;type&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;Standard_DS2_v2&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;common_nodes&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;min_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;2&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;max_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;8&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;type&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;Standard_DS2_v2&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;database_nodes&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;min_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;max_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;6&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;type&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;Standard_DS2_v2&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;management_nodes&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;min_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;max_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;6&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;type&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;Standard_DS3_v2&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Afterwards, run the installer to apply the changes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./aura deploy_infra --cfg /PATH/TO/config.yml -c /PATH/TO/credentials.k8s.json -v &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;VAULT_PASSWD&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./aura deploy_system --cfg /PATH/TO/config.yml -c /PATH/TO/credentials.k8s.json -v &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;VAULT_PASSWD&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./aura deploy_core --cfg /PATH/TO/config.yml -c /PATH/TO/credentials.k8s.json -v &lt;span style=&#34;color:#4e9a06&#34;&gt;&amp;#34;VAULT_PASSWD&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This operation terminates and recreates every node. It is performed as a rolling update to avoid service disruption, so in a large Kubernetes cluster it can take a long time to complete (around 5-10 minutes per node).&lt;/p&gt;
&lt;p&gt;In Azure, the Aura Platform installer cannot change the instance types yet. This means that changing the instance types in the deployment profile has no effect on redeployments.&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; Do not use the Azure Portal to modify the cluster nodes. It is an error-prone and unsupported way to scale the cluster that could impact the stability of the Aura Platform. Kubernetes must be aware of the changes made to the cluster, and all changes must be kept in sync with the deployment profile.&lt;/p&gt;
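&lt;p&gt;To check which nodes Kubernetes currently knows about and compare them with the node pools in the deployment profile, you can list them together with their pool label. This is a minimal sketch: the &lt;code&gt;agentpool&lt;/code&gt; label name is an assumption based on the labels that aks-engine usually sets. If that column comes out empty, run &lt;code&gt;kubectl get nodes --show-labels&lt;/code&gt; to find the right label.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# List the nodes with an extra column for the (assumed) agentpool label
$ kubectl get nodes -L agentpool
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;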
&lt;h2 id=&#34;kubernetes-storage&#34;&gt;Kubernetes storage&lt;/h2&gt;
&lt;p&gt;Kubernetes uses persistent volumes to store the data of stateful services such as Elasticsearch and Prometheus. In Azure, these volumes are backed by Managed Disks.&lt;/p&gt;
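&lt;p&gt;A quick way to review these volumes is to list the persistent volume claims and the volumes that back them with kubectl. This sketch assumes your kubeconfig points at the Aura cluster. The output shows the requested capacity of each claim, not its actual disk usage.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Persistent volume claims in all namespaces, with their capacity and status
$ kubectl get pvc --all-namespaces
# Persistent volumes (Azure Managed Disks) bound to those claims
$ kubectl get pv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;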
&lt;h2 id=&#34;services-logs-with-kubectl&#34;&gt;Service logs with kubectl&lt;/h2&gt;
&lt;p&gt;The best way to access the logs of a service is through &lt;strong&gt;Kibana&lt;/strong&gt;. Find more information in &lt;a href=&#34;../../../docs/developers-workspace/monitoring/aura-logs/&#34;&gt;Manage Aura logs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, you can also follow the logs of a service directly with kubectl, where &lt;code&gt;aura-$ENV&lt;/code&gt; is the namespace of the environment:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl logs -f -l &lt;span style=&#34;color:#000&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#ce5c00;font-weight:bold&#34;&gt;=&lt;/span&gt;aura-bot -n aura-&lt;span style=&#34;color:#000&#34;&gt;$ENV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
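&lt;p&gt;When you need past logs rather than a live stream, you can combine the same selector with standard &lt;code&gt;kubectl logs&lt;/code&gt; flags. When a label selector is used, kubectl prints only the last 10 lines per container unless &lt;code&gt;--tail&lt;/code&gt; is set. The one-hour window below is only an example value.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Last hour of aura-bot logs from all matching pods, each line prefixed with its pod and container
$ kubectl logs -l app=aura-bot -n aura-$ENV --since=1h --tail=-1 --prefix
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;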
      </description>
    </item>
    
    <item>
      <title>Docs: </title>
      <link>/docs/deployment/infraestructure/platform-services/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/docs/deployment/infraestructure/platform-services/</guid>
      <description>
        
        
        &lt;h1 id=&#34;aura-platform-services&#34;&gt;Aura Platform services&lt;/h1&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;Description of all the services that compose Aura Platform&lt;/p&gt;

&lt;/div&gt;

&lt;h2 id=&#34;services&#34;&gt;Introduction to Aura Platform services&lt;/h2&gt;
&lt;p&gt;All services that compose the Aura Platform are run as Docker containers on the &lt;a href=&#34;../../../docs/deployment/infraestructure/kubernetes-cluster/&#34;&gt;Kubernetes cluster&lt;/a&gt;. This helps us monitor and operate them in a consistent way.&lt;/p&gt;
&lt;p&gt;The services can be grouped in three categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;#services-infrastructure&#34;&gt;Infrastructure services&lt;/a&gt;&lt;/strong&gt;. Services that are tightly coupled to the infrastructure and could be reused by other products running on it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;#services-system&#34;&gt;System services&lt;/a&gt;&lt;/strong&gt;. Management services that are part of Aura Platform and could potentially be shared by several Aura Platform deployments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Core services&lt;/strong&gt;. All the other platform services that provide the end-user features.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;services-infrastructure&#34;&gt;Infrastructure services&lt;/h2&gt;
&lt;p&gt;Services related to Aura infrastructure that could be reused by other products using this infrastructure.&lt;/p&gt;
&lt;h3 id=&#34;cluster-autoscaler&#34;&gt;cluster-autoscaler&lt;/h3&gt;
&lt;p&gt;This service scales the cluster nodes. See &lt;a href=&#34;../../../docs/deployment/infraestructure/kubernetes-cluster/#kubernetes-cluster-autoscaler&#34;&gt;cluster autoscaler&lt;/a&gt; section in the Kubernetes cluster documentation for more details.&lt;/p&gt;
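&lt;p&gt;To see what the autoscaler is doing (node group health and recent scale-up and scale-down activity), you can read the status report it publishes. This sketch assumes the upstream cluster-autoscaler defaults, where the report is stored in the &lt;code&gt;cluster-autoscaler-status&lt;/code&gt; ConfigMap of the &lt;code&gt;kube-system&lt;/code&gt; namespace. The Aura deployment may run the autoscaler in a different namespace.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Print the status report written by cluster-autoscaler (upstream default location)
$ kubectl -n kube-system describe configmap cluster-autoscaler-status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;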
&lt;h2 id=&#34;services-system&#34;&gt;System services&lt;/h2&gt;
&lt;h3 id=&#34;alertmanager&#34;&gt;alertmanager&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/prometheus/alertmanager&#34;&gt;alertmanager&lt;/a&gt; is the component of the Prometheus suite in charge of sending notifications (email, Slack) when an alert fires in Prometheus.&lt;/p&gt;
&lt;p&gt;Notifications are sent to the &lt;code&gt;notifications_email&lt;/code&gt; (defined in the deployment profile) using an external global SMTP server administered by the Aura Platform team.&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; It is important that the different teams that operate the platform are subscribed to the alerts.&lt;/p&gt;
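&lt;p&gt;To check which alerts are currently firing without waiting for the notifications, you can query the Alertmanager HTTP API (v2). This is a minimal sketch: the &lt;code&gt;alertmanager&lt;/code&gt; service name and the &lt;code&gt;aura-system&lt;/code&gt; namespace are assumptions, and 9093 is the upstream default port.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Terminal 1: forward the (assumed) alertmanager service to localhost
$ kubectl -n aura-system port-forward svc/alertmanager 9093:9093
# Terminal 2: list the active alerts as JSON
$ curl -s http://localhost:9093/api/v2/alerts
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;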
&lt;h3 id=&#34;blackbox-exporter&#34;&gt;blackbox-exporter&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/prometheus/blackbox_exporter&#34;&gt;blackbox-exporter&lt;/a&gt; is a service that allows probing endpoints over HTTP, HTTPS, DNS, TCP and ICMP.&lt;/p&gt;
&lt;p&gt;Aura Platform is deployed along with an external endpoint that checks the health of some services. The blackbox-exporter uses HTTPS probes that periodically send a request to this endpoint to validate that it is still healthy. The probe metrics (result, latency, etc.) are stored in Prometheus.&lt;/p&gt;
&lt;h3 id=&#34;elasticsearch&#34;&gt;Elasticsearch&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://www.elastic.co/es/elasticsearch/&#34;&gt;Elasticsearch&lt;/a&gt; is a stateful service that indexes the Aura Platform logs so they can be used for analysis.&lt;/p&gt;
&lt;p&gt;It runs as a StatefulSet. Logs can use a lot of disk space, so it is important to size the volume accordingly in the following section of the deployment profile:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;elasticsearch&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;storage&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The retention time of the logs is 7 days by default, but it can be configured with the &lt;code&gt;log_retention_time&lt;/code&gt; key of the deployment profile.
Remember that increasing this value means that logs will take more disk space and queries against Elasticsearch could take longer to complete.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;log_retention_time&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;An index lifecycle policy is configured to remove old indices from Elasticsearch. You can check it in the &lt;a href=&#34;https://www.elastic.co/es/kibana/&#34;&gt;Kibana UI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For long-term storage, a snapshot is also configured that keeps the indices for one year in a blob of the storage account associated with the cluster. The snapshots can also be checked in the &lt;a href=&#34;https://www.elastic.co/es/kibana/&#34;&gt;Kibana UI&lt;/a&gt;.&lt;/p&gt;
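&lt;p&gt;If you prefer the command line, the Elasticsearch REST API provides the same information. This sketch relies on several assumptions: the service is reachable as &lt;code&gt;elasticsearch&lt;/code&gt; in the &lt;code&gt;aura-system&lt;/code&gt; namespace on the default port 9200, security is disabled (otherwise add credentials to curl), and the log indices follow the &lt;code&gt;aura-services-*&lt;/code&gt; pattern used in Kibana.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Terminal 1: forward the (assumed) Elasticsearch service to localhost
$ kubectl -n aura-system port-forward svc/elasticsearch 9200:9200
# Terminal 2: show the index lifecycle policies
$ curl -s "http://localhost:9200/_ilm/policy?pretty"
# List the log indices with their size, to see which ones the policy has kept
$ curl -s "http://localhost:9200/_cat/indices/aura-services-*?v"
# List the registered snapshot repositories
$ curl -s "http://localhost:9200/_snapshot?pretty"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;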
&lt;p&gt;To check the disk usage, you can use either the &lt;a href=&#34;../../../docs/deployment/infraestructure/kubernetes-cluster/#kubernetes-storage&#34;&gt;Kubernetes Storage&lt;/a&gt; dashboard or the Elasticsearch dashboard in Grafana.&lt;/p&gt;
&lt;h3 id=&#34;fluent-bit&#34;&gt;fluent-bit&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://fluentbit.io/&#34;&gt;Fluent Bit&lt;/a&gt; runs as a DaemonSet on every node, processing the logs of that node and sending them to &lt;a href=&#34;#fluentbit-aggregator&#34;&gt;fluentbit-aggregator&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;fluentbit-aggregator&#34;&gt;fluentbit-aggregator&lt;/h3&gt;
&lt;p&gt;Stateful service that aggregates all logs coming from the &lt;a href=&#34;#fluent-bit&#34;&gt;fluent-bit&lt;/a&gt; processes on every node.&lt;/p&gt;
&lt;p&gt;It indexes the logs in &lt;a href=&#34;#elasticsearch&#34;&gt;Elasticsearch&lt;/a&gt;. It has a small data disk of 10GB that acts as a buffer to avoid losing data if something goes wrong (e.g., a network issue or a problem with the Elasticsearch cluster) while trying to index the logs.&lt;/p&gt;
&lt;p&gt;It is safe to kill a specific fluentbit-aggregator pod (&lt;code&gt;kubectl delete pod&lt;/code&gt;) if it gets stuck for some reason.&lt;/p&gt;
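&lt;p&gt;A minimal sketch of that procedure follows. The &lt;code&gt;aura-system&lt;/code&gt; namespace, the &lt;code&gt;app=fluentbit-aggregator&lt;/code&gt; label and the pod name are assumptions, so take the real names from the output of the first command. If the aggregator runs as a StatefulSet, the deleted pod should be recreated with the same name and reattached to its buffer disk.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Find the fluentbit-aggregator pods and their status (label is an assumption)
$ kubectl -n aura-system get pods -l app=fluentbit-aggregator
# Delete the stuck pod; its controller creates a replacement
$ kubectl -n aura-system delete pod fluentbit-aggregator-0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;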
&lt;h3 id=&#34;node-exporter&#34;&gt;node-exporter&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/prometheus/node_exporter&#34;&gt;node-exporter&lt;/a&gt; is an official Prometheus exporter that gathers hardware and OS metrics exposed by the virtual machines that compose Aura Platform.&lt;/p&gt;
&lt;h3 id=&#34;prometheus&#34;&gt;Prometheus&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://prometheus.io&#34;&gt;Prometheus&lt;/a&gt; is a stateful service that scrapes metrics from all the exporters in the platform.&lt;/p&gt;
&lt;p&gt;It follows a pull-based metrics collection approach: every 30 seconds, it requests metrics from each HTTP endpoint exposed by the Aura Platform services. It also gathers information about the infrastructure that supports the Aura Platform.&lt;/p&gt;
&lt;p&gt;Metrics are stored using a &lt;a href=&#34;https://prometheus.io/docs/prometheus/latest/storage/&#34;&gt;local on-disk time series database&lt;/a&gt;. The local storage is not meant as durable long-term storage, so metrics have a retention time of 15 days.&lt;/p&gt;
&lt;p&gt;On average, Prometheus uses only around 1-2 bytes per sample. To plan the capacity of a Prometheus server, you can use the formula:&lt;/p&gt;


&lt;div class=&#34;pageinfo pageinfo-primary&#34;&gt;
&lt;p&gt;needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample&lt;/p&gt;

&lt;/div&gt;
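&lt;p&gt;As a worked example of the formula, the following sketch estimates the disk space needed for the 15-day retention and the 30-second scrape interval described above. The number of active time series (100,000) is a hypothetical value, and 2 bytes per sample is the upper end of the average. The sketch also assumes one sample per series per scrape, so the ingestion rate is the number of series divided by the scrape interval.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Inputs: retention and scrape interval from this page; the series count is hypothetical
retention_time_seconds=$((15 * 24 * 3600))  # 15 days
active_series=100000                         # hypothetical number of active series
scrape_interval_seconds=30
bytes_per_sample=2                           # upper end of the 1-2 bytes average

# needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample,
# where ingested_samples_per_second = active_series / scrape_interval_seconds.
# Multiply before dividing so integer arithmetic does not truncate the rate.
needed_disk_space() {
  local retention=$1 series=$2 interval=$3 bytes=$4
  echo $(( retention * series * bytes / interval ))
}

needed_bytes=$(needed_disk_space "$retention_time_seconds" "$active_series" \
  "$scrape_interval_seconds" "$bytes_per_sample")
echo "needed_disk_space: ${needed_bytes} bytes (about $(( needed_bytes / 1000000000 )) GB)"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With these inputs the estimate is 8640000000 bytes, roughly 8.6 GB. Replace the hypothetical series count with the real number of active series to size the volume for your own deployment.&lt;/p&gt;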

&lt;p&gt;Since metrics can take up a significant amount of space, it is important to size the Prometheus volume accordingly in the following section of the deployment profile:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;prometheus&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#204a87;font-weight:bold&#34;&gt;storage_size&lt;/span&gt;&lt;span style=&#34;color:#000;font-weight:bold&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#0000cf;font-weight:bold&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#f8f8f8;text-decoration:underline&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the local storage becomes corrupted for any reason, shut down Prometheus and remove the storage directory, which is mounted under &lt;code&gt;/prometheus&lt;/code&gt; in the Prometheus container.&lt;/p&gt;
&lt;p&gt;To check the disk usage, you can use the Kubernetes Storage dashboard in Grafana, looking for the PVC named &lt;code&gt;data-prometheus-0&lt;/code&gt;.&lt;/p&gt;
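&lt;p&gt;kubectl can also show the provisioned size of that claim, which helps to confirm whether a new &lt;code&gt;storage_size&lt;/code&gt; value has been applied. It reports capacity, not usage. The sketch assumes the claim lives in the &lt;code&gt;aura-system&lt;/code&gt; namespace, like the &lt;code&gt;prometheus-0&lt;/code&gt; pod.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Show the capacity and status of the Prometheus data volume claim
$ kubectl -n aura-system get pvc data-prometheus-0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;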
&lt;p&gt;If you have a user with a valid role to access the Kubernetes cluster, you can also check the disk usage running &lt;code&gt;df -h&lt;/code&gt; on the Prometheus container:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ kubectl -n aura-system &lt;span style=&#34;color:#204a87&#34;&gt;exec&lt;/span&gt; -it prometheus-0 -- df -h
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, this second alternative is discouraged because it is more intrusive and error-prone. Running arbitrary commands on the Aura Platform containers or nodes will be forbidden and will fire security alerts in future releases.&lt;/p&gt;
&lt;p&gt;The recommended pattern for running Prometheus in HA mode is to run duplicated instances (same configuration, scraping the same targets independently). That means having at least two replicas running.&lt;/p&gt;
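&lt;p&gt;To confirm how many Prometheus replicas are running, you can check the READY column of its StatefulSet. The StatefulSet name &lt;code&gt;prometheus&lt;/code&gt; is an assumption inferred from the &lt;code&gt;prometheus-0&lt;/code&gt; pod name. An HA setup should report at least &lt;code&gt;2/2&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Ready vs. desired Prometheus replicas (READY column)
$ kubectl -n aura-system get statefulset prometheus
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;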
&lt;h3 id=&#34;thanos&#34;&gt;Thanos&lt;/h3&gt;
&lt;h4 id=&#34;thanos-sidecar-container&#34;&gt;thanos-sidecar container&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&#34;https://thanos.io/tip/components/sidecar.md/&#34;&gt;&amp;ldquo;thanos-sidecar&amp;rdquo;&lt;/a&gt; container is a sidecar to the &amp;ldquo;prometheus&amp;rdquo; container that enhances it by exposing a gRPC StoreAPI and by uploading blocks to an object storage API (like Azure Blob Storage). It is a stateless component.&lt;/p&gt;
&lt;h4 id=&#34;thanos-querier&#34;&gt;thanos-querier&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&#34;https://thanos.io/tip/components/query.md/&#34;&gt;&amp;ldquo;thanos-querier&amp;rdquo;&lt;/a&gt; container exposes a gRPC StoreAPI and an HTTP Prometheus v1 API. It gathers the data needed to evaluate the query from underlying StoreAPIs, evaluates the query and returns the result. It is a stateless component.&lt;/p&gt;
&lt;p&gt;The configured StoreAPI sources are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;system namespace -&amp;gt; prometheus/thanos-sidecar&lt;/li&gt;
&lt;li&gt;system namespace -&amp;gt; thanos-store-gateway&lt;/li&gt;
&lt;/ul&gt;
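&lt;p&gt;Because thanos-querier exposes the Prometheus v1 HTTP API, you can run PromQL queries against it with plain HTTP requests. This sketch assumes a &lt;code&gt;thanos-querier&lt;/code&gt; service in the &lt;code&gt;aura-system&lt;/code&gt; namespace that listens on port 10902, the upstream default HTTP port of Thanos Query. Adjust these names to your deployment.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Terminal 1: forward the (assumed) thanos-querier service to localhost
$ kubectl -n aura-system port-forward svc/thanos-querier 10902:10902
# Terminal 2: evaluate the "up" query through the Prometheus v1 API
$ curl -s "http://localhost:10902/api/v1/query?query=up"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;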
&lt;h4 id=&#34;thanos-compact&#34;&gt;thanos-compact&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&#34;https://thanos.io/tip/components/compact.md/&#34;&gt;&amp;ldquo;thanos-compact&amp;rdquo;&lt;/a&gt; container applies the compaction procedure of the Prometheus 2.0 storage engine to the block data stored in object storage APIs (like Azure Blob Storage). It also generates downsampled blocks from each raw block.&lt;/p&gt;
&lt;p&gt;It is a stateful component, as it must be deployed as a singleton (against an exclusive label selector).&lt;/p&gt;
&lt;h4 id=&#34;thanos-store-gateway&#34;&gt;thanos-store-gateway&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&#34;https://thanos.io/tip/components/store.md/&#34;&gt;&amp;ldquo;thanos-store-gateway&amp;rdquo;&lt;/a&gt; container exposes a gRPC StoreAPI. It serves data blocks containing metrics stored in Azure Blob Storage.&lt;/p&gt;
&lt;p&gt;It is a stateless component; however, it uses local storage for sync purposes, and persisting that storage avoids longer start-up times.&lt;/p&gt;
&lt;h3 id=&#34;grafana&#34;&gt;Grafana&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://grafana.com/&#34;&gt;Grafana&lt;/a&gt; is an open-source service for analytics and monitoring purposes. The Aura Platform uses it to display metrics from Prometheus in several dashboards.&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; Changes to the official Aura Platform dashboards (those with the &amp;ldquo;baikal&amp;rdquo; label) will be overwritten on every deployment of the Aura Platform infrastructure.&lt;/p&gt;
&lt;p&gt;&amp;#x26a0;&amp;#xfe0f; Grafana is not set up to send Aura Platform alerts. Alerting in the Aura Platform relies on the Prometheus API and the alertmanager.&lt;/p&gt;
&lt;h3 id=&#34;kibana&#34;&gt;Kibana&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://www.elastic.co/es/kibana/&#34;&gt;Kibana&lt;/a&gt; is the service used by Aura Platform to access the logs indexed in Elasticsearch.&lt;/p&gt;
&lt;p&gt;The first time you access Kibana, you need to create an index pattern for &amp;ldquo;aura-services-*&amp;rdquo;, using @timestamp as the time field for logs.&lt;/p&gt;
&lt;p&gt;Remember that logs are stored in Elasticsearch for 10 days by default.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
