
OpenSearch

Configuration settings for OpenSearch

The Kubernetes logging helm chart supports multiple deployment layouts of OpenSearch, satisfying both local development needs, where minimal use of resources is required, and production needs, with additional Kafka brokers and an HA setup of the various components.

By default the helm chart configures two indices with corresponding index templates: containers-{YYYY.MM.dd}, which by default indexes all workload logs, and systemd-{YYYY.MM.dd}, which stores journal logs for system services such as “kubelet” or “containerd” running on the respective cluster nodes. Both indices are created according to index templates, which later allow dedicated visualizations in the OpenSearch Dashboards UI.

The “containers” index template follows the composable pattern and leverages a predefined component template named “kubernetes-metadata”.

containers  [containers-*]  0  [kubernetes-metadata]
systemd     [systemd-*]     0  []
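
The listing above resembles the output of the _cat/templates API. Assuming a port-forward to an OpenSearch node and admin credentials (placeholders below), a similar listing can be retrieved with:

```shell
# List the installed index templates with their patterns and component templates.
# <Name> and <Password> are placeholders for an admin account.
curl -ks "https://<Name>:<Password>@localhost:9200/_cat/templates/containers,systemd?h=name,index_patterns,order,composed_of"
```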

The latter uses the Kubernetes metadata attached by the Fluent Bit log shippers to unify the log structure among workloads. It shall also be used by any container-specific index, so that the same Kubernetes field mappings are shared.

The helm chart deploys all index template extensions found in the index-templates folder. An example of such an index template is nginx, which inherits the mappings from the “kubernetes-metadata” component template and adds field mappings for access logs.

{
   "index_patterns":[
      "nginx-*"
   ],
   "composed_of":[
      "kubernetes-metadata"
   ],
   "template":{
      "settings":{
         "index":{
            "codec":"best_compression",
            "mapping":{
               "total_fields":{
                  "limit":1000
               }
            },
            "number_of_shards":"{{ (.Values.data.replicas | int) }}",
            "number_of_replicas":"{{ (sub (.Values.data.replicas | int) 1) }}",
            "refresh_interval":"5s"
         }
      },
      "mappings":{
         "_source":{
            "enabled":true
         },
         "properties":{
            "log":{
               "type":"text"
            },
            "agent":{
               "type":"keyword"
            },
            "code":{
               "type":"keyword"
            },
            "host":{
               "type":"keyword"
            },
            "method":{
               "type":"keyword"
            },
            "path":{
               "type":"keyword"
            },
            "proxy_upstream_name":{
               "type":"keyword"
            },
            "referrer":{
               "type":"keyword"
            },
            "reg_id":{
               "type":"keyword"
            },
            "request_length":{
               "type":"long"
            },
            "request_time":{
               "type":"double"
            },
            "size":{
               "type":"long"
            },
            "upstream_addr":{
               "type":"keyword"
            },
            "upstream_response_length":{
               "type":"long"
            },
            "upstream_response_time":{
               "type":"double"
            },
            "upstream_status":{
               "type":"keyword"
            },
            "user":{
               "type":"keyword"
            }
         }
      }
   }
}
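
Once the chart is deployed and the Go template expressions above are rendered to concrete shard and replica counts, the resulting template can be inspected through the index template API, for example:

```shell
# Retrieve the rendered "nginx" index template from the cluster.
# <Name> and <Password> are placeholders for an admin account.
curl -ks "https://<Name>:<Password>@localhost:9200/_index_template/nginx"
```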

1 - Index Management

Overview of supported OpenSearch index management scenarios

The helm chart maintains the retention period for indices in OpenSearch using an ISM policy defined in the file index-retention_policy.json. The retention period in days is taken from the opensearch.retentionDays key.
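
A minimal ISM policy of this shape could look as follows. This is only a sketch of what such a policy may contain, not the chart's actual index-retention_policy.json, with the retention age templated from the opensearch.retentionDays value:

```json
{
   "policy":{
      "description":"Delete indices older than the configured retention period",
      "default_state":"hot",
      "states":[
         {
            "name":"hot",
            "actions":[],
            "transitions":[
               {
                  "state_name":"delete",
                  "conditions":{
                     "min_index_age":"{{ .Values.opensearch.retentionDays }}d"
                  }
               }
            ]
         },
         {
            "name":"delete",
            "actions":[
               {
                  "delete":{}
               }
            ],
            "transitions":[]
         }
      ],
      "ism_template":{
         "index_patterns":[
            "containers-*",
            "systemd-*"
         ]
      }
   }
}
```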

Note: The retention period configured in the helm chart (7 days by default) shall reflect the size of the persistent volumes mounted by the OpenSearch data nodes. If the log volume in the cluster is high, the data nodes’ PV sizes shall correspond.

It is good practice to have a resizable storage class in the cluster that supports updates to the persistent volumes. When the persistent volumes fill up, the OpenSearch data nodes switch to read-only mode and new logs are prevented from being indexed.
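
If the disk-full condition has already triggered the read-only block, it is not always lifted automatically after the volumes are resized. Assuming admin credentials and a port-forward as used in the troubleshooting section, the block can be cleared manually:

```shell
# Remove the read-only block that is set when the flood-stage disk watermark
# is exceeded. <Name> and <Password> are placeholders for an admin account.
curl -ks -X PUT "https://<Name>:<Password>@localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```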

2 - Possible problems

OpenSearch / ElasticSearch is a pretty nice piece of technology with many self-healing procedures, but sometimes manual intervention is required. In this chapter you can find solutions for some common problems.

2.1 - Missing data node

Prerequisites

This guide assumes that you have access to the Kubernetes cluster and a port-forward to some OpenSearch node (= pod from the Kubernetes perspective). I recommend choosing a node of the client type.
All API calls to the OpenSearch cluster begin with curl -ks https://<Name>:<Password>@localhost:9200/<Source>, where:

  • <Name> = user name in the OpenSearch cluster with admin privileges
  • <Password> = corresponding password for the admin account
  • <Source> = API endpoint of the OpenSearch cluster
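
The port-forward itself can be established with kubectl. The pod name below is taken from the node listing later in this guide and will differ in your cluster:

```shell
# Forward local port 9200 to one of the OpenSearch client pods
kubectl -n logging port-forward ofd-client-56dd9c66fb-bs7hp 9200:9200
```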
  1. Check current status:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/health

1654179496 14:18:16 logging yellow 5 1 true 30 30 0 0 27 0 - 52.6%

The cluster is in a yellow state.

  2. List all available nodes from the OpenSearch perspective:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/nodes
100.96.7.5  72  97 1 0.30 0.36 0.35 mr  - ofd-manager-0
100.96.7.11 51  76 1 0.30 0.36 0.35 r   - ofd-client-56dd9c66fb-bs7hp
100.96.7.7  53  76 4 0.30 0.36 0.35 dir - ofd-data-1
100.96.1.8  21 100 1 1.73 0.82 0.41 mr  * ofd-manager-1
100.96.7.12 19  76 1 0.30 0.36 0.35 r   - ofd-client-56dd9c66fb-q9tv5

Here you can see a multi-node setup, in which two nodes of each type (client, data, and manager) must exist, i.e. 6 OpenSearch nodes in total.

One data node, “ofd-data-0”, is missing.

  3. Check the suspicious pods:
$ kubectl -n logging get pods | grep data
ofd-data-0    1/1    Running    0    130m
ofd-data-1    1/1    Running    0    129m

From the Kubernetes perspective these pods (OpenSearch nodes) are working fine.

  4. Check the logs of the suspicious pod:
$ kubectl -n logging logs ofd-data-0
...
"message": "failed to join ...
...
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid 2UlST0WBQIKEV05cDpuWwQ than local cluster uuid v1vi49Q_RRaaC83iMthBnQ, rejecting
...

It seems that this node holds the ID of an old, previously used OpenSearch cluster instance and attempts to join that old cluster.

This is the reason why this node is missing.

  5. Reset the failed data node:
  • log in to the pod:
    $ kubectl -n logging exec -i -t ofd-data-0 -- /bin/bash

  • delete the data directory inside the pod:
    $ rm -rf /data/nodes

  • log out from the pod:
    $ exit

  • delete the pod so it gets restarted:
    $ kubectl -n logging delete pod ofd-data-0
    pod "ofd-data-0" deleted
    
  6. Check the OpenSearch cluster health again:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/health
1654180648 14:37:28 logging yellow 6 2 true 45 30 0 4 11 0 - 75.0%
...
1654180664 14:37:44 logging yellow 6 2 true 53 30 0 2 5 0 - 88.3%
...
1654180852 14:40:52 logging green 6 2 true 60 30 0 0 0 0 - 100.0%

The cluster is still in a yellow state.

Running the curl command repeatedly over time shows that the cluster is recovering.

Wait some time and, once the problem is solved, you will see the cluster healthy again in a green state.
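
Instead of re-running the health check by hand, the recovery progress can be followed with a simple polling loop, for example:

```shell
# Poll cluster health every 10 seconds until it turns green.
# <Name> and <Password> are placeholders for an admin account.
while true; do
  curl -ks "https://<Name>:<Password>@localhost:9200/_cat/health"
  sleep 10
done
```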