Logging Stack Components
- 1: FluentBit
- 2: OpenSearch
- 2.1: Index Management
- 2.2: Possible problems
- 2.2.1: Missing data node
- 3: OpenSearch-Dashboards
- 3.1: Opensearch-Dashboards Authentication & Authorization
- 3.2: OpenSearch-Dashboards-Observability
- 3.3: Opensearch-Dashboards Visualizations
- 4: Kafka
- 4.1: Howtos
- 4.1.1: How to generate a cluster ID
1 - FluentBit
FluentBit is installed as a DaemonSet on each of the k8s nodes by the helm chart. It follows the FluentBit data pipeline setup designed for Kubernetes environments.
The helm chart itself supports different deployment layouts depending on whether a simple or standard model is required. The standard model is recommended in production, where the various components run in HA mode. In this case the FluentBit instances send the collected logs to Kafka brokers. The Kafka brokers are used for buffering and greatly increase the overall reliability and stability of the entire stack.
In the simple case the FluentBit instances communicate directly with OpenSearch nodes.
In both cases there is a set of FluentBit configurations responsible for proper log collection from the containers and for enriching the events with the respective Kubernetes metadata, such as the namespace of the origin workload, its labels and so on. The metadata is later used in indexing, search and visualization scenarios. This shared configuration is shown on the diagrams here as the “kubernetes data pipeline”.
The “kubernetes data pipeline” uses the standard “Tail” input plugin to read the logs from the mounted node filesystem and the “Kube-Tag” parser to generate the FluentBit tag of the events. It is followed by the “Kubernetes” filter, which adds the Kubernetes metadata to the events, and finally by a “de_dot” filter, which replaces dots “.” with underscores “_” in event field names.
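For illustration, a minimal sketch of such a shared pipeline is shown below. The log path, parser name, tag and the exact set of keys are assumptions and may differ from the configuration generated by the helm chart.
[INPUT]
    # Read container logs from the node filesystem (path is an assumption)
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    Parser            cri
[FILTER]
    # Enrich events with namespace, pod, container and label metadata
    Name              kubernetes
    Match             kube.*
    Merge_Log         On
    Keep_Log          Off
# The de_dot step described above (replacing "." with "_" in metadata keys)
# follows here; depending on the FluentBit version it can be a dedicated
# filter or a small Lua script.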
The “kubernetes data pipeline” is the foundation of any application-specific configuration. For example, the nginx ingress controller produces unstructured access logs. To parse those logs and transform the lines into structured, JSON-formatted messages, we enrich the pipeline with corresponding filters and parsers.
The nginx access log parsing example is located in the fluentbit-configs folder. Any additional application-specific configs need to be saved in the same location following the filename naming conventions, i.e. filters need the “filter” prefix, parsers the “parser” prefix, and so on.
In the nginx access log example the rewrite_tag filter is used to tag messages originating from containers that share the app_kubernetes_io/name: ingress-nginx label.
[FILTER]
Name rewrite_tag
Match kube.*
Rule $kubernetes['labels']['app_kubernetes_io/name'] "^(ingress-nginx)$" nginx false
[FILTER]
Name parser
Match nginx
Key_Name log
Parser k8s-nginx-ingress
Reserve_Data True
The messages are tagged and re-emitted into the FluentBit data pipeline, where they are later matched by the nginx parser, which uses a regex to construct a JSON-formatted structured message:
[PARSER]
Name k8s-nginx-ingress
Format regex
Regex ^(?<host>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referrer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (\[(?<proxy_alternative_upstream_name>[^ ]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<reg_id>[^ ]*).*$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
Additional parsers are supported, such as multiline parsers, which allow reconstructing Java stack traces into a single message.
Here is an example of such a configuration.
filter-zookeeper.conf:
[FILTER]
Name rewrite_tag
Match kube.*.logging.*.*
Rule $kubernetes['labels']['type'] "^(zk)$" zookeeper false
Emitter_Storage.type filesystem
[FILTER]
Name multiline
Match zookeeper
multiline.parser zookeeper_multiline
parser-zookeeper.conf:
[MULTILINE_PARSER]
name zookeeper_multiline
type regex
flush_timeout 1000
key_content log
# Regex rules for multiline parsing
# ---------------------------------
# - first state always has the name: start_state
# - every field in the rule must be inside double quotes
#
# rules | state name | regex pattern | next state name
# ------|--------------|--------------------------------------|----------------
rule "start_state" "/^(?<exception>[^ ]+:)(?<rest>.*)$/" "cont"
rule "cont" "/\s+at\s.*/" "cont"
Hint: For high-volume log producers consider adding the
Emitter_Storage.type filesystem
property. It allows additional buffering during the re-emitting of the events; for details see the FluentBit rewrite_tag documentation.
2 - OpenSearch
The Kubernetes logging helm chart supports multiple deployment layouts of OpenSearch, satisfying both local development needs, where minimal use of resources is required, and a production layout with additional Kafka brokers and an HA setup of the various components.
By default the helm chart configures two indices with corresponding index templates: containers-{YYYY.MM.dd}, which by default indexes the logs of all workloads, and systemd-{YYYY.MM.dd}, which stores the journal system logs of the “kubelet” and “containerd” services running on the respective cluster nodes.
Both indices are created according to index templates, which later allows dedicated visualizations in the OpenSearch Dashboards UI.
The “containers” index template uses the composable pattern and leverages a predefined component template named “kubernetes-metadata”:
containers [containers-*] 0 [kubernetes-metadata]
systemd [systemd-*] 0 []
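For reference, a summary like the one above can be retrieved from a running cluster with the _cat templates API, using the same credential placeholders as in the troubleshooting section below:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/templates?v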
The “kubernetes-metadata” component template uses the Kubernetes metadata attached by the FluentBit log shippers to unify its structure among workloads. It shall also be used by any container-specific index, with the purpose of sharing the same Kubernetes field mappings.
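For illustration, the component template can be expected to carry mappings roughly of the following shape; the concrete field list is an assumption, and the authoritative definition ships with the helm chart:
{
  "template":{
    "mappings":{
      "properties":{
        "kubernetes":{
          "properties":{
            "namespace_name":{ "type":"keyword" },
            "pod_name":{ "type":"keyword" },
            "container_name":{ "type":"keyword" },
            "host":{ "type":"keyword" },
            "labels":{ "type":"object" }
          }
        }
      }
    }
  }
}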
The helm chart deploys all template extensions found in the index-templates folder. An example of such an index template is nginx, which inherits the mappings from the “kubernetes-metadata” component template and adds the access log field mappings:
{
"index_patterns":[
"nginx-*"
],
"composed_of":[
"kubernetes-metadata"
],
"template":{
"settings":{
"index":{
"codec":"best_compression",
"mapping":{
"total_fields":{
"limit":1000
}
},
"number_of_shards":"{{ (.Values.data.replicas | int) }}",
"number_of_replicas":"{{ (sub (.Values.data.replicas | int) 1) }}",
"refresh_interval":"5s"
}
},
"mappings":{
"_source":{
"enabled":true
},
"properties":{
"log":{
"type":"text"
},
"agent":{
"type":"keyword"
},
"code":{
"type":"keyword"
},
"host":{
"type":"keyword"
},
"method":{
"type":"keyword"
},
"path":{
"type":"keyword"
},
"proxy_upstream_name":{
"type":"keyword"
},
"referrer":{
"type":"keyword"
},
"reg_id":{
"type":"keyword"
},
"request_length":{
"type":"long"
},
"request_time":{
"type":"double"
},
"size":{
"type":"long"
},
"upstream_addr":{
"type":"keyword"
},
"upstream_response_length":{
"type":"long"
},
"upstream_response_time":{
"type":"double"
},
"upstream_status":{
"type":"keyword"
},
"user":{
"type":"keyword"
}
}
}
}
}
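After deployment it can be verified that a template extension has been picked up by querying the index template API; the template name below is illustrative:
$ curl -ks https://<Name>:<Password>@localhost:9200/_index_template/nginx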
2.1 - Index Management
The helm chart maintains the retention days for indices in OpenSearch using an ISM policy defined in the file index-retention_policy.json. The value is taken from the opensearch.retentionDays key.
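For illustration, a retention policy of this shape could look roughly like the following sketch; the state names, the age condition and the ism_template patterns are assumptions, and the shipped index-retention_policy.json remains authoritative:
{
  "policy":{
    "description":"Delete indices older than the configured retention period",
    "default_state":"hot",
    "states":[
      {
        "name":"hot",
        "actions":[],
        "transitions":[
          { "state_name":"delete", "conditions":{ "min_index_age":"7d" } }
        ]
      },
      {
        "name":"delete",
        "actions":[ { "delete":{} } ],
        "transitions":[]
      }
    ],
    "ism_template":{
      "index_patterns":[ "containers-*", "systemd-*" ],
      "priority":100
    }
  }
}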
Note: The retention period configured in the helm chart (7 days by default) shall reflect the size of the persistent volumes mounted by the OpenSearch data nodes. If the log volume in the cluster is high, the data nodes’ PV sizes shall correspond.
It is good practice to have a resizable storage class in the cluster that supports updates of the persistent volumes. When the persistent volumes fill up, the OpenSearch data nodes switch to read-only mode and new logs are prevented from being indexed.
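A minimal sketch of such a resizable storage class is shown below; the name and the provisioner are assumptions and depend on the cluster infrastructure:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: resizable                # assumption: any name referenced by the data node volume claims
provisioner: ebs.csi.aws.com     # assumption: replace with the CSI provisioner used in the cluster
allowVolumeExpansion: true       # enables growing the persistent volumes later on
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer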
2.2 - Possible problems
OpenSearch / Elasticsearch is a pretty solid piece of technology with many self-healing procedures, but sometimes manual intervention is required. In this chapter you can find solutions for some problems.
2.2.1 - Missing data node
Prerequisites
This guide assumes that you have access to the Kubernetes cluster and a port-forward to one of the OpenSearch nodes (= pods from the Kubernetes perspective). It is recommended to choose a node of the client type.
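Such a port-forward can be established, for example, like this (the pod name is taken from the example output further below and will differ per installation):
$ kubectl -n logging port-forward ofd-client-56dd9c66fb-bs7hp 9200:9200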
All API calls to the OpenSearch cluster begin with curl -ks https://<Name>:<Password>@localhost:9200/<Source>, where:
- <Name> = user name in the OpenSearch cluster with admin privileges
- <Password> = corresponding password for the admin account
- <Source> = the API path (data point) queried from the OpenSearch cluster
- Check current status:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/health
1654179496 14:18:16 logging yellow 5 1 true 30 30 0 0 27 0 - 52.6%
our cluster is in the yellow state
- List all available nodes from the OpenSearch perspective:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/nodes
100.96.7.5 72 97 1 0.30 0.36 0.35 mr - ofd-manager-0
100.96.7.11 51 76 1 0.30 0.36 0.35 r - ofd-client-56dd9c66fb-bs7hp
100.96.7.7 53 76 4 0.30 0.36 0.35 dir - ofd-data-1
100.96.1.8 21 100 1 1.73 0.82 0.41 mr * ofd-manager-1
100.96.7.12 19 76 1 0.30 0.36 0.35 r - ofd-client-56dd9c66fb-q9tv5
here you can see a multi-node setup in which there must be two nodes of each type: client, data and manager (6 OpenSearch nodes in total); in the node.role column d = data, i = ingest, m = cluster manager, r = remote cluster client, and the * marks the elected cluster manager
one data node, "ofd-data-0", is missing
- Check suspicious pods:
$ kubectl -n logging get pods | grep data
ofd-data-0 1/1 Running 0 130m
ofd-data-1 1/1 Running 0 129m
from the Kubernetes perspective these pods (OpenSearch nodes) are working fine
- Check the logs of the suspicious pod:
$ kubectl -n logging logs ofd-data-0
...
"message": "failed to join ...
...
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid 2UlST0WBQIKEV05cDpuWwQ than local cluster uuid v1vi49Q_RRaaC83iMthBnQ, rejecting
...
it seems that this node holds an old, previously used OpenSearch cluster ID and attempts to connect to that old OpenSearch cluster instance
this is the reason why this node is missing
- Reset failed data node:
Warning
Double check that at least half of the data nodes are healthy! In our case we must have 2 data nodes in total and 1 is missing. Double check that the OpenSearch cluster is in the yellow state. Proceeding with a smaller number of healthy data nodes leads to data loss!
- log in to the pod:
$ kubectl -n logging exec -i -t ofd-data-0 -- /bin/bash
- delete the data directory in the pod:
$ rm -rf /data/nodes
- log out from the pod:
$ exit
- delete the pod so that it gets restarted:
$ kubectl -n logging delete pod ofd-data-0
pod "ofd-data-0" deleted
- Check OpenSearch cluster health again:
$ curl -ks https://<Name>:<Password>@localhost:9200/_cat/health
1654180648 14:37:28 logging yellow 6 2 true 45 30 0 4 11 0 - 75.0%
...
1654180664 14:37:44 logging yellow 6 2 true 53 30 0 2 5 0 - 88.3%
...
1654180852 14:40:52 logging green 6 2 true 60 30 0 0 0 0 - 100.0%
our cluster is still in the yellow state
running the curl command repeatedly over time shows that the cluster is regenerating
wait some time and, if the problem was solved, you will see the cluster healthy again in the green state
3 - OpenSearch-Dashboards
The Kubernetes logging helm chart deploys a single instance of OpenSearch Dashboards (or just Dashboards), providing the UI to the OpenSearch indices.
The helm chart enables authentication configurations based on SAML, OIDC or standalone, and leverages the Dashboards tenant concept. The latter allows teams to develop UIs such as searches, visualizations and dashboards in a shared tenant space, while predefined read-only UIs live in the global tenant space. Once the UIs are ready to be promoted, they can become part of the helm chart saved-objects folder and thereby a standard part of the chart deployment.
In addition, the helm chart provisions an OpenSearch Data Prepper component, which allows OpenTelemetry traces to be indexed and later visualized in the Dashboards observability UI.
3.1 - Opensearch-Dashboards Authentication & Authorization
tbd
3.2 - OpenSearch-Dashboards-Observability
The Observability plugin allows you to visualize tracing data.
Example:
3.3 - Opensearch-Dashboards Visualizations
tbd
4 - Kafka
The Kubernetes logging helm chart deploys Apache Kafka as a message broker between FluentBit and Logstash to improve stability and load balancing in big deployments.
Since helm chart version 4.6.0 we have omitted Apache ZooKeeper as a Kafka dependency. Kafka introduced KRaft, aka ZooKeeper-less mode, in version 2.8.0, and since Kafka version 3.3.0 KRaft is marked as production-ready, so we decided to adopt it in the logging helm chart to save some resources and deployment time. Kafka in KRaft mode needs a generated cluster ID. Please check how to generate a cluster ID.
4.1 - Howtos
Here you can find guidelines on how to work with Kafka in the context of the logging helm chart.
4.1.1 - How to generate a cluster ID
The cluster ID is required for each Kafka instance to know which instances it must connect to in order to form a cluster. It is also used for preparing the storage space. All Kafka instances need to be set to the same cluster ID. If you omit this setting, each Kafka instance generates its own ID and refuses to form a cluster.
There are many ways to generate the ID, but a recommended one is this chain of Bash commands:
$ cat /proc/sys/kernel/random/uuid | tr -d '-' | base64 | cut -b 1-22
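This approximates the expected format: Kafka cluster IDs are 22-character, base64-like strings, which is why the dashes are stripped and the output is truncated to 22 characters.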
You can also use the built-in script:
$ bin/kafka-storage.sh random-uuid
Or just start one Kafka instance without the cluster ID setting; a generated ID will then appear in its logs.
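Outside of the helm chart, such an ID would typically be passed to Kafka when formatting the KRaft storage directories, for example (the properties file path is illustrative):
$ bin/kafka-storage.sh format -t <cluster-id> -c config/kraft/server.properties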