Building UK8S Log Solution Using ELK
Next, we'll cover how to build a UK8S log solution using Elasticsearch, Filebeat, and Kibana.
I. Deploying Elasticsearch
1. About Elasticsearch
Elasticsearch (ES) is an open-source, distributed, RESTful full-text search engine built on Lucene. Elasticsearch is also a distributed document database, where each field can be indexed and each field's data can be searched. ES can horizontally scale to hundreds of servers to store and process PB-level data. It can store, search, and analyze large volumes of data in a very short time. It typically acts as the core engine in situations with complex search scenarios.
2. Environment Requirements
When running Elasticsearch, the vm.max_map_count kernel parameter must be greater than 262144. Therefore, before starting, make sure this parameter has been adjusted correctly.
sysctl -w vm.max_map_count=262144
You can also add an initContainer to the ES orchestration file to modify the kernel parameters, but this requires the kubelet to be started with the --allow-privileged parameter. The UK8S defaults to enabling this parameter. We'll use the initContainer method in the examples to follow.
3. ES Node Roles
ES Nodes can be divided into several roles:
Master-eligible node is the node that is eligible to be selected as a Master node. Set node.master: true.
Data node is the node that stores data, set as node.data: true.
Ingest node is the node that processes data, set as node.ingest: true.
Trible node is used for cluster integration.
For a single-node Node, it defaults to master-eligible and data. For a multi-node cluster, the roles of each node need to be carefully planned based on needs.
4. Deploying Elasticsearch
To facilitate demonstration, we'll place all object resources in a namespace called elk. So we'll need to create a namespace first:
kubectl create namespace elk
No Distinction Between Node Roles
Under this mode, roles are not distinguished among nodes in the cluster. Please refer to the elk-cluster.yaml configuration file.
bash-4.4# kubectl apply -f elk-cluster.yaml
deployment.apps/kb-single created
service/kb-single-svc created
statefulset.apps/es-cluster created
service/es-cluster-nodeport created
service/es-cluster created
bash-4.4# kubectl get po -n elk
NAME READY STATUS RESTARTS AGE
es-cluster-0 1/1 Running 0 2m18s
es-cluster-1 1/1 Running 0 2m15s
es-cluster-2 1/1 Running 0 2m12s
kb-single-69ddfc96f5-lr97q 1/1 Running 0 2m18s
bash-4.4# kubectl get svc -n elk
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
es-cluster ClusterIP None <none> 9200/TCP,9300/TCP 2m20s
es-cluster-nodeport NodePort 172.17.177.40 <none> 9200:31200/TCP,9300:31300/TCP 2m20s
kb-single-svc LoadBalancer 172.17.129.82 117.50.40.48 5601:38620/TCP 2m20s
bash-4.4#
Access Kibana via kb-single-svc's EXTERNAL-IP.
Distinguishing Node Roles
If you need to distinguish the roles of the nodes, you need to set up two StatefulSet deployments - one for the Master cluster and one for the Data cluster. In the Data cluster storage example, an emptyDir is simply used. Depending on your needs, you can use localStorage or hostPath. For information about storage, refer to the Kubernetes official website (opens in a new tab). This method can prevent data loss and index rebuilding when a Data node restarts locally. However, if migration occurs, a shared storage solution is the only way to keep the data. The specific orchestration file can be found here: elk-role-cluster.yaml
bash-4.4# kubectl apply -f elk-role-cluster.yaml
deployment.apps/kb-single created
service/kb-single-svc created
statefulset.apps/es-cluster created
statefulset.apps/es-cluster-data created
service/es-cluster-nodeport created
service/es-cluster created
bash-4.4# kubectl get po -n elk
NAME READY STATUS RESTARTS AGE
es-cluster-0 1/1 Running 0 53s
es-cluster-1 1/1 Running 0 50s
es-cluster-2 1/1 Running 0 47s
es-cluster-data-0 1/1 Running 0 53s
es-cluster-data-1 1/1 Running 0 50s
es-cluster-data-2 1/1 Running 0 47s
kb-single-69ddfc96f5-lxsn8 1/1 Running 0 53s
bash-4.4# kubectl get statefulset -n elk
NAME READY AGE
es-cluster 3/3 2m
es-cluster-data 3/3 2m
bash-4.4# kubectl get svc -n elk
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
es-cluster ClusterIP None <none> 9200/TCP,9300/TCP 44s
es-cluster-nodeport NodePort 172.17.63.138 <none> 9200:31200/TCP,9300:31300/TCP 44s
kb-single-svc LoadBalancer 172.17.183.59 117.50.92.74 5601:32782/TCP
II. Deploying FileBeat
In the process of log collection, Logstash comes to mind first because it's an important member of the ELK stack. But during testing, we found that Logstash is based on JDK. Just starting Logstash without generating logs consumes about 500M of memory. In the case of starting a log collection component in each Pod, using Logstash seems to waste system resources. Therefore, we recommend the lightweight log collection tool Filebeat, which consumes only about 12M of memory when started separately. The specific orchestration file can be referred to as filebeat.yaml (opens in a new tab). This example uses DaemonSet for orchestration.
bash-4.4# kubectl apply -f filebeat.yaml
configmap/filebeat-config created
daemonset.extensions/filebeat created
clusterrolebinding.rbac.authorization.k8s.io/filebeat created
clusterrole.rbac.authorization.k8s.io/filebeat created
serviceaccount/filebeat created
The configuration of filebeat used in the orchestration file is mounted to /home/uk8s-filebeat/filebeat.yaml. The actual startup of filebeat uses this custom configuration. For information about filebeat's configuration, see the corresponding explanation in Configuring Filebeat (opens in a new tab).
The Filebeat command line parameters can be referred to Filebeat Command Reference (opens in a new tab). The parameters used in this example are explained as follows:
- -c, --c FILE
Specifies the configuration file used by Filebeat. If not specified, the default configuration file /usr/share/filebeat/filebeat.yaml is used.
- -d, --d SELECTORS
Enables the debug mode for the specified selectors. Selectors is a list separated by commas. -d "*" means to debug all components. Please turn off this option in the actual production environment. It can be effectively debugged when the configuration is first switched on.
- -e, --e
Specifies that the log is output to standard error, and the default syslog/file output is turned off.
III. Deploying Logstash (Optional)
Since Filebeat's message filtering capability is limited, it is usually combined with Logstash in actual production environments. In this architecture, Filebeat acts as a log collector, sending data to Logstash. After parsing and filtering by Logstash, it is sent to Elasticsearch for storage and presented to users by Kibana.
1. Creating Configuration Files
Create the configuration file of Logstash. You can refer to elk-log.conf. More detailed configuration information can be seen in Configuring Logstash (opens in a new tab). Most Logstash configuration files can be divided into 3 parts: input, filter, and output. The sample configuration file indicates that Logstash gets data from Filebeat and outputs it to Elasticsearch.
2. Create a ConfigMap named elk-pipeline-config based on the configuration file as follows:
bash-4.4# kubectl create configmap elk-pipeline-config --from-file=elk-log.conf --namespace=elk
configmap/elk-pipeline created
bash-4.4# kubectl get configmap -n elk
NAME DATA AGE
elk-pipeline-config 1 9s
filebeat-config 1 21m
3. Deploy Logstash on K8S cluster.
Write logstash.yaml, and mount the ConfigMap created in the yaml file. It's important to note that the logstash-oss image is used here. For the differences between the OSS and non-OSS versions, please refer to this link (opens in a new tab).
bash-4.4# kubectl apply -f logstash.yaml
deployment.extensions/elk-log-pipeline created
service/elk-log-pipeline created
bash-4.4# kubectl get po -n elk
NAME READY STATUS RESTARTS AGE
elk-log-pipeline-55d64bbcf4-9v49w 1/1 Running 0 50m
4. Check if Logstash is working correctly. If the following content appears, it means Logstash is working normally
bash-4.4# kubectl logs -f elk-log-pipeline-55d64bbcf4-9v49w -n elk
[2019-03-19T08:56:03,631][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
...
[2019-03-19T08:56:09,845][INFO ][logstash.inputs.beats ] Beats inputs: Starting input listener {:address=>"0.0.0.0:5044"}
[2019-03-19T08:56:09,934][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x77d5c9b5 run>"}
[2019-03-19T08:56:10,034][INFO ][org.logstash.beats.Server] Starting server on port: 5044
5. Modify the output parameter of filebeat.yaml to point the output to Logstash
items:
- apiVersion: v1
kind: ConfigMap
metadata:
...
data:
filebeat.yml: |
...
output.logstash:
hosts: ["elk-log-pipeline:5044"]
...
IV. Collecting Application Logs
We've deployed Filebeat to collect application logs and output the collected logs to Elasticsearch. Now we'll use an Nginx application as an example to test whether the logs can be collected, indexed, and displayed properly.
1. Deploying Nginx Application
Create a deployment and LoadBalancer service for Nginx. This way, Nginx can be accessed via EIP. We mount the output path of the Nginx access log to the host's /var/log/nginx/ path using hostPath.
bash-4.4# kubectl apply -f nginx.yaml
deployment.apps/nginx-deployment unchanged
service/nginx-cluster configured
bash-4.4# kubectl get svc -n elk nginx-cluster
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-cluster LoadBalancer 172.17.153.144 117.50.25.74 5680:48227/TCP 19m
bash-4.4# kubectl get po -n elk -l app=nginx
NAME READY STATUS RESTARTS AGE
nginx-deployment-6c858858d5-7tcbx 1/1 Running 0 36m
nginx-deployment-6c858858d5-9xzh8 1/1 Running 0 36m
2. Filebeat Configuration
When Filebeat was deployed before, /var/log/nginx/ was added to inputs.paths, so Filebeat can monitor and collect nginx logs.
filebeat.modules:
- module: system
filebeat.inputs:
- type: log
paths:
- /var/log/containers/*.log
- /var/log/messages
- /var/log/nginx/*.log
- /var/log/*
symlinks: true
include_lines: ['hyperkube']
output.logstash:
hosts: ["elk-log-pipeline:5044"]
logging.level: info
index: filebeat-