Kubernetes Cloud in a Bare Metal Box



It's A Little Different


In most cases, Kubernetes is installed on a cloud infrastructure somewhere. These days this is a one-click operation at most cloud providers. Of course, it was designed by Google and initially catered to the GCP system though it has been expanded to work seamlessly with AWS, Azure, and many other cloud services. But, sometimes, when we want to know the raw core of how things work, we need to install it ourselves on bare-metal servers or, in the case of this project, VMs inside Proxmox.

Many different tutorials all over the Internet walk through installing a Kubernetes cluster. Most of them are geared toward minikube on a laptop, workstation, or a cloud-based install. Installing on bare metal has a couple of hurdles that we need to overcome to be able to use this as a true Kubernetes cluster. We will cover these changes in detail when we get to those sections.

The main reason for building this cluster is to practice for the Kubernetes certification exam. However, it will live on after I have mastered that certification. It will live on as a platform to test databases and different ways of processing data into said databases. Also, it will become a major part of our development, staging, and production system for this website and other website projects. So, without further hesitation, let us get to building this cluster.



But Wait, There is a Legend

Over the course of writing these projects, I have come up with a legend that I will use from here on out, and I will update my previous projects to match. All commands that fit on a single line will appear in a distinct color, like this: run this command.

Terminal output will look like the example below.

NAME                                  READY   STATUS    RESTARTS   AGE
coredns-565d847f94-rdvfc              1/1     Running   0          9m47s
coredns-565d847f94-svs8x              1/1     Running   0          9m47s
etcd-kube-control-plane               1/1     Running   0          10m

Amusingly, the color of the commands comes from a highlight.js error letting me know that my code is not inside of a pre block. Since the color draws attention, I decided it was good to leave it this way to identify commands.


When we have commands that are multi-line and use EOF, they will look like this.

cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.2.180-192.168.2.189
EOF

End Legend



Resources

Proxmox VMs In This Case

Since Kubernetes will take over most of the work on this development server, we dedicated a large part of the resources to the cluster workers. Since all workers have the same VM configuration, it will only be listed once. This project has been designed to run on a Proxmox host with 13 available processor cores. With a crafty Proxmox setup, this could work fine on a decent laptop.

Hardware



That Static IP Thing

When the operating system is installed on these nodes, they should all be set up with static IP addresses. DHCP is easy but will constantly overwrite our resolver configuration, which causes problems, and we do not want an IP to change for some reason during a reboot. The easiest way to handle this is to assign the static addresses during OS installation. If you are working through this project on Proxmox, you can follow the guide below to install and configure a VyOS software router to separate your Kubernetes traffic from your primary network. If you are following this tutorial on another platform where you do not have this option, or you are using bare metal with a hardware router, then the VyOS configuration can serve as a guide on how to build this routing configuration in your lab.

If you want to follow this document to the letter, use the IP assignments laid out below.

  • VyOS eth1
    172.16.1.1
  • kube-control-plane
    172.16.1.10
  • kube-worker-1
    172.16.1.11
  • kube-worker-2
    172.16.1.12
  • kube-worker-3
    172.16.1.13
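
For reference, on an Ubuntu 22.04 server install these static assignments usually end up in a netplan file (typically /etc/netplan/00-installer-config.yaml, though the file name depends on your installer). Below is a minimal sketch for kube-control-plane; it assumes the interface is named eth0, which may differ on your VMs (Proxmox VirtIO NICs often show up as ens18). The workers use the same layout with .11 through .13.

network:
  version: 2
  ethernets:
    eth0:
      # Static address for kube-control-plane
      addresses: [172.16.1.10/24]
      routes:
        - to: default
          via: 172.16.1.1
      nameservers:
        # The VyOS DNS forwarder we configure below listens on 172.16.1.1
        addresses: [172.16.1.1]

Apply the netplan changes
sudo netplan apply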



Let's Get This Project on the Road


Install and Configure VyOS on Proxmox

Though installing and configuring VyOS on Proxmox adds an extra step to the process, it is a good way to separate the Kubernetes network from your primary network. Setting up the project this way has two main benefits: the separation from the primary network, and the fact that it keeps internal pod communication completely within Proxmox. Adding VyOS in front of the Kubernetes cluster also allows full control of all routing, DHCP, and DNS decisions inside the Kubernetes network.


As part of this Kubernetes project, a VyOS VM is set up as the 'backplane' of the cluster. This routing configuration is very project-and-scenario-dependent because of our lab setup. The private network that we will build ends up double-NATed to the outside world. This type of NAT setup is an important consideration if you are hosting production services off of it, but of very little importance in a lab setup where you are learning the basics.

Summary

Below are the detailed steps and screenshots that walk through the whole process of installing VyOS on a Proxmox host. The network that we will assign to the INSIDE interface, which connects the Kubernetes hosts, will be 172.16.1.0/24. In our lab, the primary network is 192.168.2.0/24. The one thing that has to be done outside of Proxmox is to build a static route in your primary router that sends all 172.16.1.0/24 traffic to the VyOS router. The next hop for that route is the address your primary network assigns to the VyOS eth0 interface (we grab that address later in this document); the eth1 interface, at 172.16.1.1, becomes the gateway for the Kubernetes nodes themselves. If the static route is not set up, the Kubernetes nodes will partially work, but you will not be able to SSH into them from the primary network, and EXTERNAL IPs assigned to pods will not be reachable outside of the 172.16.1.0/24 network. Just a reminder that this static route is needed if you plan on following this whole project. Also remember that 192.168.2.0/24 is our internal network; yours could be anything in the IANA private address space, though for a consumer router it normally lives somewhere in 192.168.0.0/16.



Ok, Enough Build Up, Let's Build This Thing


First Step. Get VyOS Installer Image

The first thing that we need to do is download the VyOS ISO. We will upload this ISO to Proxmox and then build a virtual machine to install it. Head on over to the VyOS downloads page and download the VyOS ISO installer, then save it somewhere on your workstation. When the download is complete, let's move on to Proxmox and upload this new image so we can mount it to the VM we will create.

Where you store your ISOs is based on your local Proxmox setup, so you should know where to upload this ISO so we can use it to build a VM. If you are new to Proxmox, check out this blog article that explains where to find this and how to set it up if it is not available: How to Upload ISO Files to ProxmoxVE.


Configuring the Proxmox Linux Bridge

As you can see, this is a very simple bridge. We do not set a gateway or any bridge ports here. What this does is allow us to connect our primary Proxmox interface, in our case vmbr0, and this new bridge vmbr4 together as eth0 and eth1 on the Proxmox software router. After this, VyOS handles all of the routing and gateway duties.
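
If you would rather create this bridge from the Proxmox shell instead of the GUI, the stanza in /etc/network/interfaces looks roughly like the sketch below. vmbr4 is just the name used in this project; use whatever identifier you chose, then apply the change with ifreload -a (ifupdown2) or a reboot of the Proxmox host.

auto vmbr4
iface vmbr4 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0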


Building the VyOS VM

Next we need to create the VyOS VM and add a secondary network interface to it.

  • 1.) Create the VyOS VM. We gave our Kubernetes router 1G of RAM and one CPU core, and left all else relatively unchanged. Leaving the kernel type on 5.x-2.6 is just fine. 32G of disk should, in my experience, suffice.

    The series of images below step through this process.



    You can choose whatever you want for the VM ID and the Name. Mcp is, of course, the Proxmox host where this VyOS router will run. I chose 300 for the VM ID because it separates this project from other projects. The Kubernetes control plane, worker 1, worker 2, and worker 3 will have VM IDs 301, 302, 303, and 304 respectively.



    Here, we choose the VyOS image used to install this router OS. Your ISO may be named slightly differently than the one here, but as long as you downloaded it from the official VyOS page linked above, it should be fine. That is the only thing that changes on this screen. Click next after choosing the correct ISO image.



    This screen is where we choose the type, amount, and location of the storage. We chose to carve the storage out of our ZFS pool. It does not matter where the storage comes from, only that there are 4GB available at whichever location you choose. Next, we move on to choosing the CPU layout.



    We will make no changes here, as this network is so small that one CPU core should handle the routing just fine.



    In most cases this defaults to 2GB when building a Proxmox VM. This default can be changed, so just make sure that the memory is set to 1024 and click Next to move on to the network settings for this VM.



    Bridge selection normally defaults to vmbr0. This identifier may be different depending on how Proxmox is configured. In our case, vmbr0 is one of the network interfaces that connect to the primary lab network 192.168.2.0/24. Choose what works for your setup, remembering that the purpose of this router is to bridge the 172.16.1.0/24 network into the primary network, in our case 192.168.2.0/24.

    Note: Throughout this project you can uncheck Firewall or leave it checked. This setting does not matter much if you do not have Proxmox's firewall active and set with rules, and in any case very little of it matters considering our NAT situation, explained further down.

    After choosing the network bridge click on Next. This takes us to the build VM confirmation page. Confirm this VM and wait for Proxmox to build it.

  • 2.) Now we need to add the new bridge that we created. Remember that we created vmbr4, or whatever identifier you chose, above.
    • Navigate to the VM, head to 'Hardware' and click 'Add' at the top of the page.



    • Add a Network Device and assign it the bridge you just created, 'vmbr4' in our example.



      Click on Add, and we are finished setting up this VM for VyOS to route our Kubernetes network. When we are all done, the VM hardware information in Proxmox should look like the image below. Your MAC address will be different, but otherwise, your VM should look very similar to this one.


Start the VyOS VM

Use whatever method you like to start the VyOS VM and switch over to the console for this VM.

It may take a few seconds for VyOS to boot. Once it is finished, it will present us with a console login as shown below.



The first thing we need to do is install VyOS as a permanent router. So let's log into the router with the default username and password of vyos and vyos.

After the login is complete run the command below to start the installation.

Install VyOS to VM permanent storage
install image

The installer will ask a few questions, and the default answers should work for everything. It will ask you to set a new password; choose whatever works for you. For this project we will be using notasecurepassword as the password wherever we need one. If you run into any issues with the installer, please check out the VyOS document Permanent installation.

Installing this system will require a reboot at the end. Once the login prompt appears on the console again, let's login with the username vyos and the password set during the installation. Now we can move on to configuring the router.

Enter VyOS configuration mode
conf

Through the rest of this part of the document we will set up a bridge between the primary network and the network that we will create on this router. We will create the network 172.16.1.0/24 and route it through our primary network, which in our lab is 192.168.2.0/24. First we will set up some very basic firewall rules that allow all traffic to pass between these two networks. For the sake of this project you can look at the primary network as the Internet, as that is what we are trying to emulate.

Set LAN-LOCAL firewall to accept all traffic
set firewall name LAN-LOCAL default-action 'accept'

Set LAN-WAN firewall to accept all traffic
set firewall name LAN-WAN default-action 'accept'

Warning:

You would NOT want to do this in a production environment. These settings allow ALL traffic through in both directions. However, because we are behind two separate routers, with the first router in line (the gateway to our WAN address) properly firewalled, this only allows all traffic between the two private networks (192.168.2.0/24 and 172.16.1.0/24) and does not let outside world traffic in.

Every network has its own specific nuances just like our lab set-up, so I recommend reading up on VyOS docs if you're planning to replicate what we do here outside of a lab environment.


Now let's move on to actually setting up the network. This set of commands works with the eth0 interface that we connected to our primary network.

Configure eth0 to grab an address from DHCP
set interfaces ethernet eth0 address dhcp

This is a description line, for us to know which way this interface points
set interfaces ethernet eth0 description 'OUTSIDE'

Auto negotiate the network duplex (Should always negotiate to full)
set interfaces ethernet eth0 duplex 'auto'

Set MTU for interface eth0
set interfaces ethernet eth0 mtu '1500'

Autonegotiate speed with the uplink router
set interfaces ethernet eth0 speed 'auto'

Commit and save this, and you should find that the interface will soon have an IP address assigned by the first gateway. Depending on the gateway, you should be able to turn this into a static assignment (a DHCP reservation), which is HIGHLY RECOMMENDED.
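
If you want to apply the eth0 settings right away, the commands are the same ones we will run again at the end of this section.

Commit the eth0 configuration
commit

Save the configuration so it survives a reboot
save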

Now, on to configuring our 172.16.1.x/24 network for Kubernetes.


Configure Kubernetes Network Segment

Set the IP scope of the interface to the 172.16.1.1/24 network
set interfaces ethernet eth1 address '172.16.1.1/24'

Set the description
set interfaces ethernet eth1 description 'INSIDE'

Duplex: Always set to 'auto' if you're not sure what you're doing
set interfaces ethernet eth1 duplex 'auto'

MTU: 1500 is the standard for 1G Ethernet. Keep it at 1500.
set interfaces ethernet eth1 mtu '1500'

Speed: Negotiation speed. Keep to auto.
set interfaces ethernet eth1 speed 'auto'

Regarding the NAT situation mentioned above, the following are the configuration settings for basic NAT on a VyOS instance. Once again, I'll mention that this is a very specific case where this hosts a private network behind another private network.

This sets the outbound interface for NAT
set nat source rule 100 outbound-interface 'eth0'

This sets the source address space for NATing through the 'eth0' interface
set nat source rule 100 source address '172.16.1.0/24'

This tells it to use whatever IP is bound to the interface as opposed to setting a NAT-specific IP address
set nat source rule 100 translation address 'masquerade'

To outline, this allows the Kubernetes pods routed through this network to communicate with the OTHER network and thus the outside world. As was mentioned earlier, for this to work correctly, you need a static route in your primary router pointing at this VyOS router. That route boils down to telling your primary router to send all 172.16.1.0/24 traffic to the address the primary network assigned to the VyOS eth0 interface, while 172.16.1.1, which we set above, acts as the gateway inside the Kubernetes network.

How to set a static route varies heavily between routers (and may not even be a feature at all on some consumer-grade gear), so this part is up to you; a little Google-fu should suffice. Our router (a Linksys EA9300, for now at least) has the option to set static routes, and it is a very straightforward process.
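
Purely as a point of reference, if your primary router happens to be another VyOS instance or a plain Linux box, the static route boils down to a single line like one of the examples below. The next hop shown is the eth0 address we retrieve in the next section; substitute whatever your primary network actually assigned.

On a VyOS primary router
set protocols static route 172.16.1.0/24 next-hop 192.168.2.164

On a Linux router using iproute2
sudo ip route add 172.16.1.0/24 via 192.168.2.164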

Most of these configuration options are in the VyOS Quick Start guide. I give major props to the VyOS team for writing very detailed and easy to understand documentation.


Now we Need to Get The Assigned IP

When we move on to setting up the DNS forwarder below, we will need the IP address that the primary router assigned to eth0 of this VyOS router. Let's go ahead and grab that IP now so we do not have to exit out of configuration mode during the DNS setup. On the VyOS router console run the commands below.

Commit Current Configuration
commit

Save Current Configuration
save

Exit Configuration Mode
exit

Exiting will take us out of the configuration console back to the command console. Once we are back at a prompt, run the command below to query the information on interface eth0, which is where we had the primary network assign a DHCP address. If you chose to assign eth0 statically, you will already know the address we are trying to find here.

Query eth0 to get the assigned IP address
show interface ethernet eth0

This command should return information similar to what is below.

eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0a:46:d2:bd:a7:8c brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.164/24 brd 192.168.2.255 scope global dynamic eth0
       valid_lft 2512162sec preferred_lft 2512162sec
    inet6 fe80::846:d2ff:febd:a78c/64 scope link
       valid_lft forever preferred_lft forever
    Description: OUTSIDE

    RX:       bytes  packets  errors  dropped  overrun       mcast
         8466883734  2145791       0        0        0           0
    TX:       bytes  packets  errors  dropped  carrier  collisions
          251309236  1550768       0        0        0           0

We are interested in the third line, where this example shows 192.168.2.164/24. This is the IP address that DHCP assigned to that interface. In most cases what you see here will be completely different, but this is the IP that creates the gateway between the outside network and our Kubernetes network. This is also the IP address that the static route I have mentioned a few times needs to point at: Static Route 172.16.1.0/24 -> 192.168.2.164.



Time to do the DNS

All of our Kubernetes nodes will need to talk to the outside world and resolve from real-world DNS servers before the Kubernetes kube-dns package is online. Since our 172.16.1.0/24 network is hidden behind another network, we must let it know how to get DNS through its own gateway. The settings below set this up in our VyOS software router.

Enter Configuration Mode
conf

Allow forward request from the network we have built
set service dns forwarding allow-from '172.16.1.0/24'

Allow forward requests from the primary network
set service dns forwarding allow-from '192.168.2.0/24'

Maximum number of cached DNS entries. With our project the total cache should be quite small
set service dns forwarding cache-size '2000000'

Attempt to validate the request data but do not be strict about it
set service dns forwarding dnssec 'process'

Only listen on this IP address for DNS forwarding requests. This is the gateway IP address for the Kubernetes network.
set service dns forwarding listen-address '172.16.1.1'

IP address to forward all requests to.
(We use an inhouse name server, you could also use 8.8.8.8 which is Google)
set service dns forwarding name-server 'xxx.xxx.xxx.xxx'

Do not forward requests for the IANA private networks such as the 10.0.0.0/8
set service dns forwarding no-serve-rfc1918

Available IP address to use for sending requests.
(This is where we use the IP address that we gathered above; yours will be different)
set service dns forwarding source-address '192.168.2.164'

This is our basic, working DNS configuration. If you are curious about anything, take a peek at the VyOS document on Configuring DNS.

And now we can finally save all of this VyOS configuration and prepare to move on to setting up the VMs.

Commit the changes
commit

Save the new configuration
save

Exit Config Mode
exit

We can test this from VyOS by running a dig against our 172.16.1.1 address. Run the command below; it should return a result. If it times out, then something is not correct.

dig test
dig @172.16.1.1 google.com

This is a final reminder to not forget to set up the static route in the router that VyOS eth0 is connected to. This is actually vmbr0 in the Proxmox server, but since it does its own bridging we just need to focus on the IP we retrieved from eth0 and make sure 172.16.1.0/24 is statically routed to that IP address.

And done! You should be able to assign the 'vmbr4' interface to any VM of your choosing, give it a static IP address in your OS of choice, and communicate across both networks. Now that all of this is in place, our Kubernetes traffic keeps to itself unless it is specifically routed outside with an EXTERNAL IP, which we will get to later in this document.




Software

All VMs in this project are running Ubuntu 22.04. As of this writing, Kubernetes 1.25 is the latest version, so that is what we will be installing.

Beyond the cluster itself, we will also be installing kube-state-metrics, Elasticsearch, Kibana, and Metricbeat for the initial monitoring solution. This will change over time as we install and test other monitoring solutions such as PostgreSQL with TimescaleDB, InfluxDB, and other packages that draw my interest.

This defines the basics of our cluster with a bit of information on what we will be doing.




Basic Setup for All VMs

Before starting the install, all of the nodes need a bit of basic setup.


Disable an annoyance

Ubuntu 22.04 enables the needrestart system by default. This system generates the menu that pops up at the end of updates and new package installs to tell you which daemons need to be restarted. That is good information to know, but not something we want to pop up every time we install a package. The command below disables the menu by telling needrestart to restart daemons automatically whenever it needs to. This is great in this lab but maybe not so great in production, though most of that is automated now anyway.


Modify needrestart to disable the menu by setting the 'a' option to enable automatic restarts
sudo sed -i 's/#$nrconf{restart} = '"'"'i'"'"';/$nrconf{restart} = '"'"'a'"'"';/g' /etc/needrestart/needrestart.conf

This command has been tested extensively with Ubuntu 22.04 server. I can not guarantee its operation on any other distribution or version.


Update the Repositories and Packages

This should be number one on all Operating System installs. Run the commands below to update the repositories and install any upgrades that are available.

sudo apt update

sudo apt upgrade -y

-y tells apt to answer yes to questions about installing package dependencies


Disable Swap on All Nodes

The first thing that we need to do is permanently disable swap on all Kubernetes nodes. Swap allows the operating system to store memory pages on permanent storage. You may think this only comes into play when the system is under high memory load, but that is not the case. The OS will also use swap to move old pages out of memory to free up space for the disk cache and other operations, similar to how Java treats young-generation and old-generation memory segments. Seeing as permanent storage of any kind, even the fastest NVMe, is much slower than the system RAM that sits right on the CPU bus, having to retrieve anything from swap puts a tremendous load on the system, because we are now dealing with the IO wait of the storage device and controller. On top of that, the kubelet will refuse to start with swap enabled unless it is explicitly configured to allow it. It is better to make sure you have enough physical RAM to carry the load you will present to the cluster and turn swap off altogether.


Disable swap now on the live system.

sudo swapoff -a


Disable swap on reboot

Next we need to disable swap in fstab so it will not come back on a reboot. Initially I had us using an editor here to edit the file directly, but I decided to use a sed command instead. Running this command has the same effect as editing /etc/fstab and adding a comment character (#) in front of the /swap.img line. This has been tested many times with Ubuntu 22.04; I can not guarantee that it will work on any other version, though it should.


Update /etc/fstab
sudo sed -i '/\/swap.img/ s/^/#/' /etc/fstab

Now we can check that the change was made using the cat command. Run the command below to check the contents of /etc/fstab.

cat /etc/fstab

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda2 during curtin installation
/dev/disk/by-uuid/04dc156d-780c-43aa-acaf-5db03c915200 / ext4 defaults 0 1
#/swap.img       none    swap    sw      0       0

Your fstab file will look different from this, as the UUIDs for our disks will differ. We only need to check that a # has been added in front of the /swap.img line. If all looks good, then we can head off to set up the node host networking.
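
As a quick sanity check, swapon should print nothing and free should report 0B of swap once both changes are in place.

Show active swap devices (no output means swap is off)
swapon --show

Show memory and swap totals
free -h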




Increase vm.max_map_count

When working with Elasticsearch, it is a good idea to increase vm.max_map_count to at least 262144, which is the minimum Elasticsearch expects. This setting limits how many memory-mapped areas a process may have. The default works well for many applications, but part of the Elasticsearch query speed is related to how it maps indices into memory. Small Elasticsearch workloads may work fine without changing this setting, but we will change it here so we do not have to do it later.

Increase vm.max_map_count to 262144 now
sudo sysctl -w vm.max_map_count=262144

Keep vm.max_map_count at 262144 on reboot
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
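
To confirm the change took, query the value directly; it should report 262144.

Check the current vm.max_map_count value
sysctl vm.max_map_count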


End of the host basics

These first few steps prepared the operating system for the next steps. We will still be modifying system files and restarting daemons. These few things make those steps easier. Now we move on to installing modules and changing system-level variables that pertain to networking.



Setting up the Host Network


Setup Bridge Networking and IP Forwarding

The first thing you may wonder is why we are building a bridge on top of a bridge on top of a bridge. This setup is not uncommon in any cloud-based network. All network segments are bridges, with normally one gateway into each bridge. This design keeps the different unique networks separate while still allowing access in and out through one or multiple load-balanced gateways. It just looks a little different here because we are doing all of this inside of one server.

To work as a cluster, Kubernetes builds its own network that we can selectively connect to the primary network through services and endpoints. For this to work correctly, all nodes need bridged traffic visible to the firewall and IP forwarding enabled. Setting this up is broken down into a few commands, which are outlined below.


Add overlay and br_netfilter to kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

This command creates the file /etc/modules-load.d/k8s.conf and populates it with overlay and br_netfilter. Adding them here tells Linux to load these modules into the kernel on the next reboot.

  • br_netfilter

    This module enables transparent masquerading of the network stack and facilitates VLAN traffic for communication between pods.

  • overlay

    This is a file system overlay module that allows the pods to create their own mounts on top of the host file system.

We can use the cat command to verify the values entered by this command.

cat /etc/modules-load.d/k8s.conf

The output should show overlay and br_netfilter, as this is what we sent to the file with the cat and sudo tee commands above. Now we need to load these modules into the running kernel, and with the power of Linux and modprobe, we do not have to reboot. Run the commands below to load these modules into the kernel without rebooting.

Load overlay kernel module
sudo modprobe overlay

Load br_netfilter kernel module
sudo modprobe br_netfilter

If there are no errors these commands will not return any values. These commands force the modules to load on a running system. During a reboot they will be loaded from /etc/modules-load.d/k8s.conf.
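
If you want to confirm they are loaded, lsmod will list them.

Verify the overlay and br_netfilter modules are loaded
lsmod | grep -E 'overlay|br_netfilter'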


Enable network forwarding and bridging

Next we have to modify sysctl parameters to turn on the network bridging and forwarding that the modules above make possible. We will cat these settings into a sysctl file at /etc/sysctl.d/k8s.conf. This is in a different location from the ones above because these work directly on kernel variables.

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

Load new network configuration

Now we need to tell the system to load these new configurations with the command below.

sudo sysctl --system

This command will output every change that was made. Near the bottom, we will see our net.bridge and net.ipv4.ip_forward settings; they are all set to 1, which means our changes are enabled. Now we will move on to installing the container runtime. This project will use CRI-O. All my testing shows that this runtime works with all of the pods we will install with this project. These commands should be run on all nodes.
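
Before starting on the container runtime, you can also double-check the three values directly and confirm each one reports 1.

Verify the bridge and forwarding settings
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward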




Installing CRI-O container runtime

Enabling the repositories.

These first few commands enable the repositories for the CRI-O container runtime. This package is what runs all of the containers. Sort of like Docker but not. You can see that the two variables OS= and VERSION= are used in the tee output to set the version that will be added to the repo list.

Set the OS Version environment variable
OS="xUbuntu_22.04"

Set the package version variable
VERSION="1.25"

Use Linux cat command to add repository
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /
EOF
Use Linux cat command to add the versioned repository
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list
deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /
EOF

Next we need to add the gpg keys for these repos.

Load the GPG key for the version specified in the VERSION= variable above
curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -

Load the GPG key for the latest stable version identified by the OS= variable above
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -

Now we update the repositories and install the cri-o runtime engine.

sudo apt-get update

sudo apt-get install cri-o cri-o-runc cri-tools -y

And finally we reload systemd and enable cri-o to tie all of this initial setup together.

sudo systemctl daemon-reload

sudo systemctl enable crio --now
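
Before moving on, it is worth a quick check that the runtime came up; "active" is what we want to see. The crictl query is optional and assumes CRI-O's default socket path.

Check that the CRI-O service is active
sudo systemctl is-active crio

Optionally query the runtime over its socket
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version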

At this point the core setup is complete. Now it is time to install the Kubernetes packages on all nodes.



Finally Installing Kubernetes


Install Kubeadm, Kubelet and Kubectl

Now we can start to install the Kubernetes system. First, we install core dependencies that Kubernetes and other parts of this install process depend on. At this point, we are still running these commands on all nodes that we set up for this Kubernetes cluster. When we start running commands on individual nodes, which will be soon, it will be noted.


Let's make sure we have an updated package list
sudo apt-get update

Here we install a few dependencies needed to install the other packages
sudo apt-get install -y apt-transport-https ca-certificates curl

Now we curl the GPG key from Google. This lets apt verify the Kubernetes packages.
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg

Next we add the Kubernetes apt repository, signed by the key we just downloaded.
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Now that we are done setting up repositories and preparing the system we will install the Kubernetes tools.


Update the package repository so we can see the new packages
sudo apt-get update -y

Install Kubernetes Tools
sudo apt-get install -y kubelet kubeadm kubectl
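
Optionally, and this mirrors the standard kubeadm install guidance, pin these packages so a routine apt upgrade does not move the cluster version underneath you.

Hold the Kubernetes packages at their current version
sudo apt-mark hold kubelet kubeadm kubectl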

This brings us to the end of all of the system setup and package installs that are needed to create a Kubernetes cluster. Next we will initialize the Kubernetes control plane.



Initialize the Control Plane

We have to initialize the control plane first so we can generate the tokens and networks needed to add the workers when we get to that point. First, we need to set a few environment variables. We have to make an important decision here when it comes to POD_CIDR. The POD_CIDR defines the internal network that pods will communicate on. We want it to be distinct from any other network in our infrastructure. Since my lab infrastructure is based on 192.168.2.0/24, we will assign the POD_CIDR to a network in the 10.0.0.0/8 private IP set. We only need a few addresses, so we will work with 10.200.0.0/16. The NODENAME is set by telling the shell to run the hostname -s command.

Since we statically assigned all of the addresses on these nodes, we know that the IP address of the control plane is 172.16.1.10, which connects to our VyOS router at 172.16.1.1. Of course, that gateway was already set up during the VM build.


Set Environment Variables (Run on control plane server only)
IPADDR="172.16.1.10"
NODENAME=$(hostname -s)
POD_CIDR="10.200.0.0/16"

Now we can run the initializer command. As you can see, it uses the variables we just set above. These are stored in the user environment and can be called by this user until the session ends.



Initialize Kubernetes Control Plane (Run on control plane server only)
sudo kubeadm init --apiserver-advertise-address=$IPADDR --apiserver-cert-extra-sans=$IPADDR --pod-network-cidr=$POD_CIDR --node-name $NODENAME

This can take a few minutes to run as it configures everything and boots up the control plane, but eventually you will see an output similar to the one below.

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.2.99:6443 --token l5c9su.exos28jxx7q2jzl3 \
        --discovery-token-ca-cert-hash sha256:8ef5f2cb7efd439bcfdcfd4dd54204cb5e1305b653ccb483d5fa50613b715cf0

I have left the token and cert hash in this example as this cluster will be destroyed as soon as this project is over. In no situation should you ever leave sensitive information like this on a live cluster.

As the output says, there are a couple more things that we need to do. First we need to create a folder and move some files to it. Since we are not running this project as root we will use the first set of commands.

We are only working with the node hosting the control plane at this point. Do not run these commands on the workers.


Create .kube configuration folder in users home folder
mkdir -p $HOME/.kube

Super user copy the Kubernetes file to the folder we just created
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

Now we can check that all of our steps worked correctly by running the command below.

kubectl get po -n kube-system

You should see nearly the same output as what is shown below.

NAME                                         READY   STATUS    RESTARTS   AGE
coredns-565d847f94-4nnxp                     1/1     Running   0          20s
coredns-565d847f94-t4v8q                     1/1     Running   0          20s
etcd-kube-control-plane                      1/1     Running   0          34s
kube-apiserver-kube-control-plane            1/1     Running   0          35s
kube-controller-manager-kube-control-plane   1/1     Running   4          34s
kube-proxy-fhjct                             1/1     Running   0          20s
kube-scheduler-kube-control-plane            1/1     Running   4          34s

So, now we have an almost-cluster of Kubernetes running. By default, the control plane will not schedule regular pods onto itself. Though the control plane can run pods, the node carries a NoSchedule taint, so a pod needs a matching toleration to run there. We will add that toleration later for one pod when we install Metricbeat.
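
If you are curious, you can look at the taint that keeps regular workloads off the control plane; with kubeadm on 1.25 it is typically node-role.kubernetes.io/control-plane:NoSchedule. The removal command is shown only for reference; we will not use it in this project.

Show the taints on the control plane node
kubectl describe node kube-control-plane | grep -i taints

For reference only: remove the control plane taint to allow normal scheduling there
kubectl taint nodes kube-control-plane node-role.kubernetes.io/control-plane:NoSchedule-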



Install Calico Networking Plugin

This is a fairly easy install, as it is available as a Kubernetes manifest from projectcalico.org.


Download and apply the Calico manifest
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

This manifest can take a minute or so to initialize and launch. We can check the status by running the same get pods command we did above, kubectl get po -n kube-system. You should see a similar output to the one above, but with two new calico pods as the top two entries.

NAME                                         READY   STATUS              RESTARTS   AGE
calico-kube-controllers-59697b644f-grt8l     0/1     ContainerCreating   0          9s
calico-node-kjn8s                            0/1     Init:0/3            0          9s
coredns-565d847f94-4nnxp                     1/1     Running             0          89s
coredns-565d847f94-t4v8q                     1/1     Running             0          89s
etcd-kube-control-plane                      1/1     Running             0          103s
kube-apiserver-kube-control-plane            1/1     Running             0          104s
kube-controller-manager-kube-control-plane   1/1     Running             4          103s
kube-proxy-fhjct                             1/1     Running             0          89s
kube-scheduler-kube-control-plane            1/1     Running             4          103s

It is best to wait until the calico containers are fully running before moving on.
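
An easy way to wait is to watch the namespace until every pod reports Running, then press Ctrl-C to exit the watch.

Watch the kube-system pods
kubectl get pods -n kube-system -w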



End of the Basics

Every tutorial I have found for installing Kubernetes on bare metal stops here. The problem is that this is not the Kubernetes you will use in the exam or when working with it in the cloud, which is where the exam happens. At this stage, our bare metal Kubernetes knows nothing about auto-provisioning persistent volumes or assigning external IPs to our pods. If we leave the cluster in this state, we will spend hours manually writing local storage and hostPath configurations, and the only way to reach pods from outside the pod network will be through NodePort settings, which is very limiting.

But we have a solution, two actually. Below we will install the two Kubernetes packages that give us this auto-provisioning and make our bare metal cluster as close to a full cloud install as we are going to get.



Install OpenEBS

This package is provided by OpenEBS. It provides the services and configuration changes needed to emulate block storage on bare metal Kubernetes clusters. Without this functionality, everything related to PersistentVolumeClaims has to be built manually and then manually torn down when a pod is removed from the system. OpenEBS adds an auto-provisioning layer between Kubernetes and your pods. So far, all I can say is it just works.

As always you should check the vendor site or repository for new versions and breaking changes. Below I have included the simple instructions from their web page.

Install OpenEBS Operator manifest
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml

Next we can run the command below to see which storage classes are available. We should see at least one that uses an openebs.io provisioner.

Show Storage Classes
kubectl get sc

NAME               PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
openebs-device     openebs.io/local   Delete          WaitForFirstConsumer   false                  98s
openebs-hostpath   openebs.io/local   Delete          WaitForFirstConsumer   false                  98s
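
As a side note, if you would rather not spell out storageClassName in every manifest, openebs-hostpath could be marked as the cluster's default storage class with the standard annotation patch below. We do not do that in this project; the later manifests set storageClassName explicitly so it is obvious where the storage comes from.

Optional: make openebs-hostpath the default storage class
kubectl patch storageclass openebs-hostpath -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'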

In the case of our cluster, we will use openebs-hostpath. This will become part of the Elasticsearch manifest that we create when we get to that part. Next we will add the worker nodes and then install the MetalLB network provisioning system.



Let's add some worker nodes


Get Worker Join Token from kubeadm (Run on control plane only)
kubeadm token create --print-join-command

This command will output the full string that we need to run on the workers so they will join the cluster. It will look similar to the output below, but the token and ca-cert-hash will be different.


Created Join Command
sudo kubeadm join 192.168.2.99:6443 --token 36txhy.dq1dbw9jwcswptjs --discovery-token-ca-cert-hash sha256:8ef5f2cb7efd439bcfdcfd4dd54204cb5e1305b653ccb483d5fa50613b715cf0

Run the command above on all worker nodes. Every worker where you run this should produce an output that contains the line This node has joined the cluster. If there are no errors, we can verify that all of our nodes have joined the cluster by running the command below on the control plane.


Check Worker Nodes
kubectl get nodes

If all of the nodes have joined the cluster you should see an output that looks similar to this. This cluster has one control plane host and three worker nodes. This is what we see here.

NAME                 STATUS   ROLES           AGE     VERSION
kube-control-plane   Ready    control-plane   5m18s   v1.25.3
kube-worker-1        Ready    <none>          78s     v1.25.3
kube-worker-2        Ready    <none>          69s     v1.25.3
kube-worker-3        Ready    <none>          68s     v1.25.3


Restart kube-dns

I have run through this project dozens of times while writing this document. I still can not figure out exactly why this happens. If we do not restart kube-dns right here, the next section on installing MetalLB never works correctly. Everything else seems to work fine but not MetalLB. So, I decided that at this point, we just restart kube-dns, and everything has worked fine since then. Please run the command below on the Kubernetes control plane to restart kube-dns.

Delete kube-dns
kubectl delete pod -n kube-system -l k8s-app=kube-dns

Though we are deleting these pods, they will be recreated automatically by the coredns Deployment. After this, everything else should work fine. This is another situation where it is best to wait until all pods are in a Running state before moving on.
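
You can confirm the replacement pods came back before continuing; both coredns pods should return to Running within a minute or so.

Check that the coredns pods were recreated
kubectl get pods -n kube-system -l k8s-app=kube-dns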


MetalLB

This package allows us to emulate the LoadBalancer service type that many pre-packaged applications depend on. Parts of this are difficult, and other parts are impossible, on bare metal without this package. How it all works without these packages is another one of those things that is good to know, so check out these documents: Kubernetes Service and Kubernetes Exposing an External IP. By using either this service or a cloud provider that supports auto-provisioning, the actual setup and tear down is automatic.

We will be installing this in Layer 2 mode. This package does support other modes, including BGP, but Layer 2 will work fine for this project, as it does for many production setups.

Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml

This install should only take a few seconds, and then we are all done with installing the package. But we need to create two objects: one that is a pool of IPs MetalLB can assign, and one that sets up an L2Advertisement so Kubernetes knows there are LoadBalancer addresses available.

MetalLB IP Pool Manifest

This is one of the multi-line EOF commands. Copy the whole thing and run it only on the control plane node.

cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.16.1.100-172.16.1.254
EOF

The small manifest above is our IP pool definition. It lets MetalLB know which IPs it is allowed to hand out. Our Kubernetes network has two sides: one connects to the rest of the lab network; the other connects to the VyOS router running in software. That router hosts the 172.16.1.0/24 network and routes it back into the primary network, so we can directly access the 172.16.1.0/24 network even though it is technically isolated from our home network.

Now we need to install the MetalLB L2 pool advertiser.

cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
EOF

This brings us to the end of building a working Kubernetes cluster. We need to install one more thing that will help us keep track of the cluster health, since we do not yet have a metrics and visualization pod set running. So, for one final step before part one of this project ends, let's install the Kubernetes Metrics Server.


Install Kubernetes Metric Server

Now that Kubernetes is running we can do almost everything that we need to do with the kubectl command. In this case we are going to apply a manifest that installs the metrics server pod.

kubectl apply -f https://raw.githubusercontent.com/techiescamp/kubeadm-scripts/main/manifests/metrics-server.yaml

This can take a couple of minutes to configure and load. Until then, if you run kubectl top nodes you will see the error "Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)". This is expected. When the pod has finished loading, running the same command will produce an output similar to the one below.



Check Basic Node Stats
kubectl top nodes
NAME                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
kube-control-plane   83m          2%     2414Mi          30%
kube-worker-1        29m          0%     1710Mi          21%
kube-worker-2        38m          0%     1710Mi          21%
kube-worker-3        27m          0%     1529Mi          19%

This is the last step. We now have a fully functional Kubernetes cluster with access to basic metrics so we can keep an eye on it. In the next section, we will install monitoring on the cluster.



Let's Install Monitoring on Our Cluster



It's Much The Same


In this part of the project, we will install Elasticsearch, Kibana, and Metricbeat for the initial monitoring solution. As I mentioned in the last part, this configuration is the one I am most familiar with, but there will be other monitoring solutions I test on this cluster. We will generally use the Kubernetes install guides available on the Elastic website, but we will not run the demo manifests. We will use them as examples, but we will change a few things, such as the namespace, cluster name, and resources, to meet the needs of this project. First, we need to install the Elasticsearch ECK operator, as it is the easiest and most reliable way to deploy an Elasticsearch cluster in a Kubernetes environment.

Since all of the items we will install are now Kubernetes manifests, the installations are fairly easy. There are only three steps in the document available here: ECK documentation, and one of them is checking logs to make sure the ECK operator is working correctly. Though there are only three steps in that document, we will replicate them here so we can complete this with only one open document.

But, first we must install kube-state-metrics so Metricbeat can gather metrics from the Kubernetes cluster along with the system metrics for the nodes.



Install kube-state-metrics

The kube-state-metrics service collects metrics from the entire Kubernetes system and then exposes them through a simple interface. Our initial metrics collector, Metricbeat, will poll the kube-state-metrics endpoint and store the results in Elasticsearch for use in the dashboards. kube-state-metrics is fairly easy to set up; we will start by using git clone to pull down the manifests. This downloads the whole project, but we will only apply the manifests available in examples/standard.


Make temporary home folder
mkdir ~/tmp/

Switch to new folder
cd ~/tmp/

Clone kube-state-metrics from github repo
git clone https://github.com/kubernetes/kube-state-metrics.git

Switch to the new kube-state-metrics folder
cd ~/tmp/kube-state-metrics

Deploy kube-state-metrics standard manifest
kubectl apply -f examples/standard

Check kube-state-metrics deployment
kubectl -n kube-system get deployments kube-state-metrics

When everything is loaded, which does not take long, you should eventually see the output below.

NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
kube-state-metrics   1/1     1            1           3h55m




The Elastic Parts


Install ECK custom resource definitions

This is a large file that would take a whole document to explain. That may come in the future, but for now we will follow the documentation at Elastic and install this manifest from the Elastic-provided URL.

Install custom resource definitions.
kubectl create -f https://download.elastic.co/downloads/eck/2.4.0/crds.yaml

This command will print lines out to the terminal telling you what resources were created. For full details please see the install document linked above.



Install the ECK operator and its RBAC rules

ECK uses its own operator which makes use of custom resources to manage the configuration and deployment of Elastic stack applications. This manifest is a large file that you can download and browse through if you wish. For this project, we will just install it from the URL as shown below.

Install Elastic Operator
kubectl apply -f https://download.elastic.co/downloads/eck/2.4.0/operator.yaml

And finally we can check the logs to make sure this operator has started correctly. Notice how this lives in the elastic-system namespace.

Check operator logs in elastic-system namespace
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

That is all that is required to install the Elastic ECK operator on our Kubernetes cluster. If you have followed the whole document up to this point, we now have a fully operating Kubernetes cluster with kube-state-metrics and the Elastic ECK operator waiting for us to install an Elasticsearch cluster.
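
Besides tailing the logs, a quick way to confirm the operator is healthy is to list the pods in its namespace; you should see a single elastic-operator-0 pod in a Running state.

Check the ECK operator pod
kubectl get pods -n elastic-system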


This will get a little harder

Our journey up until now has been fairly easy. Now we move on to the part that caused me to avoid Kubernetes until recently: the object-oriented way that we have to build things. In the end, it makes great sense. Objects can be reused over and over; this is the heart of the whole Kubernetes system. But when you come from a world of top-down configs, it can be a bit intimidating at first. As I sorted this out in my brain, I came up with the image below, and everything clicked.



I came up with this block diagram while working through the complete Metricbeat manifest in this document. It helped guide me through role bindings, service accounts, config maps, cluster roles, and everything else it took to configure Metricbeat for my system. Only a little has to be changed in the end, but it was the first manifest that I had trouble with throughout this project, so that is the one I tore apart for the document. That document led to this one and to the next few applications we will deploy on this cluster.



Installing an Elasticsearch Cluster


The Bare Metal Hurdle

Now that the ECK operator is installed, we can easily create custom Elasticsearch manifests. Because of our custom setup, we can not use the default manifests offered by most vendors, since the persistent volume and external IP assignments will be slightly different. This stuff can be mind-boggling when you do it all manually, but this is where OpenEBS and MetalLB come into play for our personal Kubernetes cloud.


Let's Build our Own Elasticsearch Manifest

If you follow the quickstart instructions in the Elastic documentation, you always end up with a cluster and services with "quickstart" and "demo" in the object name. This manifest is great for testing things but not something that you want to put into production. So we will take the extra step to build custom manifests where needed. The Elasticsearch database is one of these custom manifests.


First we need to create our es-lab-cluster namespace
kubectl create namespace es-lab-cluster

Run the cluster Manifest
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: lab-cluster
  namespace: es-lab-cluster
spec:
  version: 8.4.3
  nodeSets:
  - name: data-node
    count: 2
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: openebs-hostpath
EOF


Now we can check the status of our cluster in its own namespace.

kubectl get pods -n es-lab-cluster

Initially, the pods will be in a Pending state, and it can take a few minutes for Elasticsearch to configure the pods after they are launched. Through all of the reinstalls, I have randomly encountered a situation where one of the Elasticsearch nodes can not resolve the hostname. If one of the Elasticsearch pods is not starting, check the logs for that pod with the following command.

kubectl logs lab-cluster-es-data-node-# -n es-lab-cluster

with # being the number of the es-data-node that is not starting. If the logs show an error that contains Failed to resolve publish address then we can restart kube-dns again to resolve this issue.

Restart kube-dns
kubectl delete pod -n kube-system -l k8s-app=kube-dns

Manifest Breakdown

Below we will use a list to step through this whole manifest. We changed many things from the default quickstart manifest offered at the Elastic website. The most important of these changes is the storageClassName.

  • apiVersion: elasticsearch.k8s.elastic.co/v1

    This can be a little misleading because it is not just the version but also which API group to use. In this case we are using the custom resources that were installed along with ECK.

  • kind: Elasticsearch

    This tells Kubernetes what kind of manifest this is. This Elasticsearch kind is similar to a StatefulSet but is configured specifically for Elasticsearch.

  • metadata:
    • name: lab-cluster

      This is the identifying part of the name of this cluster. We will see that later when we inspect the cluster that we install. In the default manifest this field has the value quickstart.

    • namespace: es-lab-cluster

      This is the Kubernetes namespace where our cluster will live. It is not good practice to run everything in the default namespace so we might as well start off the right way. When we get to using kubectl we will access this namespace with the -n option: kubectl -n es-lab-cluster

  • spec:
    • version: 8.4.3

      This is the version of Elasticsearch that we are installing.

    • nodeSets:
      • - name: data-node

        This sets the unique identifying name of all of the nodes in this node set. You will see this when we inspect the cluster.

      • count: 2

        This is the number of nodes that will be created with these specifications.

      • volumeClaimTemplates:
        • - metadata:
          • name: elasticsearch-data

            This key is the name of the volume claim that Elasticsearch will use when building these nodes. More or less the hard drive space that Kubernetes will carve out for Elasticsearch to use. Not all deployments need storage, so this is an efficient way to parcel it out on an as-needed basis. Without OpenEBS, this takes building multiple custom manifests. With OpenEBS, we only need the storageClassName, which is explained below.

        • spec:
          • accessModes:
            • - ReadWriteOnce

              Volume can be mounted as read/write by a single node.

          • resources:
            • requests:
              • storage: 10Gi

                Allow this container to request up to 10 gigabytes of storage.

          • storageClassName: openebs-hostpath

            This is the most important part of the manifest for our setup. Because we are using OpenEBS hostPath mode, every container that needs a volumeClaim must request that claim from this storageClass. We saw this earlier when we ran kubectl get sc. As a side note, the openebs-hostpath storage class could be set as the default storage class for the whole cluster. We do not do that here, as it is more informative to know exactly what the pod is doing.

And that ends our breakdown of the manifest that will build our two-node Elasticsearch system named lab-cluster, with 10GB of storage per node, in the namespace es-lab-cluster.
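
If you want a quick summary of the cluster as a whole rather than the individual pods, the ECK operator also exposes an Elasticsearch resource for each cluster it manages. A quick check of that resource is sketched below; the HEALTH column should eventually report green once both data nodes have joined.

Check the health of the Elasticsearch resource
kubectl get elasticsearch -n es-lab-cluster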


Now Let's Connect Kibana to Our Cluster

Since we are working with objects and connecting them together instead of hard-coding network addresses, installing Kibana is just as easy as, if not easier than, installing the Elasticsearch cluster. We only have to reference the name of the cluster, and it all clicks together. Since we are using MetalLB to hand out external IP addresses, there is only one major change in this manifest.

cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-lab
  namespace: es-lab-cluster
spec:
  version: 8.4.3
  count: 1
  elasticsearchRef:
    name: lab-cluster
    namespace: es-lab-cluster
  http:
    service:
      spec:
        type: LoadBalancer
EOF

As you can see, this is much the same as the one for Elasticsearch. We named it kibana-lab, which is how we will reference it. The namespace is the same one where we put the cluster, es-lab-cluster; we reference Elasticsearch as lab-cluster; and we are launching one copy of this container. The interesting part is http.service.spec.type. MetalLB handles this, and I have found that the LoadBalancer type is the easiest to work with for this project. Go ahead and launch this container by running the command above if you have not already, and we will see how this works.
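
If you would like to watch the Kibana pod itself come up before looking at the services, ECK labels the pods it creates, so a selector along the lines of the sketch below should work. The label used here is an assumption based on ECK's naming conventions; if it returns nothing, a plain kubectl get pods -n es-lab-cluster will list the pod as well.

Watch for the Kibana pod
kubectl get pods -n es-lab-cluster -l kibana.k8s.elastic.co/name=kibana-lab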

Now we can take a look at the services and see what MetalLB assigned Kibana for an external IP address. Remember from above that we only gave it the pool of 172.16.1.100 - 172.16.1.254 to work with.

kubectl get service -n es-lab-cluster

That will produce an output similar to the one below.

NAME                           TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
kibana-lab-kb-http             LoadBalancer   10.100.150.108   172.16.1.100   5601:32151/TCP   19s
lab-cluster-es-data-node       ClusterIP      None             <none>         9200/TCP         9m2s
lab-cluster-es-http            ClusterIP      10.104.186.230   <none>         9200/TCP         9m4s
lab-cluster-es-internal-http   ClusterIP      10.111.127.250   <none>         9200/TCP         9m4s
lab-cluster-es-transport       ClusterIP      None             <none>         9300/TCP         9m4s

You can see that there is only one service with an external IP, kibana-lab-kb-http. This is because we told it to assign one with the http.service.spec.type: LoadBalancer specification. At this point, you should be able to access Kibana through the EXTERNAL-IP. In this case, that would be https://172.16.1.100:5601. There will be a security error because these are all self-signed certificates. Accept the browser error and wait for Kibana to load.
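
If you would rather confirm from the terminal that Kibana is answering on its MetalLB address before opening a browser, a quick curl with -k to skip certificate verification will do. Any HTTP response, even a redirect to the login page, means Kibana is up; the IP below assumes MetalLB handed out 172.16.1.100 as in the output above, so substitute whatever EXTERNAL-IP you were given.

Optional check that Kibana answers on its external IP
curl -k -I https://172.16.1.100:5601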



It takes the Kibana pod a bit of time to load. At first, you may see Kibana server not ready. If this happens, wait a few minutes and refresh the browser; it should then load the page above. The first thing we need to do is get the password that ECK generated during the Elasticsearch install. We will use this for the first login to Kibana.

Get Elastic secret with kubectl
kubectl -n es-lab-cluster get secret lab-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'; echo ""

This prints the auto-generated Elasticsearch password to the terminal. We add the echo at the end of the kubectl command to force a new line to print. Otherwise, the password runs into the shell prompt and is hard to read.
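
If you would rather not copy the password by hand, you can also stash it in a shell variable and use it later, for example to query the Elasticsearch HTTP service through a temporary port-forward. This is only a convenience sketch; the service name lab-cluster-es-http comes from the service listing above, and the port-forward stays running in the background until you kill it.

Optional: store the password and test Elasticsearch through a port-forward
ES_PASSWORD=$(kubectl -n es-lab-cluster get secret lab-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
kubectl -n es-lab-cluster port-forward service/lab-cluster-es-http 9200 &
curl -u "elastic:$ES_PASSWORD" -k https://localhost:9200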

Log in to Kibana with the username elastic and the password we printed above. We are going to create a kube-metrics user, so we are not always logging in with the elastic user. Click on the hamburger menu on the top left and scroll down until you find Stack Management, as shown in the image below.



When the new window loads, choose Users from the menu on the left and then click the Create user button at the top right of this new window.



In this new window, we will create a new user named kube-metrics. For this project, we will leave the full name and email address blank, and I have chosen to use a password of notasecurepassword. You can use whatever you want here. We are going to give this user superuser privileges. That is not a good idea in production, but for this project, it will be the easiest to work with.



After filling in this information, click Create user, and we are done with this window for a little bit. Next, we move on to installing the Metricbeat pods. Since we made changes to the Elasticsearch cluster that we installed, we also have to make a few changes to the default Metricbeat manifest.
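
If you prefer to stay in the terminal, the same kube-metrics user can also be created through the Elasticsearch security API instead of the Kibana screens. The sketch below reuses the ES_PASSWORD variable and port-forward from the earlier example and grants the same superuser role, so the same production caveat applies.

Optional: create the kube-metrics user with the security API
curl -u "elastic:$ES_PASSWORD" -k -X POST "https://localhost:9200/_security/user/kube-metrics" \
  -H 'Content-Type: application/json' \
  -d '{"password":"notasecurepassword","roles":["superuser"]}'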


Install Metricbeat

Metricbeat is a large manifest. It is a collection of 11 objects that all connect, allowing Metricbeat to collect metrics from all of the host systems and the Kubernetes pods. So, in this case, instead of running the manifests from the command line as we have up until this point, we will download the whole manifest to our kube-control-plane. Let's navigate back to the user's home folder where we have been working and download this manifest.

cd ~

Now we can download the manifest.

curl -L -O https://dangerousmetrics.org/downloads/lab-metrics-metricbeat-kubernetes.yaml

This is a large manifest that I have completely broken down and explained in the document Metricbeat Kubernetes Manifest Teardown. But, for this project, we do not need the full teardown. There are a few changes that we made to make this work with our non-default setup. Below is a diff between the custom manifest above and the demo manifest available from the Elastic website.

77a78
>       ssl.verification_mode: none
138a140
> ---
159c161,170
<       dnsPolicy: ClusterFirstWithHostNet
---
>       dnsPolicy: "ClusterFirstWithHostNet"
>       dnsConfig:
>         searches:
>           - es-lab-cluster.svc.cluster.local
>           - kube-system.svc.cluster.local
>           - svc.cluster.local
>           - cluster.local
>       tolerations:
>         - key: node-role.kubernetes.io/control-plane
>           effect: NoSchedule
170c181
<           value: elasticsearch
---
>           value: https://lab-cluster-es-internal-http
174c185
<           value: elastic
---
>           value: kube-metrics
176c187
<           value: changeme
---
>           value: notasecurepassword
361a373
>

As you can see, there were a few changes made here other than the Elasticsearch connection information. We also added output.elasticsearch.ssl.verification_mode: none, which of course disables SSL certificate verification. Setting up the self-signed CA on this cluster is a project in itself, just like anything that has to do with SSL. Since it is beyond the scope of this project, we will turn off certificate verification on the parts that connect to Elasticsearch. The traffic to Elasticsearch is still encrypted with TLS; we are only skipping verification of the certificate.

There are also changes to the dnsPolicy and dnsConfig. Since the Metricbeat pods run in the kube-system namespace while Elasticsearch and Kibana run in the es-lab-cluster namespace, we have to tell these pods where to look to find the Elasticsearch cluster that we want to write to; in this case, it lives under es-lab-cluster.svc.cluster.local.
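
For reference, this is roughly what that portion of the Metricbeat DaemonSet spec looks like after the changes, reconstructed from the diff above; the rest of the manifest around it is omitted here.

      dnsPolicy: "ClusterFirstWithHostNet"
      dnsConfig:
        searches:
          - es-lab-cluster.svc.cluster.local
          - kube-system.svc.cluster.local
          - svc.cluster.local
          - cluster.local
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule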

Apply lab-metrics-metricbeat-kubernetes.yaml
kubectl apply -f lab-metrics-metricbeat-kubernetes.yaml
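
Before moving on, it is worth confirming that the DaemonSet rolled out cleanly. The stock Elastic manifest, and our customized copy of it, should create a DaemonSet named metricbeat in kube-system, so a rollout status check like the one below should report success once a pod is running on every node.

Confirm the Metricbeat DaemonSet rolled out
kubectl -n kube-system rollout status daemonset/metricbeat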

Now we should be collecting some data. Before we can visualize that data, we need to log in to one of the pods to install the default Metricbeat dashboards into Kibana. There are a few different ways to use Metricbeat to install these dashboards and the visualizations that go with them. We are going to go the route of opening a shell into one of the Metricbeat pods and running the commands directly there. First, we need to find a Metricbeat pod to log in to. Run the command below to find a suitable pod.

kubectl get pods -n kube-system | grep metricbeat

This will produce an output similar to the one below.

metricbeat-mlg5c                             1/1     Running   3 (9m39s ago)   4h2m
metricbeat-rfcrp                             1/1     Running   0               4h2m
metricbeat-tzslw                             1/1     Running   0               4h2m

Your pod names will be different because of the random identifier. Just choose one, it does not matter which, and run the command below on the kube-control-plane server.

kubectl -n kube-system exec --stdin --tty metricbeat-????? -- /bin/bash

This will open a shell into that Metricbeat pod. You will actually see the prompt of the worker node that the pod is running on. Once you are connected to the pod terminal, run the command below.

metricbeat setup --dashboards \
  -E output.elasticsearch.hosts=['https://lab-cluster-es-http:9200'] \
  -E output.elasticsearch.username=kube-metrics \
  -E output.elasticsearch.password=notasecurepassword \
  -E output.elasticsearch.ssl.verification_mode=none \
  -E setup.kibana.ssl.verification_mode=none \
  -E setup.kibana.host=https://kibana-lab-kb-http:5601

This is the same command that you can find on the Elastic website describing how to manually install the Metricbeat dashboards. This version has been modified to work with our es-lab-cluster namespace. I find this a much easier way to do it if Metricbeat pods are already running in the cluster. Notice that here we have also disabled SSL verification with the line -E setup.kibana.ssl.verification_mode=none.

This process may take a couple of minutes as it loads dozens of dashboards and hundreds of saved objects that go with them. Once the command completes, type exit to close the pod shell and return to the control plane.

Now we can switch back to the Kibana browser tab that we opened earlier, navigate to the Dashboards section, and search for either the Kubernetes or System dashboards; both types should have some information. Below is a screenshot of the [Metricbeat Kubernetes] Overview dashboard.




If you want to browse the raw Metricbeat data, just click on the hamburger menu again and select Discover. Since metricbeat is our only index, Kibana will default to it, and you can browse the raw data as shown below.



Coming to the End

We are now at the end of this project. If you have followed along, you should now have a fully functioning Kubernetes cluster with Elasticsearch, Kibana, and Metricbeat. You should be able to look at the metrics in the Kibana dashboards and have a better understanding of how all of this works together. If you would like to see any changes or customizations to this document, leave me a message on whatever social media platform you found this on.

Have a wondrous day every day.