In most cases, Kubernetes is installed on cloud infrastructure somewhere. These days that is a one-click operation at most cloud providers. It was designed by Google and initially catered to GCP, though it has since been expanded to work seamlessly with AWS, Azure, and many other cloud services. But sometimes, when we want to know the raw core of how things work, we need to install it ourselves on bare-metal servers or, in the case of this project, VMs inside Proxmox.
Many different tutorials all over the Internet walk through installing a Kubernetes cluster. Most of them are geared toward minikube on a laptop, workstation, or a cloud-based install. Installing on bare metal has a couple of hurdles that we need to overcome to be able to use this as a true Kubernetes cluster. We will cover these changes in detail when we get to those sections.
The main reason for building this cluster is to practice for the Kubernetes certification exam. However, it will live on after I have mastered that certification, both as a platform to test databases and different ways of processing data into those databases and as a major part of the development, staging, and production system for this website and other website projects. So, without further ado, let us get to building this cluster.
Over the time of writing these projects, I have come up with a legend that I will use from here on out, and I will update my previous projects with the same legend. All single-line commands will be shown in this color, like this: run this command.
Terminal output will look like the example below.
NAME READY STATUS RESTARTS AGE
coredns-565d847f94-rdvfc 1/1 Running 0 9m47s
coredns-565d847f94-svs8x 1/1 Running 0 9m47s
etcd-kube-control-plane 1/1 Running 0 10m
Amusingly, the color of the commands comes from a highlight.js error letting me know that my code is not inside of a pre block. The color draws attention, so I believe it was good to leave it this way to identify commands.
When we have multi-line commands that use EOF, they will look like this.
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: first-pool
namespace: metallb-system
spec:
addresses:
- 192.168.2.180-192.168.2.189
EOF
Since Kubernetes will take over most of the work on this development server, we dedicated large parts of the resources to the cluster workers. Since all workers have the same VM configuration, it will only be listed once. This project has been designed to run on a Proxmox host with 13 available processor cores. With a crafty Proxmox setup this could work fine on a decent laptop.
When the operating system is installed on these nodes, they should all be set up for static IP addresses. DHCP is easy but will constantly overwrite our resolver configuration, which causes problems. We also do not want the IP to change for some reason during a reboot. The easiest way to do this is to set them up with static IP addresses during OS installation. If you are working through this project on Proxmox, you can follow the guide below to install and configure a VyOS software router to separate your Kubernetes traffic from your primary network. If you are following this tutorial on another platform where you do not have this option, or you are using bare metal with a hardware router, then the VyOS configuration can serve as a guide on how to build this routing configuration in your lab.
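As a concrete example, below is a minimal netplan sketch for assigning a static address on an Ubuntu 22.04 node after installation. The interface name ens18 and the file name 00-static.yaml are assumptions (ens18 is the usual VirtIO NIC name inside a Proxmox VM); the addresses match the 172.16.1.0/24 network and the control-plane IP used later in this project, so adjust them for each node.
# Hypothetical file name and interface name; adjust to your own setup.
cat <<EOF | sudo tee /etc/netplan/00-static.yaml
network:
  version: 2
  ethernets:
    ens18:
      addresses:
        - 172.16.1.10/24
      routes:
        - to: default
          via: 172.16.1.1
      nameservers:
        addresses:
          - 172.16.1.1
EOF
sudo netplan apply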
If you want to follow this document to the letter use the IP assignments laid out below.
Though installing and configuring VyOS on Proxmox adds an extra step to the process, it is a good way to separate the Kubernetes network from your primary network. Setting the project up this way has two main benefits: it separates the cluster from the primary network, and it keeps internal pod communication completely within Proxmox. Adding VyOS in front of the Kubernetes cluster also gives us full control of all routing, DHCP, and DNS decisions inside the Kubernetes network.
As part of this Kubernetes project, a VyOS VM is set up as the 'backplane' of the cluster. This routing configuration is very project-and-scenario-dependent because of our lab setup. The private network that we will build ends up double-NATed to the outside world. This type of NAT setup is an important consideration if you are hosting production services off of it, but of very little importance in a lab setup where you are learning the basics.
Below are the detailed steps and screenshots that walk through the whole process of installing VyOS on a Proxmox host. The network that we will assign to the INSIDE interface, which connects the Kubernetes hosts, will be 172.16.1.0/24. In our lab network, the primary network is 192.168.2.0/24. The one thing that has to be done outside of Proxmox is to build a static route in your primary router that sends all 172.16.1.0/24 traffic to the address the primary network hands to the eth0 (OUTSIDE) interface of VyOS; we will retrieve that address later in this section. If the static route is not set up, the Kubernetes nodes will partially work, but you will not be able to ssh into them from the primary network, and EXTERNAL IPs assigned to pods will not be reachable outside of the 172.16.1.0/24 network. Just a reminder that this static route is needed if you plan on following this whole project. Also remember that 192.168.2.0/24 is our internal network; yours could be anything in the IANA private address space, though on a consumer router it normally lives somewhere in the 192.168.0.0/16 range.
The first thing that we need to do is download the VyOS ISO. We will mount this ISO on Proxmox and then build a virtual machine to install it. Head on over to the VyOS ISO downloads page, download the current ISO installer, and save it somewhere on your workstation. When the download is complete, let's move on to Proxmox and upload this new image so we can mount it to the VM we will create.
Where you store your ISOs is based on your local Proxmox setup, so you should know where to upload this ISO so we can use it to build a VM. If you are new to Proxmox, check out this blog article that explains where to find this and how to set it up if it is not available: How to Upload ISO Files to ProxmoxVE.
As you can see this is a very simple bridge. We do not set a gateway or any bridge ports here. This allows us to connect our primary Proxmox interface, in our case vmbr0, and this new bridge vmbr4 together as eth0 and eth1 on the Proxmox software router. After this, VyOS handles all of the routing and gateway duties.
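If you prefer to create this bridge from the Proxmox shell rather than the GUI, the resulting stanza in /etc/network/interfaces on the Proxmox host would look roughly like the sketch below (assuming the bridge is named vmbr4, as it is in this project).
auto vmbr4
iface vmbr4 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0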
Next we need to create the VyOS VM and add a secondary network interface to it.
The series of images below step through this process.
You can choose whatever you want for the VM ID and the Name. Mcp is, of course, the Proxmox host where this VyOS router will run. I chose 300 for the VM ID because it separates this project from other projects. The Kubernetes control plane, worker 1, worker 2, and worker 3 will have VM IDs 301, 302, 303, and 304, respectively.
Here, we choose the VyOS image used to install this router OS. Your ISO may be named slightly differently than the one here, but as long as you downloaded it from the official VyOS page linked above, it should be fine. That is the only thing that changes on this screen. Click next after choosing the correct ISO image.
This screen is where we choose the type, amount, and storage location. We chose to carve the storage out of our ZFS pool. It does not matter where the storage comes from, only that there are 4GB available at whichever location you choose. Next, we move on to choosing the CPU layout.
We will make no changes here, as this network is so small that 1 CPU core should handle the routing just fine.
In most cases this defaults to 2GB when building a Proxmox VM. The default can be changed, so just make sure that Memory is set to 1024 and click Next to move on to the network settings for this VM.
Bridge selection normally defaults to vmbr0. This identifier may be different depending on how Proxmox is configured. In our case, vmbr0 is one of the network interfaces that connect to the primary lab network 192.168.2.0/24. Choose what works for your setup, remembering that the purpose of this router is to bridge the 172.16.1.0/24 network into the primary network, in our case 192.168.2.0/24.
Note: Throughout this project you can uncheck Firewall or leave it. This setting does not matter much if you do not have Proxmox's firewall active and set with rules, but this also follows the idea that none of this matters considering our NAT situation, explained further down.
After choosing the network bridge click on Next. This takes us to the build VM confirmation page. Confirm this VM and wait for Proxmox to build it.
Click on Add to attach this second network interface, and we are finished setting up this VM for VyOS to route our Kubernetes network. When we are all done, the VM Hardware information in Proxmox should look like the image below. Your MAC address will be different, but otherwise, your VM should look very similar to this one.
Use whatever method you like to start the VyOS VM and switch over to the console for this VM.
It may take a few seconds for VyOS to boot. Once it is finished it will present us with a console login as shown below.
The first thing we need to do is install VyOS as a permanent router. So let's log in to the router with the default username and password of vyos and vyos.
After the login is complete run the command below to start the installation.
install image
The installer will ask a few questions; the default answers should work for everything. It will ask you to set a new password, so choose whatever works for you. For this project we will be using notasecurepassword as the password wherever we need one. If you run into any issues with the installer, please check out the VyOS document Permanent installation.
Installing this system will require a reboot at the end. Once the login prompt appears on the console again, log in with the username vyos and the password set during the installation. Now we can move on to configuring the router by entering configuration mode with the command below.
conf
Through the rest of this part of the document we will set up a bridge between the primary network and the network that we will create on this router. We will create the network 172.16.1.0/24 and route it through our primary network, which in our lab is 192.168.2.0/24. First we will set up some very basic firewall rules that allow all traffic to pass between these two networks. For the sake of this project you can look at the primary network as the Internet, as that is what we are trying to emulate.
set firewall name LAN-LOCAL default-action 'accept'
set firewall name LAN-WAN default-action 'accept'
You would NOT want to do this in a production environment. These settings allow ALL traffic through from both directions. However, as we are behind two separate routers, with the first router in line (the gateway to our WAN address) properly firewalled, this only allows all traffic between the two private networks (192.168.2.0/24 and 172.16.1.0/24) and does not allow outside-world traffic in.
Every network has its own specific nuances just like our lab set-up, so I recommend reading up on VyOS docs if you're planning to replicate what we do here outside of a lab environment.
Now let's move on to actually setting up the network. This set of commands works with the eth0 interface that we connected to our primary network.
set interfaces ethernet eth0 address dhcp
set interfaces ethernet eth0 description 'OUTSIDE'
set interfaces ethernet eth0 duplex 'auto'
set interfaces ethernet eth0 mtu '1500'
set interfaces ethernet eth0 speed 'auto'
Commit and save this, and you should find that the interface will soon have an IP address assigned by the first gateway. Depending on the gateway, you should be able to set this as a static IP assignment (HIGHLY RECOMMENDED).
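If you are new to VyOS, committing and saving from configuration mode is just two commands:
commit
save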
Now, on to configuring our 172.16.1.x/24 network for Kubernetes.
set interfaces ethernet eth1 address '172.16.1.1/24'
set interfaces ethernet eth1 description 'INSIDE'
set interfaces ethernet eth1 duplex 'auto'
set interfaces ethernet eth1 mtu '1500'
set interfaces ethernet eth1 speed 'auto'
Regarding the NAT situation mentioned above, the following are the configuration settings for basic NAT on a VyOS instance. Once again, I'll mention that this is a very specific case where this hosts a private network behind another private network.
set nat source rule 100 outbound-interface 'eth0'
set nat source rule 100 source address '172.16.1.0/24'
set nat source rule 100 translation address 'masquerade'
To outline, this allows the Kubernetes pods routed through this network to communicate with the OTHER network and thus the outside world. As was mentioned earlier, for this to work correctly, you need a static route in your primary router pointing at this VyOS router. That static route boils down to telling your primary router to send all 172.16.1.0/24 traffic to the address VyOS picked up on its eth0 (OUTSIDE) interface, which we will grab shortly.
How to set a static route will vary heavily (or may not even be a feature at all) on some consumer-grade routers, so this part is up to you; a little Google-fu should suffice. Our router (a Linksys EA9300, for now at least) has the option to set static routes, and it is a very straightforward process.
Most of these configuration options are in the VyOS Quick Start guide. I give major props to the VyOS team for writing very detailed and easy to understand documentation.
When we move on to setting up the DNS forwarder below, we will need the IP address that the primary router assigned to eth0 of this VyOS router. Let's go ahead and grab that IP now so we do not have to exit out of configuration mode during the DNS setup. On the VyOS router console run the commands below.
commit
save
exit
Exiting will take us out of the configuration console back to the command console. Once we are back at a prompt, run the command below to query the information on interface eth0, which is where we had the primary network assign a DHCP address. If you chose to assign eth0 statically, you will already know the address we are trying to find here.
show interface ethernet eth0
This command should return information similar to what is below.
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 0a:46:d2:bd:a7:8c brd ff:ff:ff:ff:ff:ff
inet 192.168.2.164/24 brd 192.168.2.255 scope global dynamic eth0
valid_lft 2512162sec preferred_lft 2512162sec
inet6 fe80::846:d2ff:febd:a78c/64 scope link
valid_lft forever preferred_lft forever
Description: OUTSIDE
RX: bytes packets errors dropped overrun mcast
8466883734 2145791 0 0 0 0
TX: bytes packets errors dropped carrier collisions
251309236 1550768 0 0 0 0
We are interested in the 3rd line, where this example shows 192.168.2.164/24. This is the IP address that DHCP assigned to that interface. In most cases what you see here will be completely different, but this is the IP that creates the gateway between the outside network and our Kubernetes network. This is also the IP address that the static route I have mentioned a few times needs to point to. Static route: 172.16.1.0/24 -> 192.168.2.164.
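Purely as an illustration, if your upstream router happened to be another VyOS instance, the static route from this example would look like the first command below; on a generic Linux router it would look like the second. On a consumer router you will do the equivalent in its web UI, and your next-hop address will almost certainly differ from 192.168.2.164.
set protocols static route 172.16.1.0/24 next-hop 192.168.2.164
sudo ip route add 172.16.1.0/24 via 192.168.2.164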
All of our Kubernetes nodes will need to talk to the outside world and resolve from real-world DNS servers before the Kubernetes kube-dns package is online. Since our 172.16.1.0/24 network is hidden behind another network, we must let it know how to get DNS through its own gateway. The settings below set this up in our VyOS software router.
conf
set service dns forwarding allow-from '172.16.1.0/24'
set service dns forwarding allow-from '192.168.2.0/24'
set service dns forwarding cache-size '2000000'
set service dns forwarding dnssec 'process'
set service dns forwarding listen-address '172.16.1.1'
set service dns forwarding name-server 'xxx.xxx.xxx.xxx'
set service dns forwarding no-serve-rfc1918
set service dns forwarding source-address '192.168.2.164'
This is our basic, working DNS configuration. If you are curious about anything, take a peek at the VyOS document on Configuring DNS.
And now we can finally save all of this VyOS configuration and prepare to move on to setting up the VMs.
commit
save
exit
We can test this from VyOS by running a nameserver dig pointed at our 172.16.1.1 address. Run the command below; it should return a result. If it times out, then something is not correct.
dig @172.16.1.1 google.com
This is a final reminder to not forget to set up the static route in the router that VyOS eth0 is connected to. This is actually vmbr0 in the Proxmox server, but since it does its own bridging we just need to focus on the IP we retrieved from eth0 and make sure 172.16.1.0/24 is statically routed to that IP address.
And done! You should be able to assign the vmbr4 interface to any VM of your choosing, give it a static IP address in your OS of choice, and communicate across both networks. Now that all of this is in place, Kubernetes traffic keeps to itself unless specifically requested to be routed outside with an EXTERNAL IP, which we will get to later in this document.
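A quick sanity check, assuming the static route is in place: from a workstation on the primary 192.168.2.0/24 network you should be able to ping the VyOS INSIDE address. If this times out, the static route is the first thing to re-check.
ping -c 3 172.16.1.1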
All VMs in this project are running Ubuntu 22.04. As of this writing Kubernetes 1.25 is the latest version so that is what we will be installing.
Beyond the cluster itself we will also be installing kube-state-metrics, Elasticsearch, Kibana, and Metricbeat for the initial monitoring solution. This will change over time as we install and test other monitoring solutions such as PostgreSQL with TimescaleDB, InfluxDB, and other packages that draw my interest.
This defines the basics of our cluster with a bit of information on what we will be doing.
Before starting the install, all of the nodes need a bit of basic setup.
Ubuntu 22.04 enables the needrestart system by default. This system generates the menu that pops up at the end of updates and new package installs to tell you which daemons need to be restarted. That is good information to know, but not something we want to see every time we install a package. The command below disables the prompt by telling needrestart to restart the affected daemons automatically. This is great in this lab but maybe not so great in production, though most of that is automated now anyway.
sudo sed -i 's/#$nrconf{restart} = '"'"'i'"'"';/$nrconf{restart} = '"'"'a'"'"';/g' /etc/needrestart/needrestart.conf
This command has been tested extensively with Ubuntu 22.04 server. I can not guarantee its operation on any other distribution or version.
This should be number one on all Operating System installs. Run the commands below to update the repositories and install any upgrades that are available.
sudo apt update
sudo apt upgrade -y
-y tells apt to answer yes to questions about installing package dependencies.
The first thing that we need to do is to disable swap memory on all Kubernetes nodes permanently. Swap memory allows the operating system to store memory blocks on permanent storage. You may think this only comes into play when the system is under a high memory load, but this is not the case. The OS will also use swap to move old objects out of memory to free up space for the disk cache and other operations. OS memory swapping is similar to how Java treats young-generation and old-generation memory segments. Seeing as how permanent storage of any kind, even the fastest NVMe, is much slower than the system RAM that sits right on the CPU bus, having to retrieve anything from a swap drive puts a tremendous load on the system, as we are now dealing with the IO wait of the storage device and controller. It is better to make sure you have enough physical RAM to carry the load you will present to the cluster and turn off swap altogether.
sudo swapoff -a
Next we need to disable swap in fstab so it will not load on a reboot. Initially I had us using an editor here to edit the file directly; I decided to change that to a sed command instead. Running this command has the same effect as editing /etc/fstab and adding a comment # character in front of the /swap.img line. This has been tested many times with Ubuntu 22.04; I can not guarantee that it will work on any other version, though it should.
sudo sed -i '/\/swap.img/ s/^/#/' /etc/fstab
Now we can check that the change was made using the cat command. Run the command below to check the contents of /etc/fstab.
cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda2 during curtin installation
/dev/disk/by-uuid/04dc156d-780c-43aa-acaf-5db03c915200 / ext4 defaults 0 1
#/swap.img none swap sw 0 0
Your fstab file will look different from this as the UUIDs for our disks will differ. We only need to check that a # has been added in front of /swap.img. If all looks good, then we can head off to set up the node host networking.
When working with Elasticsearch, it is a good idea to increase vm.max_map_count to the highest setting. This setting limits how many memory-mapped areas a process may have. The default setting works well for many applications, but part of the Elasticsearch query speed is related to how it maps indices into memory. Small Elasticsearch workloads may work fine without changing this setting, but we will change it here, so we do not have to do it later.
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
These first few steps prepared the operating system for the next steps. We will still be modifying system files and restarting daemons. These few things make those steps easier. Now we move on to installing modules and changing system-level variables that pertain to networking.
The first thing you may wonder is why we are building a bridge on top of a bridge on top of a bridge. This setup is not uncommon in any cloud-based network. All network segments are bridges, with normally one gateway into each bridge. This design keeps the different unique networks separate while still allowing access in and out through one or multiple load-balanced gateways. It just looks a little different here because we are doing all of this inside of one server.
To work as a cluster, the Kubernetes system builds its own network that we can selectively connect to the primary network through services and endpoints. For this to work correctly, every node's network settings need to be in IP forwarding bridge mode. Setting this up is broken down into a few commands, which are outlined below.
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
This command creates the file at /etc/modules-load.d/k8s.conf and populates it with overlay and br_netfilter. Adding them here tells Linux to load these modules into the kernel on the next reboot.
br_netfilter: this module enables transparent masquerading of the network stack and facilitates VLAN traffic for communication between pods.
overlay: this is a file system overlay module that allows the pods to create their own mounts on top of the host file system.
We can use the cat command to verify the values entered by this command.
cat /etc/modules-load.d/k8s.conf
The output should show overlay and br_netfilter, as this is what we sent to the file with the cat and sudo tee commands above. Now we need to load these modules into the system. Yet, with the power of Linux and modprobe, we do not have to reboot. Run the commands below to load these modules into the kernel without rebooting.
sudo modprobe overlay
sudo modprobe br_netfilter
If there are no errors these commands will not return any values. These commands force the modules to load on a running system. During a reboot they will be loaded from /etc/modules-load.d/k8s.conf.
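If you want to double-check that the modules are actually loaded, lsmod will show them.
lsmod | grep -E 'overlay|br_netfilter'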
Next we have to modify sysctl params to turn on the network bridging that we enabled with the commands above. We will cat these settings into a sysctl file at /etc/sysctl.d/k8s.conf. This is in a different location from the one above, as these work directly on system variables.
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
Now we need to tell the system to load these new configurations with the command below.
sudo sysctl --system
This command will output every change that was made. Near the bottom, we will see our net.bridge and net.ipv4.ip_forward configurations. You can see that they all equal 1, which means that our changes are enabled. Now we will move on to installing the container runtime. This project will use CRI-O. All my testing shows that this runtime works with all of the pods we will install with this project. These commands should be run on all nodes.
These first few commands enable the repositories for the CRI-O container runtime. This package is what runs all of the containers. Sort of like Docker but not. You can see that the two variables OS= and VERSION= are used in the tee output to set the version that will be added to the repo list.
OS="xUbuntu_22.04"
VERSION="1.25"
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /
EOF
cat <<EOF | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list
deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /
EOF
Next we need to add the gpg keys for these repos.
curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/libcontainers.gpg add -
Now we update the repositories and install the cri-o runtime engine.
sudo apt-get update
sudo apt-get install cri-o cri-o-runc cri-tools -y
And finally we reload systemd and enable cri-o to tie all of this initial setup together.
sudo systemctl daemon-reload
sudo systemctl enable crio --now
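Before moving on, it does not hurt to confirm that the CRI-O service came up cleanly on each node.
sudo systemctl status crio --no-pager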
At this point the core setup is complete. Now it is time to install the Kubernetes packages on all nodes.
Now we can start to install the Kubernetes system. First, we install core dependencies that Kubernetes and other parts of this install process depend on. At this point, we are still running these commands on all nodes that we set up for this Kubernetes cluster. When we start running commands on individual nodes, which will be soon, it will be noted.
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
Now that we are done setting up repositories and preparing the system we will install the Kubernetes tools.
sudo apt-get update -y
sudo apt-get install -y kubelet kubeadm kubectl
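Optionally, and in line with the upstream kubeadm install instructions, you can pin these packages so an unattended upgrade does not move the cluster to a newer version behind your back.
sudo apt-mark hold kubelet kubeadm kubectl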
This brings us to the end of all of the system setup and package installs that are needed to create a Kubernetes cluster. Next we will initialize the Kubernetes control plane.
We have to initialize the control plane first so we can generate the tokens and networks needed to add the workers when we get to that point. First, we need to set a few environment variables. We have to make an important decision here when it comes to POD_CIDR. The POD_CIDR defines the internal network that pods will communicate on. We want it to be distinct from any other network in our infrastructure. Since my lab infrastructure is based on 192.168.2.0/24, we will assign the POD_CIDR to a network in the 10.0.0.0/8 private IP set. We only need a few addresses, so we will work with 10.200.0.0/16. The NODENAME is set by telling the shell to run the hostname -s command.
Since we statically assigned all of the addresses on these nodes, we know that the IP address of the control plane is 172.16.1.10, which connects to our VyOS router at 172.16.1.1. Of course, that was already set up during the VM build when we selected the gateway.
IPADDR="172.16.1.10"
NODENAME=$(hostname -s)
POD_CIDR="10.200.0.0/16"
Now we can run the initializer command. As you can see, it uses the variables we just set above. These are stored in the user environment and can be called by this user until the session ends.
sudo kubeadm init --apiserver-advertise-address=$IPADDR --apiserver-cert-extra-sans=$IPADDR --pod-network-cidr=$POD_CIDR --node-name $NODENAME
This can take a few minutes to run as it configures everything and boots up the control plane, but eventually you will see an output similar to the one below.
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.2.99:6443 --token l5c9su.exos28jxx7q2jzl3 \
--discovery-token-ca-cert-hash sha256:8ef5f2cb7efd439bcfdcfd4dd54204cb5e1305b653ccb483d5fa50613b715cf0
I have left the token and cert hash in this example as this cluster will be destroyed as soon as this project is over. In no situation should you ever leave sensitive information like this on a live cluster.
As the output says, there are a couple more things that we need to do. First we need to create a folder and move some files to it. Since we are not running this project as root we will use the first set of commands.
We are only working with the node hosting the control plane at this point. Do not run these commands on the workers.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Now we can check that all of our steps worked correctly by running the command below.
kubectl get po -n kube-system
You should see nearly the same output as what is shown below.
NAME READY STATUS RESTARTS AGE
coredns-565d847f94-4nnxp 1/1 Running 0 20s
coredns-565d847f94-t4v8q 1/1 Running 0 20s
etcd-kube-control-plane 1/1 Running 0 34s
kube-apiserver-kube-control-plane 1/1 Running 0 35s
kube-controller-manager-kube-control-plane 1/1 Running 4 34s
kube-proxy-fhjct 1/1 Running 0 20s
kube-scheduler-kube-control-plane 1/1 Running 4 34s
So, now we have an almost-cluster of Kubernetes running. By default, the control plane will not schedule any pods onto itself. The control plane can run pods, but a pod needs a toleration for the control-plane taint before it will be scheduled there. We will do that later for one pod when we install Metricbeat.
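If you are curious, you can see the taint that keeps regular workloads off the control plane with the command below; on a kubeadm 1.25 install it should report node-role.kubernetes.io/control-plane:NoSchedule.
kubectl describe node kube-control-plane | grep Taints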
Calico is a fairly easy install, as it is available as a Kubernetes manifest from projectcalico.org.
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
This manifest can take a minute or so to initialize and launch. We can check the status by running the same get pods command we did above, kubectl get po -n kube-system. You should see a similar output to the one above, but we will show two new pods running here as the top two entries.
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-59697b644f-grt8l 0/1 ContainerCreating 0 9s
calico-node-kjn8s 0/1 Init:0/3 0 9s
coredns-565d847f94-4nnxp 1/1 Running 0 89s
coredns-565d847f94-t4v8q 1/1 Running 0 89s
etcd-kube-control-plane 1/1 Running 0 103s
kube-apiserver-kube-control-plane 1/1 Running 0 104s
kube-controller-manager-kube-control-plane 1/1 Running 4 103s
kube-proxy-fhjct 1/1 Running 0 89s
kube-scheduler-kube-control-plane 1/1 Running 4 103s
It is best to wait until the calico containers are fully running before moving on.
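One easy way to wait is to watch the namespace until everything shows Running, then press Ctrl-C.
kubectl get pods -n kube-system -w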
Every tutorial I have found for installing Kubernetes on bare metal stops here. The problem is that this is not the Kubernetes you will use in the exam or when working with it in the cloud, which is where the exam happens. At this stage, our bare-metal Kubernetes knows nothing about auto-provisioning of persistent volumes or assigning external IPs to our pods. If we leave the cluster in this state, we will spend hours manually writing local storage and hostPath configurations, and the only way to reach the pods from outside the pod network will be through NodePort settings, which is very limiting.
But we have a solution, two actually. Below we will install the two Kubernetes packages that provide this auto-provisioning and make our bare-metal cluster as close to a full cloud install as we are going to get.
This package is provided by OpenEBS. This package provides the services and configuration changes needed to emulate block storage on bare metal Kubernetes clusters. Without this functionality, everything related to PersistentVolumeClaims has to be built manually. They must also be manually torn down when a pod is removed from the system. OpenEBS loads an auto-provisioning layer between Kubernetes and your pods. So far, all I can say is it just works.
As always you should check the vendor site or repository for new versions and breaking changes. Below I have included the simple instructions from their web page.
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
Next we can run the command below to see which storage classes are available. We should see at least one openebs.io provisioner.
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
openebs-device openebs.io/local Delete WaitForFirstConsumer false 98s
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 98s
In the case of our cluster we will use openebs-hostpath. This will become part of the Elasticsearch manifest that we create when we get to that part. Later in this document we will also install the MetalLB network provisioning system.
Since we are setting this up to be as automated as possible, we now need to set openebs-hostpath as the default storage class. We can do that with the following command: kubectl patch storageclass openebs-hostpath -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
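After running the patch, a quick check of the storage classes should now show (default) next to openebs-hostpath.
kubectl get sc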
Now it is time to join the worker nodes to the cluster. On the control plane, run the command below to generate the join command.
kubeadm token create --print-join-command
This command will output the full string that we need to run on the workers so they will join the cluster. It will look similar to the output below but the token and ca-cert-hash will be different.
sudo kubeadm join 192.168.2.99:6443 --token 36txhy.dq1dbw9jwcswptjs --discovery-token-ca-cert-hash sha256:8ef5f2cb7efd439bcfdcfd4dd54204cb5e1305b653ccb483d5fa50613b715cf0
Run the command above on all worker nodes. Every worker where you run this should produce an output that contains the line This node has joined the cluster:. If there are no errors we can now verify if all of our nodes have joined the cluster by running the command below on the control plane.
kubectl get nodes
If all of the nodes have joined the cluster, you should see an output that looks similar to this: one control plane host and three worker nodes.
NAME STATUS ROLES AGE VERSION
kube-control-plane Ready control-plane 5m18s v1.25.3
kube-worker-1 Ready <none> 78s v1.25.3
kube-worker-2 Ready <none> 69s v1.25.3
kube-worker-3 Ready <none> 68s v1.25.3
I have run through this project dozens of times while writing this document. I still can not figure out exactly why this happens. If we do not restart kube-dns right here, the next section on installing MetalLB never works correctly. Everything else seems to work fine but not MetalLB. So, I decided that at this point, we just restart kube-dns, and everything has worked fine since then. Please run the command below on the Kubernetes control plane to restart kube-dns.
kubectl delete pod -n kube-system -l k8s-app=kube-dns
Though we are deleting this pod, it will be rebuilt by kube-system. After this, everything else should work fine. This is another situation where it is best to wait until all pods are in a Running state before moving on.
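You can confirm that the replacement coredns pods are back and Running with the same label selector we used to delete them.
kubectl get pods -n kube-system -l k8s-app=kube-dns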
This package allows us to emulate the LoadBalancer network provisioning that many pre-packaged applications depend on. Parts of this are difficult, and other parts are impossible, on bare metal without this package. How this works without these packages is another one of those things that is good to know, so check out the documents Kubernetes Service and Kubernetes Exposing an External IP. By using either this service or a cloud provider that supports auto-provisioning, the actual setup and teardown is automatic.
We will be installing this in Layer 2 mode. This package does support other modes, including BGP, but Layer 2 will work fine for this project, as it does for many production setups.
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml
This install should only take a few seconds, and then we are all done with installing the package. But we need to create two objects: one that defines a pool of IPs that MetalLB can assign to pods, and one that sets up an L2Advertisement so Kubernetes knows that there are LoadBalancer addresses available.
This is one of the multi-line EOF commands. Copy the whole thing and run this only on the control plane node.
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: first-pool
namespace: metallb-system
spec:
addresses:
- 172.16.1.100-172.16.1.254
EOF
The small manifest above is our IP pool file. This manifest lets MetalLB know what IP's it is allowed to hand out to the pods. Our Kubernetes cluster has two network interfaces. One side connects to the rest of the network; the other side connects to a VyOS router in software. That router hosts the 172.16.1.0/24 network and routes it back into the primary network so we can directly access the 172.16.1.0/24 network, but it is technically isolated from our home network.
Now we need to install the MetalLB L2 pool advertiser.
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: example
namespace: metallb-system
spec:
ipAddressPools:
- first-pool
EOF
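If you want to confirm that MetalLB itself is healthy before relying on it, check its namespace; you should see one controller pod plus one speaker pod per node, all Running.
kubectl get pods -n metallb-system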
This brings us to the end of building a working Kubernetes cluster. We need to install one more thing that will help us keep track of the cluster health, since we do not yet have a metrics and visualization pod set running. So, for one final step before part one of this project ends, let's install the Kubernetes Metrics Server.
Now that Kubernetes is running we can do almost everything that we need to do with the kubectl command. In this case we are going to apply a manifest that installs the metrics server pod.
kubectl apply -f https://raw.githubusercontent.com/techiescamp/kubeadm-scripts/main/manifests/metrics-server.yaml
This can take a couple minutes to configure and load. Until that time, if you run kubectl top nodes you will see the error "Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)". This is expected. When the pod is finished loading, running the same command will produce an output similar to the one below.
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kube-control-plane 83m 2% 2414Mi 30%
kube-worker-1 29m 0% 1710Mi 21%
kube-worker-2 38m 0% 1710Mi 21%
kube-worker-3 27m 0% 1529Mi 19%
This is the last step. We now have a fully functional Kubernetes cluster with access to a bit of metrics so we can keep an eye on it. In the next section we will build out the initial monitoring stack.
In this part of the project, we will install Elasticsearch, Kibana, and Metricbeat for the initial monitoring solution. As I mentioned in the last part, this configuration is the one I am most familiar with, but there will be other monitoring solutions I test on this cluster. We will generally use the Kubernetes install guides available on the Elastic website, but we will not run the demo manifests. We will use them as examples, but we will change a few things, such as the namespace, cluster name, and resources, to meet the needs of this project. First, we need to install the Elasticsearch ECK operator, as it is the easiest and most reliable way to deploy an Elasticsearch cluster in a Kubernetes environment.
Since all of the items we will install are now Kubernetes manifests, the installations are fairly easy. There are only three steps in the ECK documentation, and one of them is checking logs to make sure the ECK operator is working correctly. Though there are only three steps in that document, we will replicate them here so we can complete this with only one document open.
But, first we must install kube-state-metrics so Metricbeat can gather metrics from the Kubernetes cluster along with the system metrics for the nodes.
The kube-state-metrics system collects metrics from the entire Kubernetes system and then provides them through a simple interface. Our initial metrics collector Metricbeat will then poll the kube-state-metrics endpoint and store them in Elasticsearch for use in the dashboards. kube-state-metrics is fairly easy to set up, we will start by using git clone to pull down the manifests. This will download the whole project but we will only run the manifests that are available in examples/standard.
mkdir ~/tmp/
cd ~/tmp/
git clone https://github.com/kubernetes/kube-state-metrics.git
cd ~/tmp/kube-state-metrics
kubectl apply -f examples/standard
kubectl -n kube-system get deployments kube-state-metrics
When everything is loaded, which does not take long, you should eventually see the output below.
NAME READY UP-TO-DATE AVAILABLE AGE
kube-state-metrics 1/1 1 1 3h55m
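If you would like to peek at the raw metrics Metricbeat will be scraping, you can port-forward the kube-state-metrics service from the control plane and curl it. This is only a spot check, and it assumes the default service name and port from the standard manifests.
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
sleep 2
curl -s http://localhost:8080/metrics | head
kill %1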
This is a large file that would take a whole document to explain. That may come in the future, but for now we will follow the documents at Elastic and install this manifest from the Elastic-provided URL.
kubectl create -f https://download.elastic.co/downloads/eck/2.4.0/crds.yaml
This command will print lines out to the terminal telling you what resources were created. For full details please see the install document linked above.
ECK uses its own operator which makes use of custom resources to manage the configuration and deployment of Elastic stack applications. This manifest is a large file that you can download and browse through if you wish. For this project, we will just install it from the URL as shown below.
kubectl apply -f https://download.elastic.co/downloads/eck/2.4.0/operator.yaml
And finally we can check the logs to make sure this operator has started correctly. Notice how this lives in the elastic-system namespace.
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
This manifest is all that is required to install the Elasticsearch ECK operator on our Kubernetes cluster. If you have followed through with the whole document up to this point, we now have a fully operating Kubernetes cluster with kube-state-metrics and the Elastic ECK operator waiting for us to install an Elasticsearch cluster.
Our journey up until now has been fairly easy. Now we move on to the part that caused me to avoid Kubernetes until recently, the Object Oriented way that we have to build things. In the end, it makes great sense. Objects can be reused over and over, this is the heart of the whole Kubernetes system. But, when you come from a world of top-down config, it can be a bit intimidating at first. As I sorted this out in my brain, I came up with the image below, and everything clicked.
I came up with this block diagram while working through the complete Metricbeat manifest in this document
Now that the ECK operator is installed, we can easily create custom Elasticsearch manifests. Because of our custom setup, we can not use the default manifests offered by most vendors, since persistent volumes and external IP assignments could be slightly different. This stuff can be mind-boggling when you do it all manually, but this is where OpenEBS and MetalLB come into play for our personal Kubernetes cloud.
If you follow the quickstart instructions in the Elastic documentation, you always end up with a cluster and services with "quickstart" and "demo" in the object name. This manifest is great for testing things but not something that you want to put into production. So we will take the extra step to build custom manifests where needed. The Elasticsearch database is one of these custom manifests.
kubectl create namespace es-lab-cluster
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: lab-cluster
namespace: es-lab-cluster
spec:
version: 8.4.3
nodeSets:
- name: data-node
count: 2
volumeClaimTemplates:
- metadata:
name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: openebs-hostpath
EOF
Now we can check the status of our cluster in its own namespace.
kubectl get pods -n es-lab-cluster
Initially, the pods will be in a state of pending. It can take a few minutes for Elasticsearch to configure the pods after they are launched. Through all of the reinstalls, I have randomly encountered a situation where one of the Elasticsearch nodes can not resolve the hostname. If one of the Elasticsearch pods is not starting, check the logs for that pod with the following command.
kubectl logs lab-cluster-es-data-node-# -n es-lab-cluster
with # being the number of the es-data-node that is not starting. If the logs show an error that contains Failed to resolve publish address then we can restart kube-dns again to resolve this issue.
kubectl delete pod -n kube-system -l k8s-app=kube-dns
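Once the pods settle into a Running state, you can also ask the ECK operator for the overall cluster state; the HEALTH column should eventually report green.
kubectl get elasticsearch -n es-lab-cluster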
Below we will use a list to step through this whole manifest. We changed many things from the default quickstart manifest offered at the Elastic website. The most important of these changes is the storageClassName.
apiVersion: This can be a little misleading because it is not just a version but also which API library to use. In this case we are using the custom resources that were installed when we installed ECK.
kind: This tells Kubernetes what kind of manifest this is. The Elasticsearch kind is similar to a StatefulSet but is configured specifically for Elasticsearch.
name: This is the identifying part of the name of this cluster. We will see it later when we inspect the cluster that we install. In the default manifest this field has the value quickstart.
namespace: This is the Kubernetes namespace where our cluster will live. It is not good practice to run everything in the default namespace, so we might as well start off the right way. When we get to using kubectl we will access this namespace with the -n option: kubectl -n es-lab-cluster.
version: This is the version of Elasticsearch that we are installing.
nodeSets name: This sets the unique identifying name of all of the nodes in this cluster. You will see this when we inspect the cluster.
count: This is the number of nodes that will be created with these specifications.
volumeClaimTemplates name: This key is the name of the volume claim that Elasticsearch will use when building these nodes, more or less the hard drive space that Kubernetes will carve out for Elasticsearch to use. Not all deployments need storage, so this is an efficient way to parse it out on an as-needed basis. Without OpenEBS, this takes building multiple custom manifests. With OpenEBS, we only need the storageClassName, which is explained below.
accessModes: ReadWriteOnce means the volume can be mounted as read/write by a single node.
storage: Allow each of these nodes to request up to 10 gigabytes of storage.
storageClassName: This is the most important part of the manifest for our setup. Because we are using OpenEBS hostPath mode, every container that needs a volume claim must request that claim from this storage class. We saw this earlier when we ran kubectl get sc. As a side note, we already patched openebs-hostpath to be the cluster default, so this line could technically be omitted, but it is more informative to state exactly what the pod is requesting.
And that ends our breakdown of the manifest that will build our 2-node Elasticsearch system named lab-cluster, with 10GB of storage per node, in the namespace es-lab-cluster.
Since we are working with objects and connecting objects together instead of hard-coded network addresses installing Kibana is just as easy, if not easier, than installing the Elasticsearch cluster. We only have to reference the name of the cluster, and it all clicks together. Since we are using MetalLB to hand out external IP addresses, there is only one major change in this manifest.
cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
name: kibana-lab
namespace: es-lab-cluster
spec:
version: 8.4.3
count: 1
elasticsearchRef:
name: lab-cluster
namespace: es-lab-cluster
http:
service:
spec:
type: LoadBalancer
EOF
As you can see, this is much the same as the one for Elasticsearch. We named it kibana-lab, which is how we will reference this. The namespace is the same as where we put the cluster, es-lab-cluster, we reference Elasticsearch as lab-cluster, and we are launching one copy of this container. The interesting part is http.service.spec.type. MetalLB handles this. I have found that the LoadBalancer type is the easiest to work with for this project. Go ahead and launch this container by running the command above if you have not already, and we will see how this works.
Now we can take a look at the services and see what MetalLB assigned Kibana for an external IP address. Remember from above that we only gave it the pool of 172.16.1.100 - 172.16.1.254 to work with.
kubectl get service -n es-lab-cluster
That will produce an output similar to the one below.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kibana-lab-kb-http LoadBalancer 10.100.150.108 172.16.1.100 5601:32151/TCP 19s
lab-cluster-es-data-node ClusterIP None <none> 9200/TCP 9m2s
lab-cluster-es-http ClusterIP 10.104.186.230 <none> 9200/TCP 9m4s
lab-cluster-es-internal-http ClusterIP 10.111.127.250 <none> 9200/TCP 9m4s
lab-cluster-es-transport ClusterIP None <none> 9300/TCP 9m4s
You can see that there is only one service with an external IP, kibana-lab-kb-http. This is because we told it to assign one with the http.service.spec.type: LoadBalancer specification. At this point, you should be able to access kibana through the EXTERNAL-IP. In this case, that would be https://172.16.1.100:5601. There will be a security error because these are all self-signed certificates. Accept the browser error and wait for Kibana to load.
It takes the kibana pod a bit of time to load. At first you may see Kibana server not ready. If this happens wait a few minutes and refresh the browser, it should load the page above. The first thing we need to do is get the password that ECK generated during the Elasticsearch install. We will use this for the first login to Kibana.
kubectl -n es-lab-cluster get secret lab-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'; echo ""
This prints the auto-generated Elasticsearch password to the terminal. We add the echo at the end of the kubectl command to force a new line to print. Otherwise, the password runs into the shell prompt and is hard to read.
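If you want to verify the Elasticsearch cluster itself before logging in to Kibana, a quick sketch is to port-forward the lab-cluster-es-http service from the control plane and query it with the elastic password; the -k flag skips verification of the self-signed certificate.
PASSWORD=$(kubectl -n es-lab-cluster get secret lab-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
kubectl -n es-lab-cluster port-forward service/lab-cluster-es-http 9200 &
sleep 2
curl -u "elastic:$PASSWORD" -k https://localhost:9200
kill %1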
Log in to Kibana with the username elastic and the password we printed above. We are going to create a kube-metrics user, so we are not always logging in with the elastic user. Click on the hamburger menu on the top left and scroll down until you find Stack Management, as shown in the image below.
When the new window loads, choose Users from the new menu on the left and then click the Create user button on the top right of this new window.
In this new window, we will create a new user named kube-metrics. In the case of this project, we will leave full name and email address blank, and I have chosen to use a password of notasecurepassword for this project. You can use whatever you want here. We are going to give this user superuser privileges. Not a good idea in production, but for this project, it will be the easiest to work with.
After filling in this information, click Create user and we are done with this window for a little bit. Next we move on to installing the Metricbeat pods. Since we made changes to the Elasticsearch cluster that we installed, we also have to make a few changes to the default Metricbeat manifest.
Metricbeat is a large manifest. It is a collection of 11 objects that all connect, allowing Metricbeat to collect metrics from all of the host systems and the Kubernetes pods. So, in this case, instead of running the manifests from the command as we have up until this point, we will download the whole manifest to our kube-control plane. Let's navigate back to the user's home folder where we have been working and download this manifest.
cd ~
Now we can download the manifest.
curl -L -O https://dangerousmetrics.org/downloads/lab-metrics-metricbeat-kubernetes.yaml
This is a large manifest that I have completely broken down and explained in the document Metricbeat Kubernetes Manifest Teardown. But for this project we do not need the full teardown. There are a few changes that we have made to make this work with our non-default setup. Below is a diff between the custom manifest above and the one available as a demo from the Elastic website.
77a78
> ssl.verification_mode: none
138a140
> ---
159c161,170
< dnsPolicy: ClusterFirstWithHostNet
---
> dnsPolicy: "ClusterFirstWithHostNet"
> dnsConfig:
> searches:
> - es-lab-cluster.svc.cluster.local
> - kube-system.svc.cluster.local
> - svc.cluster.local
> - cluster.local
> tolerations:
> - key: node-role.kubernetes.io/control-plane
> effect: NoSchedule
170c181
< value: elasticsearch
---
> value: https://lab-cluster-es-internal-http
174c185
< value: elastic
---
> value: kube-metrics
176c187
< value: changeme
---
> value: notasecurepassword
361a373
>
As you can see, there were a few changes made here beyond the Elasticsearch connection information. We also added output.elasticsearch.ssl.verification_mode: none, which of course disables SSL certificate verification. Setting up the self-signed CA on this cluster is a project in itself, just like anything that has to do with SSL. Since it is beyond the scope of this project, we turn off SSL verification on the parts that connect to Elasticsearch. The connection is still encrypted with TLS; we just skip verifying the certificate.
There are also changes to the dnsPolicy and DNS configuration. Since the Metricbeat pods run in the kube-system namespace while Elasticsearch and Kibana run in the es-lab-cluster namespace, we have to tell these pods where to look to find the Elasticsearch cluster that we want to write to; in this case it lives under es-lab-cluster.svc.cluster.local.
Now we can apply this manifest to the cluster.
kubectl apply -f lab-metrics-metricbeat-kubernetes.yaml
Now we should be collecting some data. Before we can visualize that data, we need to log into one of the pods to install the default Metricbeat dashboards into Kibana. There are a few different ways to use Metricbeat to install these dashboards and the visuals that go with them. We are going to go the route of opening a shell into one of the Metricbeat pods and running the commands directly there. First we need to find a Metricbeat pod to log into. Run the command below to find a suitable pod.
kubectl get pods -n kube-system | grep metricbeat
This will produce an output similar to below.
metricbeat-mlg5c 1/1 Running 3 (9m39s ago) 4h2m
metricbeat-rfcrp 1/1 Running 0 4h2m
metricbeat-tzslw 1/1 Running 0 4h2m
Your pod names will be different because of the random identifier. Just choose one, it does not matter which, and run the command below on the kube-control-plane server.
kubectl -n kube-system exec --stdin --tty metricbeat-????? -- /bin/bash
This will open a shell into that metricbeat pod. You will actually see the prompt of the worker node that this pod is running on. Once you are connected to the pod terminal run the command below.
metricbeat setup --dashboards \
-E output.elasticsearch.hosts=['https://lab-cluster-es-http:9200'] \
-E output.elasticsearch.username=kube-metrics \
-E output.elasticsearch.password=notasecurepassword \
-E output.elasticsearch.ssl.verification_mode=none \
-E setup.kibana.ssl.verification_mode=none \
-E setup.kibana.host=https://kibana-lab-kb-http:5601
This is the same command that you can find at the Elastic website describing how to manually install Metricbeat dashboards. This version has been modified to work with our es-lab-cluster namespace. I find this a much easier way to do this if Metricbeat pods are already running in the cluster. Notice that here we have also disabled SSL verification with the line -E setup.kibana.ssl.verification_mode=none.
This process may take a couple minutes as it loads dozens of dashboards and hundreds of saved objects that go with these dashboards. Once the command completes type exit to close this pod and return to the control plane.
Now we can switch back to the Kibana web browser that we opened earlier and navigate to the Dashboards section and search for either Kubernetes or System dashboards, both types should have some information. Below is a screenshot of the [Metricbeat Kubernetes] Overview dashboard.
If you want to browse the raw Metricbeat data, just click on the menu again and select Discover. Since metricbeat is our only public index, Kibana will default to that index and you can browse the raw data as shown below.
We are now at the end of this project. If you have followed along, you should now have a fully functioning Kubernetes cluster with Elasticsearch, Kibana, and Metricbeat. You should be able to look at metrics in the Kibana dashboards and better understand how all of this works together. If you would like to see any changes or customizations to this document, leave me a message on whatever social media platform you found this on.