Components of Kubernetes

Instead of introducing all the components of Kubernetes at once, let's try to build Kubernetes step by step ourselves.

Docker in One Machine

Let's assume we have a machine, Node 1, with Docker installed.

flowchart TD

subgraph Node 1
container1(Container 1)
container2(Container 2)
end

Every time we want to create a container, we need to SSH into Node 1 and execute docker run.

flowchart LR
user(User) --> ssh{SSH} --> docker(Docker)
docker --> run{Run}
subgraph Node 1
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

class ssh,docker,run active
classDef active stroke:#f26f33,stroke-width:2px

To be honest, if we frequently need to create or delete lots of containers, this becomes really inconvenient.
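To make the tedium concrete, here is a minimal sketch of that manual workflow using the Docker SDK for Python (an assumption on our part; any way of invoking docker run works), executed on Node 1 after SSHing in. The image and container names are just examples.

# Runs on Node 1, after SSHing in. Assumes the Docker SDK for Python
# ("pip install docker") is available; image and names are examples.
import docker

client = docker.from_env()  # talks to the local Docker daemon

c1 = client.containers.run("nginx:latest", name="container-1", detach=True)
c2 = client.containers.run("nginx:latest", name="container-2", detach=True)

# Checking on them later means logging in again and inspecting each one.
for c in client.containers.list():
    print(c.name, c.status)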

What About Using an API to Manage Containers?

Yes. An API server can accept our requests and start the containers on Node 1 directly.

flowchart LR
user(User) --> |Request|api(API Server) --> docker(Docker)
docker --> run{Run}
subgraph Node 1
api
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

class api active
classDef active stroke:#f26f33,stroke-width:2px

Let's give the API server the name kube-apiserver.

And to store the user's requests, we use etcd as the database. It keeps the containers' status, so we don't have to run docker inspect every time we need to check on our containers, and it also helps us verify whether the containers were started as requested.

flowchart LR
user(User) --> |Request|api(Kube API Server) --> docker(Docker)
docker --> run{Run}
subgraph Node 1
api --> etcd(etcd)
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

class api,etcd active
classDef active stroke:#f26f33,stroke-width:2px

kube-apiserver

The Kubernetes API server validates and configures data for the api objects which include pods, services, replicationcontrollers, and others. The API Server services REST operations and provides the frontend to the cluster's shared state through which all other components interact.

Visit https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/ for detailed usage of kube-apiserver.
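Since the kube-apiserver is an HTTP server exposing a REST API, we can also talk to it directly without any extra tooling. Below is a minimal sketch with Python's requests library; the server address, token path, and CA path are assumptions for illustration.

# List the pods in the "default" namespace straight from the REST API.
# The address, token file, and CA file below are placeholders.
import requests

API_SERVER = "https://127.0.0.1:6443"
TOKEN = open("/path/to/token").read().strip()

resp = requests.get(
    f"{API_SERVER}/api/v1/namespaces/default/pods",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify="/path/to/ca.crt",   # the cluster's CA certificate
)
resp.raise_for_status()
for item in resp.json()["items"]:
    print(item["metadata"]["name"], item["status"].get("phase"))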

etcd

Consistent and highly-available key value store used as Kubernetes' backing store for all cluster data.

Visit https://etcd.io/docs/ for more documentation.
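To get a feel for what the backing store does, here is a tiny sketch with the etcd3 Python client. The key layout is made up for illustration; in a real cluster only the kube-apiserver talks to etcd, and you should never write to it by hand.

# A toy use of etcd as a key-value store ("pip install etcd3").
import etcd3

etcd = etcd3.client(host="127.0.0.1", port=2379)

# Record the desired state of a container...
etcd.put("/example/node1/container-1", "running")

# ...and read it back later instead of running docker inspect.
value, _metadata = etcd.get("/example/node1/container-1")
print(value.decode())  # -> "running"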

Multi-Node

Now, let's try to run containers on multiple nodes.

SSH

However, we still have only one api-server, so for the new node we are back to SSH mode.

flowchart LR
user(User) --> |Request|api(Kube API Server) --> docker(Docker)
docker --> run{Run}

subgraph Node 1
api --> etcd(etcd)
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

api --> ssh{SSH} --> docker2(Docker) --> run2{Run}

subgraph Node 2
docker2
run2 --> container3(Container 3)
run2 --> container4(Container 4)
end

class ssh,docker2,run2,container3,container4 active
classDef active stroke:#f26f33,stroke-width:2px

Node API-Server

This is not an ideal way to run containers on multiple nodes.

Perhaps we can start another api-server on each node and call them node-api-servers.

flowchart LR
user(User) --> |Request|api(Kube API Server) --> nodeapi(Node API Server) --> docker(Docker)
docker --> run{Run}

subgraph Node 1
api --> etcd(etcd)
nodeapi
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

api --> nodeapi2(Node API Server) --> docker2(Docker) --> run2{Run}

subgraph Node 2
nodeapi2
docker2
run2 --> container3(Container 3)
run2 --> container4(Container 4)
end

class nodeapi,nodeapi2 active
classDef active stroke:#f26f33,stroke-width:2px

This is workable. However, there is a problem: we have designed a push model, in which the master api-server (kube-apiserver) pushes tasks to the node api-servers. If anything goes wrong and a task is not delivered or carried out properly, the master has to retry many times. The more nodes there are, the longer the queue of tasks the master api-server has to handle may become.

Node Agent

Another design is pull mode. Instead of running an api-server on each node, we start an agent that watches the tasks the master api-server has received. If there is a task for the agent's node to start a container, the agent starts a new container.
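To make the pull model concrete, here is a hypothetical agent loop in Python. This is not real kubelet code: the /tasks endpoint, the node name, and the task format are all invented for illustration.

# A hypothetical pull-mode agent: ask the master which containers this
# node should run, then reconcile the local Docker state against it.
import time

import docker
import requests

NODE_NAME = "node-2"              # this agent's node (example)
MASTER = "http://master:8080"     # hypothetical master endpoint

client = docker.from_env()

while True:
    # Hypothetical endpoint returning [{"name": ..., "image": ...}, ...]
    wanted = requests.get(f"{MASTER}/tasks", params={"node": NODE_NAME}).json()

    running = {c.name for c in client.containers.list()}
    for task in wanted:
        if task["name"] not in running:
            client.containers.run(task["image"], name=task["name"], detach=True)

    time.sleep(10)                # poll again instead of being pushed to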

kube-apiserver and kubelet

Actually, the kubelet exposes its own API, and the communication between kube-apiserver and kubelet is bidirectional.

flowchart LR
user(User) --> |Request|api(Kube API Server) <--> agent(Agent) --> docker(Docker)
docker --> run{Run}

subgraph Node 1
api --> etcd(etcd)
agent
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

api <--> agent2(Agent) --> docker2(Docker) --> run2{Run}

subgraph Node 2
agent2
docker2
run2 --> container3(Container 3)
run2 --> container4(Container 4)
end

class agent,agent2 active
classDef active stroke:#f26f33,stroke-width:2px

The agent runs containers according to the requests the kube-apiserver has received, and reports the containers' status back to the kube-apiserver.

Let's rename the node running kube-apiserver to Master, and name the agent kubelet.

flowchart LR
user(User) --> |Request|api(Kube API Server) <--> kubelet(kubelet) --> docker(Docker)
docker --> run{Run}

subgraph Master
api --> etcd(etcd)
kubelet
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

api <--> kubelet2(kubelet) --> docker2(Docker) --> run2{Run}

subgraph Node 1
kubelet2
docker2
run2 --> container3(Container 3)
run2 --> container4(Container 4)
end

class kubelet,kubelet2 active
classDef active stroke:#f26f33,stroke-width:2px

kubelet

As in https://kubernetes.io/docs/concepts/overview/components/#kubelet:

An agent that runs on each node in the cluster. It makes sure that containers are running in a Pod.

The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in those PodSpecs are running and healthy. The kubelet doesn't manage containers which were not created by Kubernetes.
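The real kubelet does not poll a made-up endpoint; it watches the kube-apiserver for Pods assigned to its node. Here is a rough illustration of that watch pattern with the official Kubernetes Python client; the node name and the kubeconfig location are assumptions.

# Watch for pods scheduled onto this node ("pip install kubernetes").
from kubernetes import client, config, watch

config.load_kube_config()         # assumes a local kubeconfig
v1 = client.CoreV1Api()

NODE_NAME = "node-1"              # the node this "kubelet" serves

w = watch.Watch()
for event in w.stream(v1.list_pod_for_all_namespaces,
                      field_selector=f"spec.nodeName={NODE_NAME}"):
    pod = event["object"]
    # A real kubelet would now start or stop containers to match pod.spec.
    print(event["type"], pod.metadata.namespace, pod.metadata.name)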

We Need a Scheduler

Yes, we need a scheduler. Why? Now that we have multiple nodes, some nodes may be full of containers with no spare CPU or memory left, while other nodes may have plenty of free CPU and memory.

We need a scheduler to determine which node should run each container.

We call this scheduler kube-scheduler.

flowchart LR
user(User) --> |Request|api(Kube API Server) <--> kubelet(kubelet) --> docker(Docker)
docker --> run{Run}

subgraph Master
api --> etcd(etcd)
api <--> scheduler(kube-scheduler)
kubelet
docker
run --> container1(Container 1)
run --> container2(Container 2)
end

api <--> kubelet2(kubelet) --> docker2(Docker) --> run2{Run}

subgraph Node 1
kubelet2
docker2
run2 --> container3(Container 3)
run2 --> container4(Container 4)
end

class scheduler active
classDef active stroke:#f26f33,stroke-width:2px

kube-scheduler

As in https://kubernetes.io/docs/concepts/overview/components/#kube-scheduler

Control plane component that watches for newly created Pods with no assigned node, and selects a node for them to run on.

Factors taken into account for scheduling decisions include: individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.
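As a toy illustration of the idea, and not the real kube-scheduler, here is a filter-and-score sketch: drop the nodes that cannot fit the workload, then pick the one with the most free CPU. The node data is made up.

# A toy filter-and-score scheduler.
nodes = [
    {"name": "node-1", "free_cpu": 0.5, "free_mem_mb": 256},
    {"name": "node-2", "free_cpu": 3.0, "free_mem_mb": 4096},
]

def schedule(pod_cpu, pod_mem_mb, nodes):
    # Filter: keep only nodes that can fit the pod.
    feasible = [n for n in nodes
                if n["free_cpu"] >= pod_cpu and n["free_mem_mb"] >= pod_mem_mb]
    if not feasible:
        return None               # the pod would stay Pending
    # Score: prefer the node with the most free CPU.
    return max(feasible, key=lambda n: n["free_cpu"])["name"]

print(schedule(pod_cpu=1.0, pod_mem_mb=512, nodes=nodes))  # -> node-2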

Communication Between Nodes?

Communication between containers, especially across nodes, is not possible yet.

As you may have noticed, when we start a container with Docker, it is assigned an IP from the default 172.17.0.0/16 bridge network. Every node uses that same local subnet, and those addresses are not routable between hosts, so containers cannot communicate across nodes.

To solve this problem, we need cluster networking. There are many options, as listed here; we will use flannel as an example.

flannel

Flannel runs a small, single binary agent called flanneld on each host, and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space. Flannel uses either the Kubernetes API or etcd directly to store the network configuration, the allocated subnets, and any auxiliary data (such as the host's public IP). Packets are forwarded using one of several backend mechanisms including VXLAN and various cloud integrations.
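To illustrate the subnet-leasing idea, here is a toy sketch: each host leases its own /24 out of a preconfigured /16 (10.244.0.0/16 is only an example), so container IPs never overlap across hosts and can be routed between nodes.

# Toy per-host subnet allocation out of a larger address space.
import ipaddress

CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")   # example pool
subnets = CLUSTER_CIDR.subnets(new_prefix=24)          # generator of /24s

leases = {}
for host in ["master", "node-1"]:
    leases[host] = next(subnets)

for host, subnet in leases.items():
    print(host, subnet)   # master 10.244.0.0/24, node-1 10.244.1.0/24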

flowchart LR
user(User) --> |Request|api(Kube API Server) <--> kubelet(kubelet) --> docker(Docker)
docker --> run{Run}

subgraph Master
api --> etcd(etcd)
api <--> scheduler(kube-scheduler)
kubelet
docker
run --> container1(Container 1)
run --> container2(Container 2)
flanneld1(flanneld) -->routing1{{Routing Table}}
container1 <--> routing1
etcd --> flanneld1
end

api <--> kubelet2(kubelet) --> docker2(Docker) --> run2{Run}

subgraph Node 1
kubelet2
docker2
run2 --> container3(Container 3)
run2 --> container4(Container 4)
flanneld2(flanneld) -->routing2{{Routing Table}}
container3 <--> routing2
end

routing1 <--> routing2
etcd --> flanneld2

class routing1,routing2,flanneld1,flanneld2 active
classDef active stroke:#f26f33,stroke-width:2px

Pod

In Kubernetes, the Pod is the basic resource unit. A Pod can contain multiple containers.

flowchart LR
user(User) --> |Request|api(Kube API Server) <--> kubelet(kubelet) --> docker(Docker)
docker --> run{Run}

subgraph Master
api --> etcd(etcd)
api <--> scheduler(kube-scheduler)
kubelet
docker
run --> pod1(Pod 1)
run --> pod2(Pod 2)
flanneld1(flanneld) -->routing1{{Routing Table}}
pod1 <--> routing1
etcd --> flanneld1
end

api <--> kubelet2(kubelet) --> docker2(Docker) --> run2{Run}

subgraph Node 1
kubelet2
docker2
run2 --> pod3(Pod 3)
run2 --> pod4(Pod 4)
flanneld2(flanneld) -->routing2{{Routing Table}}
pod3 <--> routing2
end

routing1 <--> routing2
etcd --> flanneld2

class pod1,pod2,pod3,pod4 active
classDef active stroke:#f26f33,stroke-width:2px

Pod

According to https://kubernetes.io/docs/concepts/workloads/pods/

Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.

A Pod (as in a pod of whales or pea pod) is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. A Pod's contents are always co-located and co-scheduled, and run in a shared context.
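As a minimal sketch, here is how a Pod could be created through the kube-apiserver with the official Kubernetes Python client; the Pod name and image are just examples.

# Create a single-container Pod ("pip install kubernetes").
from kubernetes import client, config

config.load_kube_config()         # assumes a local kubeconfig
v1 = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="example-pod"),
    spec=client.V1PodSpec(containers=[
        client.V1Container(name="web", image="nginx:latest"),
    ]),
)

v1.create_namespaced_pod(namespace="default", body=pod)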

Highly Available Services?

Since we have multiple nodes now, it would be great if we could create highly available services.

A Deployment is a group of Pods. If any of the Pods fails, another Pod will be created to replace it.

However, how do we create and manage the Pods of a Deployment, and guarantee that once a Pod fails, a new one is created?

We need a manager to watch the resources created through the kube-apiserver. If a Deployment with 3 replicas is created, the manager will create three Pods for it and wait for them to run properly. If any Pod's status changes to failed, the manager will start a new Pod.

We call this manager kube-controller (in a real cluster, these controllers run inside the kube-controller-manager).

kube-controller

Besides Deployments, there are other resources managed by kube-controller, such as DaemonSets, StatefulSets, etc.

graph TD

kubeapi(kube-apiserver) <--> kubecontroller{{kube-controller}} --> pod1(Pod 1)
kubecontroller --> pod2(Pod 2)

subgraph Deployment
pod1
pod2
end
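To make the reconciliation idea concrete, here is a hypothetical reconcile function, not real controller code: it compares the desired number of replicas against the Pods that are still healthy and decides how many new Pods to create.

# A toy reconcile step for a Deployment-like object.
import uuid

def reconcile(deployment, pods):
    healthy = [p for p in pods if p["status"] != "failed"]
    missing = deployment["replicas"] - len(healthy)
    # Create one replacement pod per missing replica.
    return [f"{deployment['name']}-{uuid.uuid4().hex[:5]}"
            for _ in range(max(missing, 0))]

deployment = {"name": "web", "replicas": 3}
pods = [{"name": "web-a", "status": "running"},
        {"name": "web-b", "status": "failed"}]
print(reconcile(deployment, pods))   # two new pods need to be created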

Now we have a multi-Pod Deployment, but each Pod has its own IP address. We need a load balancer with a virtual IP to forward requests to the different Pods. That is where Service and kube-proxy come in.

Service

As in Service:

Kubernetes Pods are created and destroyed to match the desired state of your cluster. Pods are nonpermanent resources. If you use a Deployment to run your app, it can create and destroy Pods dynamically.

Each Pod gets its own IP address, however in a Deployment, the set of Pods running in one moment in time could be different from the set of Pods running that application a moment later.

This leads to a problem: if some set of Pods (call them "backends") provides functionality to other Pods (call them "frontends") inside your cluster, how do the frontends find out and keep track of which IP address to connect to, so that the frontend can use the backend part of the workload?

Enter Services.
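As a minimal sketch, here is how a Service in front of the Deployment's Pods could be created with the official Kubernetes Python client; the label selector and ports are just examples.

# Create a ClusterIP Service selecting pods labeled app=web.
from kubernetes import client, config

config.load_kube_config()         # assumes a local kubeconfig
v1 = client.CoreV1Api()

svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1ServiceSpec(
        selector={"app": "web"},  # must match the pods' labels
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)

v1.create_namespaced_service(namespace="default", body=svc)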

kube-proxy

As in kube-proxy:

The Kubernetes network proxy runs on each node. This reflects services as defined in the Kubernetes API on each node and can do simple TCP, UDP, and SCTP stream forwarding or round robin TCP, UDP, and SCTP forwarding across a set of backends.

graph TD

client(Client) --> clusterip(Cluster IP)
kubeproxy(kube-proxy) --> clusterip
clusterip--> pod1(Pod 1)
clusterip --> pod2(Pod 2)

subgraph Deployment
pod1
pod2
end

subgraph Service
clusterip
end
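To illustrate the forwarding in the diagram above, here is a toy round-robin sketch; the real kube-proxy programs iptables or IPVS rules in the kernel rather than forwarding in application code, and the backend addresses below are made up.

# Toy round-robin spreading of requests across a Service's backends.
import itertools

backends = ["10.244.0.5:8080", "10.244.1.7:8080"]   # example pod addresses
rr = itertools.cycle(backends)

def forward(request):
    backend = next(rr)
    return f"{request} -> {backend}"

for i in range(4):
    print(forward(f"GET /index.html #{i}"))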