Using finalizers for custom clean-up in your controllers

December 19, 2021 - kubernetes crd controller-runtime

Deleting objects using finalizers in controller-runtime based controllers

Controller-runtime is a great piece of software created and maintained by the sig-apimachinery group. It hides a lot of details and complexity from the controller/operator developer. In my opinion, it does for those who build Kubernetes extensions what Django or Ruby on Rails did for web developers.

In this post I want to build a small example of using it and cover proper deletion of objects - something that was a little counterintuitive for me at the beginning.

But first things first: why do we even want to extend Kubernetes?

Extending the cluster

We all kinda know that Kubernetes is a platform for building platforms. While I am not the author of that saying, I agree with it and can relate.

But what exactly does that mean? Kubernetes is the glue-like substance between your application and your infrastructure. While every company has its own, almost unique way of dealing with infrastructure, there is now a more or less homogeneous way to maintain it and work with it as a consumer.

It's barely possible to create a one-size-fits-all solution, and the only way, I guess, is to give users a chance to extend and tune the platform according to their environment.

That's where the Kubernetes facilities for extending the cluster come in:

Building controllers for custom resources

A more in-depth overview of the pattern is available in the official documentation.

Why do we want to use controller-runtime? Let's take a quick look at the typical parts of a controller (without focusing our attention on the business logic).

Bits and pieces

At the top level, any controller is basically an infinite loop which watches for events, compares the desired state of the resources with the actual state, and acts to bring the two closer together.

This pattern is called a control loop, and it is not something specific to Kubernetes or even to software engineering.

For us, the events are the outcomes of someone else's actions on resources in the kube-api. Those resources can be either core resources or something custom-defined.
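
To make the idea concrete, here is a minimal, generic control loop sketch; observeDesired, observeActual and act are hypothetical placeholders, not real Kubernetes APIs:

package main

import "time"

// hypothetical placeholders standing in for real observation and action code
func observeDesired() string     { return "1 tunnel pod" }
func observeActual() string      { return "0 tunnel pods" }
func act(desired, actual string) {}

func main() {
	for {
		desired := observeDesired() // e.g. what is declared in kube-api
		actual := observeActual()   // e.g. what is actually running in the cluster
		if desired != actual {
			act(desired, actual) // converge the actual state towards the desired one
		}
		time.Sleep(time.Second)
	}
}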

Though, to keep that control loop robust, we need to somehow deal with a few things: new events, rate limiting of kube-api calls, and the occasional loss of events.

New events are handled by a so-called watcher, which watches for new events of a particular type.

Rate limiting is covered by informers, which are a combination of a watcher and a cache. On top of that, work queues are also used to limit the number of calls to the kube-api.

The loss of events, which should always be expected, is handled by combining the informer with a periodic full resync; the ListWatcher covers it.

There are also different ways to sync caches between controllers, to set up handler functions and to manage back pressure, but I guess you get the point: it is complicated. For those who are interested in how it is done without controller-runtime, there is a slightly old but still great blog post on controller details. Another great source of knowledge is the source code of controller-runtime itself.
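
To give a taste of the bare client-go machinery that controller-runtime wraps, here is a minimal sketch of a shared informer with a periodic resync; it is illustrative only and not part of the example controller from this post:

package main

import (
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the same kubeconfig your kubectl uses.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Shared informer factory: watcher + cache + periodic full resync.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	svcInformer := factory.Core().V1().Services().Informer()

	// Event handlers would normally only enqueue keys into a rate-limited
	// work queue; the actual work happens in separate worker goroutines.
	svcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { _ = obj.(*corev1.Service) },
		UpdateFunc: func(oldObj, newObj interface{}) {},
		DeleteFunc: func(obj interface{}) {},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}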

Reconcile loop

With controller-runtime, all this machinery is hidden from us. Even more, we get a lot almost out of the box and for free: caching clients, watch plumbing, rate-limited work queues, leader election, and metrics.

All we need is a single structure implementing the single-method Reconciler interface. In fact, it can be just a single function; there is no need for a structure if you don't need one. This is the kind of situation where it is much easier to show than to tell.
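
As a rough sketch (not the exact wiring of the example repository below), a whole controller-runtime based controller can be as small as this; the function-only Reconciler is wrapped with reconcile.Func:

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func main() {
	// The manager wires up the cache, the clients and the watch machinery.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// A Reconciler can be just a function; req carries only the namespace/name key.
	r := reconcile.Func(func(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
		// business logic goes here
		return reconcile.Result{}, nil
	})

	// Watch Services and feed their keys into the reconcile function.
	if err := ctrl.NewControllerManagedBy(mgr).For(&corev1.Service{}).Complete(r); err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}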

Example

I wrote a small controller as an example. The controller allows you to expose any service in a cluster by using ngrok tunnels. Under no circumstances should it be used on live clusters; it is for test purposes only.

In more detail: for every Service labeled with ngrok=true, the controller starts a tunneling pod in the ngrok-tunnel namespace that exposes the service to the outside world through an ngrok tunnel.

You can use it to share some local cluster service with a friend, or just imagine any other controller in its place. This one is good enough as an example because it uses some external resources.

Let's deploy it and try to use it. For the sake of this tutorial, I am going to use the k0s project, but it would be the same for any other distro. For the sake of simplicity, the controller runs outside of the cluster, using the given kubeconfig to build a connection. In other words, it uses the same connection your local kubectl uses.

export KUBECONFIG=/var/lib/k0s/pki/admin.conf
make run

Let's try to use it. Run the following commands in another shell:

# let's create some pod to expose
kubectl run --image=nginx nginx

# let's expose it by creating service object
kubectl expose pod/nginx --port=80 -lngrok=true

# check for the tunneling pod
kubectl -n ngrok-tunnel get pods

# get the tunnel address from log
kubectl -n ngrok-tunnel logs -l exposed-from=ns-default-svc-nginx

As you can see, opening the http://*.ngrok.io URL gives the same NGINX welcome page as calling the pod from inside the cluster.

Object deletion

Let's clean up the stuff:

kubectl delete svc/nginx

Wait. Now we are in something I'd call trouble.

We dropped the service, so there is no intention to use the tunnel anymore. But if we check the ngrok-tunnel namespace for pods one more time, the pod is still there. Even the ngrok.io URL should still be working at the moment.

What does that mean for us? First, it is a resource leak. Second, it is a potential resource conflict.

We are definitely in need of a proper clean-up, especially in real-world projects, not just the blog post example.

Let's try to figure out what happens. If we have a look at the Reconciler interface and the example source code, we see that the Reconcile method never actually receives any objects. The only information about the object is its name, or key (the Request object).

type Reconciler interface {
	// Reconcile performs a full reconciliation for the object referred to by the Request.
	// The Controller will requeue the Request to be processed again if an error is non-nil or
	// Result.Requeue is true, otherwise upon completion it will remove the work from the queue.
	Reconcile(context.Context, Request) (Result, error)
}

So, to work with the actual instance in the code, we do a manual lookup:

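// excerpt: l is a logger, c a context.Context, r the reconcile.Request,
// and svc a corev1.Service that we load into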
l.Info("Reconciling", "key", r)
if err := mgr.GetClient().Get(c, r.NamespacedName, &svc); err != nil {
	l.Error(err, "Can't load service")
	if errors.IsNotFound(err) {
		// no need to reconcile if not found
		return reconcile.Result{}, nil
	}
	// let the controller-runtime deal with backoff logic
	return reconcile.Result{}, err
}

Why is it done that way? My bet is that it is just easier: there is no need to somehow track object versions, stale outdated instances, etc.

But for us, it means that once the service is deleted, we fall into the if clause for a non-nil err. We can check the error and understand that the object was deleted, but we never actually do any clean-up in that code path.

Manual referencing

One way to fix the issue is to attach the service name (a reference) as a label to the pod we create. With that in mind, we just look the pod up by label and clean it up:

# ATTENTION, PSEUDOCODE
k8sClient.Pods().GetByLabel("exposed-by", "svc/nginx").Delete()
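
In real Go, using the controller-runtime client (here client is sigs.k8s.io/controller-runtime/pkg/client and corev1 is k8s.io/api/core/v1), this could look roughly like the following sketch; the namespace and the label key/value are taken from the example above, and the helper name is illustrative:

func cleanupTunnelPods(ctx context.Context, c client.Client) error {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace("ngrok-tunnel"),
		client.MatchingLabels{"exposed-from": "ns-default-svc-nginx"}); err != nil {
		return err
	}
	for i := range pods.Items {
		if err := c.Delete(ctx, &pods.Items[i]); err != nil {
			return err
		}
	}
	return nil
}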

It would work perfectly fine in this example. Though, in some cases it is either impossible or leads to some unpleasant over-engineering: for example, when we can't easily build an external key from the observed object, or when we need some status fields, not only the name.

What should we do?

Finally, finalizers

We finally got to the main point of the post. The finalizers.

What are those things?

Finalizers are namespaced keys that tell Kubernetes to wait until specific conditions are met before it fully deletes resources marked for deletion. Finalizers alert controllers to clean up resources the deleted object owned.

Looks like it is exactly what we are looking for. It is a piece of data in the metadata section of your object. Kube-api deletes objects right away unless there are some finalizers attached. In that case, it just marks the object for deletion by setting the metadata.deletionTimestamp field. What this means for us is that the object is not deleted, and we are not getting into that non-nil-err code path.

In the manifest, finalizers may look like:

- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2021-12-18T22:31:40Z"
    finalizers:
    - ngrok.io/tunnel
    - some.other/finalizer

We need to extend the control loop: while the object is alive, we make sure our finalizer is attached to it; once metadata.deletionTimestamp is set, we perform the clean-up (remove the tunneling pod), then remove our finalizer and save the object.

After saving the object, kube-api deletes it from the store as usual, since there are no finalizers on it anymore.
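
As a sketch of that extended loop (the actual fixed code lives in the repository; the reconciler struct, the finalizer name and the ensureTunnelPod/cleanupTunnelPod helpers below are illustrative assumptions), the Reconcile method could look roughly like this:

package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

const tunnelFinalizer = "ngrok.io/tunnel"

type reconciler struct {
	client client.Client
}

func (r *reconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	var svc corev1.Service
	if err := r.client.Get(ctx, req.NamespacedName, &svc); err != nil {
		// with the finalizer in place we should not normally get here for deletions
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	if !svc.DeletionTimestamp.IsZero() {
		// the object is marked for deletion: clean up, then drop our finalizer
		if err := r.cleanupTunnelPod(ctx, &svc); err != nil {
			return reconcile.Result{}, err
		}
		controllerutil.RemoveFinalizer(&svc, tunnelFinalizer)
		return reconcile.Result{}, r.client.Update(ctx, &svc)
	}

	// normal path: make sure our finalizer is attached before doing any work
	if !controllerutil.ContainsFinalizer(&svc, tunnelFinalizer) {
		controllerutil.AddFinalizer(&svc, tunnelFinalizer)
		if err := r.client.Update(ctx, &svc); err != nil {
			return reconcile.Result{}, err
		}
	}
	return reconcile.Result{}, r.ensureTunnelPod(ctx, &svc)
}

// hypothetical helpers standing in for the real tunnel pod management code
func (r *reconciler) ensureTunnelPod(ctx context.Context, svc *corev1.Service) error  { return nil }
func (r *reconciler) cleanupTunnelPod(ctx context.Context, svc *corev1.Service) error { return nil }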

The fixed version is available in the main branch.

Example

# check for the tunneling pods again
kubectl -n ngrok-tunnel get pods

No resources found in ngrok-tunnel namespace.

kubectl expose pod/nginx --port=80 -lngrok=true

service/nginx exposed


kubectl -n ngrok-tunnel get pods
NAME                 READY   STATUS    RESTARTS   AGE
ngrok-tunnel-d25xg   1/1     Running   0          3s

kubectl delete svc/nginx
service "nginx" deleted

kubectl -n ngrok-tunnel get pods
No resources found in ngrok-tunnel namespace.

Some common sense for finalizers

Any object can have any number of finalizers. Each controller MUST clean up its own finalizers. At the bottom level, a finalizer is just a string key of a predefined format in the object's metadata.

In a sense, it is like the good old logical deletion from databases, done by setting a deletion flag.

This post was written for the Golang Advent Calendar 2021