Set up k3s with Let’s Encrypt and an exposed Traefik dashboard

2022-02-16 5 min read Marco

About

This article describes how to expose a Kubernetes instance running with k3s with TLS certificates from Let’s Encrypt.

The k3s installation sets up a Traefik ingress with its default configuration. This default has no certificate resolver and serves the Traefik default certificate.

Versions

K3s: v1.22.6+k3s1 (3228d9cb)
K8s-Client: 1.23.3
K8s-Server: 1.22.6+k3s1
Traefik: 2.6.1
Traefik Helmchart: 10.9.100

Installing k3s

The installation of k3s is quite simple. We follow the installation guide in the k3s documentation on rancher.com.

# Set hostname as node name
$ export K3S_NODE_NAME=$(hostname -f)

# start installation
$ curl -sfL https://get.k3s.io | sh -

After the installation, the Kubernetes configuration file (kubeconfig) generated by k3s can be copied to the default location used by kubectl.

$ mkdir -p /root/.kube
$ cp /etc/rancher/k3s/k3s.yaml /root/.kube/config
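Before touching the ingress it is worth checking that kubectl can reach the cluster with the copied configuration. A minimal sketch: the function name node_ready is made up for this example, and it assumes the node is registered under the name passed as the first argument (here the hostname set via K3S_NODE_NAME).

```shell
# node_ready NODE
# Succeeds if the given node reports the condition Ready=True.
# The function name is illustrative; it is not part of k3s or kubectl.
node_ready() {
  status=$(kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || return 1
  [ "$status" = "True" ]
}
```

It could be called as node_ready "$(hostname -f)" && echo "node is ready".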

Configure k3s Traefik ingress

To use the preinstalled ingress with certificates you need to add some additional configuration to Traefik. You will find the Helm configuration of the deployment here: /var/lib/rancher/k3s/server/manifests/traefik.yaml. Do not change this file directly, because k3s may overwrite it later (e.g. on upgrade).

To configure the k3s ingress correctly, create a HelmChartConfig resource as described in the official documentation. This section will guide you through this simple configuration.

We will add the following parameters:

traefik image and tag (default uses a mirrored image)
enable insecure API mode
add certificate resolvers (Let’s Encrypt staging and prod)

The configuration will be added to the Helm chart by overwriting the values file. You will find all possible configuration options in the complete values.yaml file of the chart.

The image will be specified directly as image.name and image.tag. All other configurations mentioned above will be specified as arguments to Traefik. Therefore we use globalArguments.

To keep the configuration already provided by k3s, we copy the valuesContent from the k3s Traefik manifest. This file can be found on your server at the path mentioned above (/var/lib/rancher/k3s/server/manifests/traefik.yaml), or have a look on GitHub.

I added my custom configuration on top of the one copied from the existing manifest (separated by a comment).

Place the configuration at /var/lib/rancher/k3s/server/manifests/traefik-config.yaml.

Modified Traefik configuration

# traefik-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    image:
      name: traefik
      tag: v2.6.1

    globalArguments:
      - "--global.checknewversion=false"
      - "--global.sendanonymoususage=false"
      - "--api.insecure=true"
      - "--certificatesresolvers.le-staging.acme.tlschallenge"
      - "--certificatesresolvers.le-staging.acme.email=<your-email-address>"
      - "--certificatesresolvers.le-staging.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.le-staging.acme.storage=/data/acme.json"
      - "--certificatesresolvers.le-prod.acme.tlschallenge"
      - "--certificatesresolvers.le-prod.acme.email=<your-email-address>"
      - "--certificatesresolvers.le-prod.acme.caserver=https://acme-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.le-prod.acme.storage=/data/acme.json"

    # ---- k3s Traefik default configuration below ----
    rbac:
      enabled: true
    ports:
      websecure:
        tls:
          enabled: true
    podAnnotations:
      prometheus.io/port: "8082"
      prometheus.io/scrape: "true"
    providers:
      kubernetesIngress:
        publishedService:
          enabled: true
    priorityClassName: "system-cluster-critical"
    tolerations:
    - key: "CriticalAddonsOnly"
      operator: "Exists"
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"

It is also important to remove duplicate entries. For example, I had to remove image.name from the copied default k3s configuration. Otherwise your changes will not be reflected.
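Duplicates are easy to miss after copying the default values. The following is a rough sketch of a check for repeated top-level keys. It is a plain text filter, not a YAML parser, and it assumes that valuesContent is indented by exactly four spaces as in the manifest above. The function name duplicate_value_keys is made up for this example.

```shell
# duplicate_value_keys FILE
# Prints every top-level key of valuesContent that appears more than once.
# Assumption: top-level keys are indented by exactly four spaces.
duplicate_value_keys() {
  grep -E '^    [A-Za-z_][A-Za-z0-9_]*:' "$1" \
    | sed -E 's/^    ([A-Za-z_][A-Za-z0-9_]*):.*/\1/' \
    | sort \
    | uniq -d
}
```

Running duplicate_value_keys /var/lib/rancher/k3s/server/manifests/traefik-config.yaml should print nothing if every top-level key is unique. Nested duplicates are not detected.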

Deploying the modified Traefik ingress

To deploy the modified ingress you do not need to deploy the yaml-file yourself. K3s will keep track of the directory where the file was placed and run Helm to deploy changes.

To verify that the changes were applied, have a look at the pods in the namespace kube-system:

$ kubectl get pod -n kube-system
NAME                                      READY   STATUS      RESTARTS   AGE
local-path-provisioner-84bb864455-hzk75   1/1     Running     0          4m53s
coredns-96cc4f57d-cg59b                   1/1     Running     0          4m53s
helm-install-traefik-crd--1-24p6k         0/1     Completed   0          4m54s
metrics-server-ff9dbcb6c-bwcv2            1/1     Running     0          4m53s
svclb-traefik-2qxsw                       2/2     Running     0          4m33s
helm-install-traefik--1-pw4cs             0/1     Completed   0          32s
traefik-968cf9598-6qxtm                   1/1     Running     0          30s

You can see the helm-install-traefik-* pod was completed 30 seconds ago. This pod deployed the changes from the added HelmChartConfig.

This resulted in an updated traefik-* pod with the new configuration. To check the configuration you can view the pod with kubectl get pod -n kube-system traefik-968cf9598-6qxtm -o yaml and search for e.g. the args of the container or the image used.
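Because the pod name changes with every rollout, the check can also be scripted. A minimal sketch, assuming the Traefik pod carries the label app.kubernetes.io/name=traefik (which the Helm chart sets, as far as I can tell). The function names traefik_args and traefik_has_arg are made up for this example.

```shell
# traefik_args
# Prints the container arguments of the first Traefik pod, one per line.
# Assumption: the pod is labelled app.kubernetes.io/name=traefik.
traefik_args() {
  kubectl get pod -n kube-system -l app.kubernetes.io/name=traefik \
    -o jsonpath='{range .items[0].spec.containers[0].args[*]}{@}{"\n"}{end}'
}

# traefik_has_arg ARG
# Succeeds if ARG is passed to Traefik exactly as given.
traefik_has_arg() {
  traefik_args | grep -qxF -- "$1"
}
```

For example, traefik_has_arg "--api.insecure=true" && echo "insecure API enabled", or traefik_args | grep certificatesresolvers to list the resolver settings.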

Expose Traefik dashboard

To check your configuration we will now expose the Traefik dashboard. Keep in mind that it is not recommended to publish the Traefik dashboard to the public (especially not unprotected).

We will use the Custom Resource IngressRoute (CRD from Traefik) to expose the dashboard to a public domain.

# ingress-traefik-dashboard-public.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-dashboard-public
  namespace: kube-system
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`traefik.example.org`) && (PathPrefix(`/dashboard`) || PathPrefix(`/api`))
      kind: Rule
      services:
        - name: api@internal
          kind: TraefikService
  tls:
    # use staging certificates
    # keep in mind you will get a TLS warning by your browser when using staging!
    certResolver: le-staging

This file should not be saved in the same location as traefik-config.yaml (k3s automatically applies every manifest in that directory). Replace traefik.example.org with your domain.

To deploy this IngressRoute, apply the configuration with the following command:

kubectl apply -f ./ingress-traefik-dashboard-public.yaml

After a successful deployment the traefik dashboard should be exposed at the defined domain:

https://traefik.example.org/dashboard/

As described in the YAML, the certificate will not be trusted by your browser because the Let’s Encrypt staging resolver is currently used. This shows up as a Your connection is not private: ERR_CERT_AUTHORITY_INVALID error in your browser. To switch to a trusted certificate, change the resolver to spec.tls.certResolver: le-prod.
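Instead of editing the file and applying it again, the resolver can also be switched with a patch, and the issuer of the served certificate can be checked from the command line. A sketch under these assumptions: the function names use_cert_resolver and cert_issuer are made up for this example, the IngressRoute lives in kube-system as above, and openssl is installed on the machine running the check.

```shell
# use_cert_resolver INGRESSROUTE RESOLVER
# Sets spec.tls.certResolver on an existing IngressRoute in kube-system.
# Custom resources need a merge patch; strategic patches are not supported.
use_cert_resolver() {
  kubectl patch ingressroute "$1" -n kube-system --type merge \
    -p "{\"spec\":{\"tls\":{\"certResolver\":\"$2\"}}}"
}

# cert_issuer HOST
# Prints the issuer of the certificate served for HOST on port 443.
cert_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}
```

For example, use_cert_resolver traefik-dashboard-public le-prod followed by cert_issuer traefik.example.org. Note that the patch only changes the live object. The YAML file should be updated as well, otherwise the next kubectl apply switches back to staging.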

Known issues

Timeout during connect

If an IngressRoute is already present while Traefik itself is not yet available, the following error can occur:

$ kubectl logs  -n kube-system traefik-5c7b868c6-k4m2r
time="2022-02-16T18:01:15Z" level=info msg="Configuration loaded from flags."
time="2022-02-16T18:01:39Z" level=error msg="Unable to obtain ACME certificate for domains \"traefik.example.org\": unable to generate a certificate for the domains [traefik.example.org]: error: one or more domains had a problem:\n[traefik.example.org] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during connect (likely firewall problem)\n" ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" routerName=hidden@kubernetescrd rule="Host(`traefik.example.org`)" providerName=le-staging.acme

If an Ingress(Route) is already defined before the new Traefik pod is fully up, the ACME challenge might fail. This can happen because the old Traefik pod (shortly before its termination) still receives the challenge traffic from the service instead of the new pod. Because nothing reaches the newly created pod, the challenge times out.

This issue is also related to the certificate store issue below.
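One possible workaround, which I have not verified in every situation, is to apply the IngressRoute only after the Traefik rollout has finished. A minimal sketch, assuming the deployment is named traefik in kube-system as in a default k3s installation. The function name apply_after_traefik_ready and the 120 second timeout are choices made for this example.

```shell
# apply_after_traefik_ready FILE
# Waits until the Traefik deployment is fully rolled out, then applies FILE.
# Returns non-zero without applying anything if the rollout does not finish.
apply_after_traefik_ready() {
  kubectl rollout status deployment/traefik -n kube-system --timeout=120s || return 1
  kubectl apply -f "$1"
}
```

It could be used as apply_after_traefik_ready ./ingress-traefik-dashboard-public.yaml. If the IngressRoute already exists, deleting it before changing the Traefik configuration and applying it again this way should avoid the race.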

Certificate store

In the current configuration the certificate store (/data/acme.json inside the pod) is lost whenever the pod is recreated. All issued certificates are then lost as well.

The certificate store would have to be moved out of the pod, e.g. into a Kubernetes Secret.

Have a look at cert-manager - this will also allow you to scale Traefik with multiple replicas.