Multi Tenant Logs with Grafana Loki

Malcolm Pereira
13 min read · May 2, 2022
Photo by Parrish Freeman on Unsplash

Grafana, Loki, Multi Tenant Log aggregation, Walk through on a Kubernetes Cluster.

TLDR

A simple walk through for getting up and running with Grafana and Loki on a Kubernetes cluster, with multi-tenant observability configured in Loki.

Show me the code: https://github.com/MalcolmPereira/grafanaoverview

Versions:

Ingress-NGINX: 1.5.1

Grafana: 9.3.6

Grafana Loki: 2.7.2

Grafana Promtail: 2.7.2

Prometheus: v2.41.0

Grafana and Loki

Grafana, from Grafana Labs, is a visualization platform for observability across application stacks. It is a one-stop solution for an end-to-end observability stack and is widely used across enterprises.

Grafana Loki is a centralized log aggregation system for maintaining and querying logs.

Please refer to the Grafana site for more information about Grafana and Loki. Here we look at how to set up Grafana and Loki on a Kubernetes cluster for multi-tenant observability.

We leverage Prometheus for insight into cluster metrics. Most cloud providers offer metric collectors, for example Azure Container Insights and AWS CloudWatch Container Insights, which can be readily plugged into Grafana via Grafana plugins.

Sample Application

The sample application is a simple image processor that lets tenants upload an image and returns the image metadata: MIME type and size. The application itself is not interesting at all, but it will do for demonstrating centralized logging, monitoring, and metrics using Grafana and Loki in a multi-tenant environment. It is a Spring Boot implementation with a REST controller.

Various tenants call the image API service; each call is intercepted by the public-facing ingress and then routed to the image API service for processing. It cannot get any simpler.

The Image API service is provisioned in the imageapi namespace.

imageapi service

The image api service exposes one endpoint for getting image metadata.

Since we are talking multi-tenant, we need some way to record the tenant in the application logs so that requests from different tenants can be distinguished. The tenant id can come from a request header, a security token, and so on; in the sample application the tenant id is passed as a parameter when the service is invoked.

imageapi open api documentation

The sample application uses Log4j2 for logging and logs to standard out as JSON via the EcsLayout template. This requires the log4j-layout-template-json dependency. JSON logging gives us structured logs whose fields can be parsed while the logs are aggregated in Loki.

log4j2 config for JSON logging

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-layout-template-json</artifactId>
    <version>2.17.1</version>
</dependency>

The sample application generates log lines like the following:

{
  "@timestamp":"2022-05-01T21:10:51.451Z",
  "ecs.version":"1.2.0",
  "log.level":"DEBUG",
  "message":"Image Metadata size: 22813",
  "process.thread.name":"qtp1978462681-23",
  "log.logger":"com.malcolm.imageapi.ImageAPI",
  "REQUEST_ID":"4c5fb205-65e4-4cff-8e13-a2573bba7b36",
  "TENANT_ID":"Tenant_1",
  "TRACE_ID":"520f7851-c397-4385-b9c6-9bfb750a7ff5",
  "source":{
    "className":"com.malcolm.imageapi.ImageAPI",
    "methodName":"validate",
    "fileName":"ImageAPI.java",
    "lineNumber":69
  }
}

It is important to note that each log line contains TENANT_ID; this lets us query log data for a specific tenant and also apply rules around access to that data.
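
As a rough illustration of why a TENANT_ID on every line matters, the sketch below filters JSON log lines by tenant. It is written in Python purely for illustration (the sample service itself is Spring Boot); the helper names tenant_of and lines_for_tenant are made up for this example, and the second sample line is invented.

```python
import json

# First line: a trimmed copy of the sample log line shown above.
# Second line: an invented line for a different tenant, for illustration only.
SAMPLE_LINES = [
    '{"log.level":"DEBUG","message":"Image Metadata size: 22813","TENANT_ID":"Tenant_1"}',
    '{"log.level":"DEBUG","message":"hypothetical message","TENANT_ID":"Tenant_2"}',
]


def tenant_of(line):
    """Return the TENANT_ID field of a JSON log line, or None if it is missing."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not JSON at all, so no tenant can be determined
    if not isinstance(record, dict):
        return None
    return record.get("TENANT_ID")


def lines_for_tenant(lines, tenant_id):
    """Keep only the log lines that belong to the given tenant."""
    return [line for line in lines if tenant_of(line) == tenant_id]


if __name__ == "__main__":
    for line in lines_for_tenant(SAMPLE_LINES, "Tenant_1"):
        print(line)
```

In the walk through itself this separation is done by Promtail and Loki rather than by application code; the sketch only shows the idea.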

Moreover, we generate a trace id in the service and log any client request id passed by the caller. This strengthens the logging and aids troubleshooting.

More complex microservices can use OpenTelemetry traces and spans, but in our case the generated request ids and trace ids should work fine.
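
To make the trace id and request id idea concrete, here is a minimal Python sketch of building one log record. It is not the Spring Boot code from the repository; build_log_record is a hypothetical helper, and leaving REQUEST_ID out when the caller sends none is an assumption of this sketch.

```python
import json
import uuid
from datetime import datetime, timezone


def build_log_record(tenant_id, message, request_id=None, level="DEBUG"):
    """Build one JSON-serializable log record carrying tenant and correlation ids."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    record = {
        "@timestamp": timestamp.replace("+00:00", "Z"),
        "log.level": level,
        "message": message,
        "TENANT_ID": tenant_id,
        # Generated inside the service for every request.
        "TRACE_ID": str(uuid.uuid4()),
    }
    if request_id is not None:
        # Passed in by the caller; logged as-is so client and server logs can be matched.
        record["REQUEST_ID"] = request_id
    return record


if __name__ == "__main__":
    # "client-req-1" is an invented request id.
    print(json.dumps(build_log_record("Tenant_1", "Image Metadata size: 22813", "client-req-1")))
```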

We also expose metrics and Prometheus data from the Spring Boot application using Spring Boot Actuator.

imageapi metrics

The imageapi client that simulates multiple tenants is a simple Golang command line application that calls the service.

imageapi client — simulated tenants

Grafana / Loki

Grafana and Loki are provisioned in the grafana namespace. The expectation is that tenant-specific ops users will access the Grafana dashboard and visualize log data for their tenant.

grafana service

Only the Grafana dashboard is accessible via ingress; Loki, Prometheus, and Promtail all run within the cluster and are not accessible from outside it.

Now that we have a fair understanding of the use case, we will perform the following steps to get up and running with Grafana and Loki with multi-tenant observability.

  • Provision Storage
  • Provision Ingress Controller
  • Generate TLS certificates
  • Install Grafana
  • Install Loki
  • Install Promtail
  • Install Prometheus
  • Install image api service
  • Access and Configure Dashboards

Please do not be alarmed by so many install steps; we leverage Helm charts, so most of them are one-liners.

All steps are executed on the command line from the root folder.

I am using minikube for the local cluster.

minikube version

minikube version: v1.25.2
commit: 362d5fdc0a3dbee389b3d3f1034e8023e72bd3a7

minikube config set cpus 4
❗ These changes will take effect upon a minikube delete and then a minikube start

minikube config set memory 5048
❗ These changes will take effect upon a minikube delete and then a minikube start


minikube start
😄 minikube v1.25.2 on Darwin 10.15.7
✨ Automatically selected the hyperkit driver. Other choices: virtualbox, sshs

Provision Storage

Storage and retention policies for metrics and log data are very important. In this walk through we use the host system for storage; a production system needs durable alternatives. Please plan for the amount of data and the retention policies.

Please read about Loki Storage. This proof of concept is using file system storage. Please refer to Loki Storage Retention Policies regarding additional configuration for data retention.

kubectl create namespace grafana

namespace/grafana created

kubectl apply -f 03_yaml/grafana-pv-pvc-minikube.yaml --namespace grafana

persistentvolumeclaim/grafana-volume-claim created

Confirm that the PV and PVC are created and bound.

kubectl get pv -A

kubectl get pvc -A

Storage YAML files:

  • grafana-pv-pvc-minikube.yaml

Persistent Volume Claim for running in Minikube; uses dynamic storage provisioning.

  • grafana-pv-pvc-dockerdesktop.yaml

Persistent Volume and Persistent Volume Claim for running in Docker Desktop on Mac; the volume first needs to be provisioned in Docker Desktop under Preferences, Resources, File Sharing.

  • grafana-pv-pvc-dockerdesktop-wsl.yaml

Persistent Volume and Persistent Volume Claim when running in DockerDesktop on Windows via WSL, folders need to be created on host system first and then mapped to persistent volume using /run/desktop/mnt/host/c/ convention.
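
The /run/desktop/mnt/host/c/ convention is easy to get wrong by hand, so here is a small Python sketch of the mapping. The helper name docker_desktop_wsl_path and the example folder C:\data\grafana are made up for illustration; the actual folders are whatever the persistent volume YAML in the repository points to.

```python
from pathlib import PureWindowsPath

# Prefix under which Docker Desktop (WSL) exposes Windows drives, per the convention above.
WSL_HOST_PREFIX = "/run/desktop/mnt/host"


def docker_desktop_wsl_path(windows_path):
    """Map a Windows folder such as C:\\data\\grafana to its /run/desktop/mnt/host/... form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:"
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a path with a drive letter, got {windows_path!r}")
    folders = path.parts[1:]  # parts[0] is the drive anchor, the rest are folder names
    return "/".join([WSL_HOST_PREFIX, drive[0].lower(), *folders])


if __name__ == "__main__":
    print(docker_desktop_wsl_path(r"C:\data\grafana"))
```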

Provision Ingress Controller

Provision the ingress controller. In the case of minikube we just have to enable the ingress addon. The Kubernetes NGINX Ingress Controller is the default ingress class that comes with minikube.

minikube addons enable ingress

▪ Using image k8s.gcr.io/ingress-nginx/controller:v1.1.1
▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
🔎 Verifying ingress addon...
🌟 The 'ingress' addon is enabled

For other Kubernetes clusters, please install via the helm chart.

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

helm repo update

##Ingress-NGINX Version 1.5.1, Helm Chart Version 4.4.2
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx --version 4.4.2 --namespace ingress-nginx --create-namespace

Confirm Ingress class was deployed and the ingress service is available.

kubectl get services -A

This should show ingress-nginx-controller and ingress-nginx-controller-admission services provisioned.

Generate TLS certificates

We will use CloudFlare CFSSL to generate trust material so that the imageapi service and the Grafana dashboard are accessed via HTTPS, with SSL termination occurring at the ingress. CFSSL can be easily installed on all operating systems or built from source.

Note: Never commit TLS trust material to a source code repository. This is a simple walk through, and the TLS keys and certs in my repository are not used anywhere else.

- Generate CA for Self Signed TLS material

Generate the CA using cfssl. The CA definition is in the myaceme_ca.json file.

cfssl gencert -initca 02_tls/myaceme_ca.json | cfssljson -bare 02_tls/myaceme_ca

2022/04/29 18:49:46 [INFO] generating a new CA key and certificate from CSR
2022/04/29 18:49:46 [INFO] generate received request
2022/04/29 18:49:46 [INFO] received CSR
2022/04/29 18:49:46 [INFO] generating key: rsa-2048
2022/04/29 18:49:46 [INFO] encoded CSR
2022/04/29 18:49:46 [INFO] signed certificate with serial number 614045538385103150398837043280959282421471767182

- Generate self signed tls material for grafana dashboard

The definition for the TLS certificate is in the grafana.json file; profile.json contains the TLS signing profiles.

cfssl gencert -ca 02_tls/myaceme_ca.pem -ca-key 02_tls/myaceme_ca-key.pem -config 02_tls/profile.json -profile=server 02_tls/grafana_tls/grafana.json | cfssljson -bare 02_tls/grafana_tls/grafana

2022/04/29 18:54:34 [INFO] generate received request
2022/04/29 18:54:34 [INFO] received CSR
2022/04/29 18:54:34 [INFO] generating key: rsa-2048
2022/04/29 18:54:34 [INFO] encoded CSR
2022/04/29 18:54:34 [INFO] signed certificate with serial number 124590712233076516768785810343258332253533015392

- Generate self signed tls material for imageapi service

The definition for the TLS certificate is in the imageapi.json file; profile.json contains the TLS signing profiles.

cfssl gencert -ca 02_tls/myaceme_ca.pem -ca-key 02_tls/myaceme_ca-key.pem -config 02_tls/profile.json -profile=server 02_tls/imageapi_tls/imageapi.json | cfssljson -bare 02_tls/imageapi_tls/imageapi

2022/04/29 18:56:13 [INFO] generate received request
2022/04/29 18:56:13 [INFO] received CSR
2022/04/29 18:56:13 [INFO] generating key: rsa-2048
2022/04/29 18:56:13 [INFO] encoded CSR
2022/04/29 18:56:13 [INFO] signed certificate with serial number 43858409390059653480603204189057375942926449031

Install Grafana

We add grafana tls secret so the same can be used by Grafana Ingress.

kubectl create secret tls grafana-ingress-tls --key 02_tls/grafana_tls/grafana-key.pem --cert 02_tls/grafana_tls/grafana.pem --namespace grafana  

secret/grafana-ingress-tls created

We add grafana helm repo and then install grafana chart via helm.

Please note the override "--values 03_yaml/grafana-values.yaml". This values file contains the ingress definition along with the TLS secret and the storage configuration.

helm repo add grafana https://grafana.github.io/helm-charts

helm repo update

##Grafana Version 9.3.6, Helm Chart Version 6.50.6
helm upgrade --install grafana grafana/grafana --version 6.50.6 --values 03_yaml/grafana-values.yaml --namespace grafana --create-namespace

If all is well, you should see the following output:

Release "grafana" does not exist. Installing it now.
W0501 18:22:30.363934 50090 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0501 18:22:30.368409 50090 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0501 18:22:30.480730 50090 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0501 18:22:30.480731 50090 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: grafana
LAST DEPLOYED: Sun May 1 18:22:29 2022
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
NOTES:

1. Get your 'admin' user password by running:

kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

grafana.grafana.svc.cluster.local

If you bind grafana to 80, please update values in values.yaml and reinstall:

securityContext:
  runAsUser: 0
  runAsGroup: 0
  fsGroup: 0

command:
- "setcap"
- "'cap_net_bind_service=+ep'"
- "/usr/sbin/grafana-server &&"
- "sh"
- "/run.sh"


Details refer to https://grafana.com/docs/installation/configuration/#http-port.
Or grafana would always crash.

From outside the cluster, the server URL(s) are:
http://grafana.malcolm.io


3. Login with the password from step 1 and the username: admin

Confirm that grafana was successfully deployed and running.

kubectl get pods --namespace grafana

NAME                       READY   STATUS    RESTARTS   AGE
grafana-77d44ccbff-sttjp   1/1     Running   0          4m21s

kubectl get services --namespace grafana

NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
grafana   ClusterIP   10.99.137.178   <none>        80/TCP    4m43s

Confirm that grafana ingress is available. This does take a while on my system so please have patience while ingress is provisioned.

kubectl get ingress -A

NAMESPACE   NAME      CLASS    HOSTS                ADDRESS         PORTS     AGE
grafana     grafana   <none>   grafana.malcolm.io   192.168.64.23   80, 443   5m45s

Now add the IP address to the hosts file so that the Grafana dashboard can be accessed.

192.168.64.23 grafana.malcolm.io

Access the Grafana dashboard using admin as the user name.

grafana-login

But what is the password? The instructions for retrieving it were displayed in the output of the helm install. We could also have set the admin password in grafana-values.yaml.

kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

x66M3z5gaVP5sWUlpbASiTuiEoiHLhJFOvAGQgG9
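
The base64 --decode step is needed because Kubernetes stores Secret values base64-encoded. A minimal Python sketch of the same decoding, assuming the Secret has been dumped as JSON (for example with -o json), is shown here. The function name admin_password_from_secret and the fake secret are made up for this example.

```python
import base64
import json


def admin_password_from_secret(secret_json):
    """Decode the admin-password field of a Secret given as JSON text."""
    secret = json.loads(secret_json)
    encoded = secret["data"]["admin-password"]  # Secret values are base64-encoded
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # A made-up Secret, shaped like the JSON form of the grafana Secret.
    fake_secret = json.dumps(
        {"data": {"admin-password": base64.b64encode(b"not-a-real-password").decode("ascii")}}
    )
    print(admin_password_from_secret(fake_secret))
```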

We have Grafana running, but it does not do anything yet since we do not have any data sources or data. Let's configure that next.

Install Loki

We already have grafana repo added to helm, so we can just install loki chart via helm.

Please note the override "--values 03_yaml/loki-values.yaml". This values file contains the storage configuration and, more importantly, starts Loki with auth enabled. With auth enabled, Loki expects an X-Scope-OrgID header on every request; it determines which tenant the posted logs are associated with and which tenant's logs are returned when Loki is queried.

##Grafana Loki Version 2.7.2, Helm Chart Version 4.4.2
helm upgrade --install loki grafana/loki --version 4.4.2 --values 03_yaml/loki-values.yaml --namespace grafana --create-namespace

You should see the following output when Loki is installed.

Release "loki" does not exist. Installing it now.
W0501 18:41:42.012528 51382 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0501 18:41:42.076079 51382 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: loki
LAST DEPLOYED: Sun May 1 18:41:41 2022
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Verify the application is working by running these commands:
kubectl --namespace grafana port-forward service/loki 3100
curl http://127.0.0.1:3100/api/prom/label

Confirm loki pods and services

kubectl get pods --namespace grafana

NAME                       READY   STATUS    RESTARTS   AGE
grafana-77d44ccbff-sttjp   1/1     Running   0          20m
loki-0                     1/1     Running   0          101s

kubectl get services --namespace grafana

grafana         ClusterIP   10.99.137.178   <none>   80/TCP     21m
loki            ClusterIP   10.104.137.92   <none>   3100/TCP   2m13s
loki-headless   ClusterIP   None            <none>   3100/TCP   2m13s

With Loki installed we are now ready to send data to Loki, which will be the centralized logging system.

How do we send data to Loki? We use Promtail, which scrapes logs and pushes them to Loki.
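
To give a feel for what a push to a multi-tenant Loki looks like, here is a hedged Python sketch that only builds the HTTP request and does not send it. It assumes Loki's push endpoint /loki/api/v1/push and the in-cluster address used in this walk through; build_push_request and the label values are made up for illustration, and in the actual setup Promtail does this work for us.

```python
import json
import time
import urllib.request

# Assumed in-cluster address of the Loki service (service loki, namespace grafana).
LOKI_PUSH_URL = "http://loki.grafana.svc.cluster.local:3100/loki/api/v1/push"


def build_push_request(tenant_id, labels, line, timestamp_ns=None, url=LOKI_PUSH_URL):
    """Build (but do not send) a push request for a single log line of one tenant."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,  # stream labels, e.g. {"app": "imageapi"}
                "values": [[str(timestamp_ns), line]],  # [nanosecond timestamp, log line]
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # With auth enabled, this header tells Loki which tenant owns the logs.
            "X-Scope-OrgID": tenant_id,
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_push_request("Tenant_1", {"app": "imageapi"}, "hello from Tenant_1")
    print(request.full_url, request.header_items())
```

Sending the request would only work from inside the cluster, since Loki is not exposed through the ingress.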

Install Promtail

We already have grafana helm repo added to our local helm so we can just install promtail.

##Grafana Promtail Version 2.7.2, Helm Chart Version 6.8.2
helm upgrade --install promtail grafana/promtail --version 6.8.2 --values 03_yaml/promtail-values.yaml --namespace grafana

Please note the override "03_yaml/promtail-values.yaml". This is where the multi-tenant magic occurs: Promtail parses the log messages and sends them to Loki under the correct tenant organization. See Loki Multi Tenancy.

The clients section specifies the location of the Loki service: loki is the service name, grafana is the namespace, and svc.cluster.local is the qualified name for the local cluster.

clients:
  - url: http://loki.grafana.svc.cluster.local:3100/loki/api/v1/push
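
The service.namespace.svc.cluster.local naming convention can be sketched as a tiny Python helper. The function name service_url is made up for this example, and it assumes the default cluster domain cluster.local.

```python
def service_url(service, namespace, port=None, path="", scheme="http"):
    """Build the in-cluster URL of a Kubernetes service from its name and namespace."""
    host = f"{service}.{namespace}.svc.cluster.local"
    if port is not None:
        host = f"{host}:{port}"
    return f"{scheme}://{host}{path}"


if __name__ == "__main__":
    # Loki service in the grafana namespace, as used by the Promtail client above.
    print(service_url("loki", "grafana", 3100, "/loki/api/v1/push"))
```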

The pipelineStages section contains the stages that parse the JSON log message and extract the tenant information. Logs from any pod that does not carry the app=imageapi label are grouped under the admin org; logs from the imageapi app are stored under the tenant id found in each log line.

pipelineStages:
  - match:
      selector: '{app!="imageapi"}'
      stages:
        - tenant:
            value: admin
  - match:
      selector: '{app="imageapi"}'
      stages:
        - json:
            expressions:
              output: log
        - json:
            source: output
            expressions:
              tenant_id: TENANT_ID
              message: message
              trace_id: TRACE_ID
              request_id: REQUEST_ID
              logger_name: logger_name
              level: log.level
              className: source.className
              methodName: source.methodName
              error_message: error.message
              error_trace: error.stack_trace
        - labels:
            tenant_id:
            message:
            trace_id:
            request_id:
            className:
            methodName:
            error_message:
            error_trace:
        - tenant:
            source: tenant_id
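
The tenant routing that these stages perform can be mimicked in a few lines of Python. This is only a sketch of the logic, not how Promtail is implemented: resolve_tenant is a made-up name, the container log line is assumed to be wrapped in a JSON object with a log field (which is what the first json stage reads), and returning None when the line cannot be parsed is an assumption of this sketch rather than Promtail's documented behaviour.

```python
import json

ADMIN_TENANT = "admin"  # tenant for every stream that is not the imageapi app


def resolve_tenant(stream_labels, raw_line):
    """Pick the Loki tenant for one log line, following the two match stages above."""
    # First match stage: selector {app!="imageapi"} sends everything else to admin.
    if stream_labels.get("app") != "imageapi":
        return ADMIN_TENANT
    try:
        wrapper = json.loads(raw_line)  # first json stage: output <- log
        record = json.loads(wrapper["log"])  # second json stage: parse the application JSON
    except (json.JSONDecodeError, KeyError, TypeError):
        return None  # sketch-only behaviour for lines that cannot be parsed
    if not isinstance(record, dict):
        return None
    # tenant stage: source is the extracted tenant_id, which comes from TENANT_ID.
    return record.get("TENANT_ID")


if __name__ == "__main__":
    app_line = json.dumps({"TENANT_ID": "Tenant_1", "message": "Image Metadata size: 22813"})
    container_line = json.dumps({"log": app_line, "stream": "stdout"})
    print(resolve_tenant({"app": "imageapi"}, container_line))
    print(resolve_tenant({"app": "grafana"}, "any text"))
```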

You should see the following output when promtail is installed.

Release "promtail" does not exist. Installing it now.
NAME: promtail
LAST DEPLOYED: Sun May 1 18:56:16 2022
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
Welcome to Grafana Promtail
Chart version: 6.8.2
Promtail version: 2.7.2
***********************************************************************

Verify the application is working by running these commands:

* kubectl --namespace grafana port-forward daemonset/promtail 3101
* curl http://127.0.0.1:3101/metrics

Confirm the promtail pods. There is no promtail service; Promtail runs as a daemon set that scrapes logs.

kubectl get pods --namespace grafana

grafana-77d44ccbff-sttjp   1/1   Running   0   34m
loki-0                     1/1   Running   0   15m
promtail-fjlwv             1/1   Running   0   47s

Install Prometheus

We will add Prometheus Community repo to our helm install and install Prometheus which will be responsible for scraping metrics.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo update

##Prometheus Version v2.41.0, Helm Chart Version 19.3.3
helm upgrade --install prometheus prometheus-community/prometheus --version 19.3.3

Note: On Docker Desktop we get an error regarding the node exporter. If you encounter it, please execute the following to resolve it.

kubectl patch ds prometheus-node-exporter --type "json" -p '[{"op": "remove", "path" : "/spec/template/spec/containers/0/volumeMounts/2/mountPropagation"}]'

Install ImageAPI Service

We will create the imageapi namespace, create the imageapi TLS secret, and install the imageapi application.

kubectl create namespace imageapi

namespace/imageapi created


kubectl create secret tls imageapi-ingress-tls --key 02_tls/imageapi_tls/imageapi-key.pem --cert 02_tls/imageapi_tls/imageapi.pem --namespace imageapi

secret/imageapi-ingress-tls created

kubectl apply -f 03_yaml/imageapi.yaml --namespace imageapi

deployment.apps/imageapi created
service/imageapi created
ingress.networking.k8s.io/imageapi-ingress created

Confirm image api deployments and services are available.

kubectl get pods --namespace imageapi

NAME                        READY   STATUS    RESTARTS   AGE
imageapi-56757c94f8-nszhn   1/1     Running   0          82s

kubectl get services --namespace imageapi

NAME       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
imageapi   ClusterIP   10.109.177.127   <none>        8080/TCP   40s

Confirm if the imageapi application ingress is available.

kubectl get ingress -A

NAMESPACE   NAME               CLASS    HOSTS                 ADDRESS         PORTS     AGE
grafana     grafana            <none>   grafana.malcolm.io    192.168.64.23   80, 443   44m
imageapi    imageapi-ingress   <none>   imageapi.malcolm.io   192.168.64.23   80, 443   2m11s

Add the IP address to the hosts file.

192.168.64.23 imageapi.malcolm.io
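
If you would rather not copy the address by hand, the hosts-file lines can be derived from the text output of kubectl get ingress -A. The Python sketch below is one way to do it; hosts_entries is a made-up helper, and it assumes one host per ingress and an IP address in the ADDRESS column (rows without an address yet are skipped).

```python
import ipaddress


def hosts_entries(ingress_output):
    """Turn the text output of `kubectl get ingress -A` into hosts-file lines."""
    entries = []
    for line in ingress_output.splitlines():
        columns = line.split()
        # Skip blank lines and the header row.
        if len(columns) < 5 or columns[0] == "NAMESPACE":
            continue
        host, address = columns[3], columns[4]
        try:
            ipaddress.ip_address(address)
        except ValueError:
            continue  # ADDRESS column still empty, so this field is not an IP
        entries.append(f"{address} {host}")
    return entries


if __name__ == "__main__":
    sample = """NAMESPACE NAME CLASS HOSTS ADDRESS PORTS AGE
grafana grafana <none> grafana.malcolm.io 192.168.64.23 80, 443 44m
imageapi imageapi-ingress <none> imageapi.malcolm.io 192.168.64.23 80, 443 2m11s"""
    print("\n".join(hosts_entries(sample)))
```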

Validate the Image API application

imageapi service

This generates required logging and metrics for the walk through.

Start the tenant applications

Open a couple of terminal windows, one per tenant, and fire away API invocations using the imageapiclient application.

imageapi client app

Configure Grafana Dashboard

1. Create Orgs in Grafana using the admin credentials.

tenant orgs

2. Create users for each org. Again, the admin creates these users and associates them with the previously created orgs.

tenant users

3. Admin then switches to each org and creates a datasource specific for the tenant.

admin switch org
loki datasource

Each data source needs the X-Scope-OrgID HTTP header, with a value from Tenant_1 to Tenant_4. This header is important: Loki only returns logs associated with that tenant, and without the header Loki returns a 404 error.

So there will be 1 datasource for each tenant and each data source can only see logs for its respective tenant.

loki datasource by tenant
tenant logs

4. Tenant-specific ops users log in to their respective orgs and access their own data source created by the admin.

tenant dashboard

5. Configure the Prometheus metrics data source using the same convention as Loki; in this case prometheus-server is the service name, it runs in the default namespace, and svc.cluster.local is the qualified name in the local cluster.

metrics dashboard.
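
What each per-tenant Loki data source from step 3 does under the hood is roughly a query with its own X-Scope-OrgID header. The Python sketch below builds, without sending, one such query per tenant. It assumes Loki's /loki/api/v1/query_range endpoint and the in-cluster Loki address from this walk through; build_query_request and the LogQL selector are chosen for illustration only.

```python
import urllib.parse
import urllib.request

# Assumed in-cluster address of the Loki service (service loki, namespace grafana).
LOKI_BASE_URL = "http://loki.grafana.svc.cluster.local:3100"
TENANTS = ["Tenant_1", "Tenant_2", "Tenant_3", "Tenant_4"]


def build_query_request(tenant_id, logql, limit=100, base_url=LOKI_BASE_URL):
    """Build (but do not send) a log query that is scoped to a single tenant."""
    params = urllib.parse.urlencode({"query": logql, "limit": limit})
    return urllib.request.Request(
        f"{base_url}/loki/api/v1/query_range?{params}",
        # Same header the Grafana data source is configured with for this tenant.
        headers={"X-Scope-OrgID": tenant_id},
    )


# One request per tenant, mirroring one data source per tenant.
requests_by_tenant = {t: build_query_request(t, '{app="imageapi"}') for t in TENANTS}

if __name__ == "__main__":
    for tenant, request in requests_by_tenant.items():
        print(tenant, request.full_url, request.header_items())
```

The query text is identical for every tenant; only the header differs, and that is what keeps each tenant's view of the logs separate.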

Hope this walk through was helpful in getting up and running with Grafana and Loki.
