Istio default metrics and customizations

One of the cool things about Istio is that it automatically exposes metrics, and if you have Prometheus installed, you can easily query them.

Let's start by setting up your mesh:

istioctl install --set profile=ambient --skip-confirmation


Install the Kubernetes Gateway API CRDs if they are not already present:

kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
  kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/experimental-install.yaml

Set up the Bookinfo app

You can refer to this link here: https://istio.io/latest/docs/ambient/getting-started/deploy-sample-app/

Configure your namespace to use ambient mode

kubectl label namespace default istio.io/dataplane-mode=ambient

Deploy Prometheus and Kiali (run these from the root of the Istio release directory, where the samples folder lives):

kubectl apply -f samples/addons/prometheus.yaml
kubectl apply -f samples/addons/kiali.yaml

Make sure you have updated your mesh configuration so that the proxies expose more Envoy metrics:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyStatsMatcher:
        inclusionRegexps:
          - ".*outlier_detection.*"
          - ".*upstream_rq_retry.*"
          - ".*upstream_rq_pending.*"
          - ".*upstream_cx_.*"
        inclusionSuffixes:
          - upstream_rq_timeout
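
One way to roll out the overlay above is sketched below. The file name mesh-metrics.yaml is hypothetical (save the YAML under whatever name you like), and I am assuming the gateway Deployment is called bookinfo-gateway-istio in the default namespace, as in the Bookinfo sample. The restart is there because proxy config is usually picked up when an Envoy-based proxy starts, so treat it as a likely but not guaranteed requirement.

```shell
# Sketch only: re-run the install with the overlay, then restart the gateway.
# "mesh-metrics.yaml" is a hypothetical file name for the IstioOperator YAML above.
apply_mesh_metrics() {
  local overlay="${1:-mesh-metrics.yaml}"

  # Keep the ambient profile so the overlay is merged on top of it.
  istioctl install --set profile=ambient -f "$overlay" --skip-confirmation

  # Assumption: the Bookinfo gateway Deployment name and namespace.
  kubectl rollout restart deployment/bookinfo-gateway-istio -n default
}

# Usage (needs a running cluster):
#   apply_mesh_metrics mesh-metrics.yaml
```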





Some of the key metrics for basic traffic are listed below.

Generic TCP traffic metrics

istio_tcp_sent_bytes_total

istio_tcp_connections_opened_total

istio_tcp_received_bytes_total

Circuit breaker 

envoy_cluster_upstream_rq_pending_overflow

envoy_cluster_upstream_cx_overflow
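
If you prefer the command line over the Prometheus UI, the metrics listed above can be queried through the Prometheus HTTP API. This is a minimal sketch: it assumes Prometheus is reachable on localhost:9090 (for example via a port-forward to the prometheus service in istio-system), and the helper names rate_query and prom_query are made up for this post.

```shell
# Assumption: Prometheus is reachable locally, e.g. after running
#   kubectl -n istio-system port-forward svc/prometheus 9090:9090
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a PromQL expression for the summed per-second rate of a counter.
rate_query() {
  local metric="$1" window="${2:-5m}"
  printf 'sum(rate(%s[%s]))' "$metric" "$window"
}

# Run an instant query against the Prometheus HTTP API and print the JSON reply.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Usage (needs a running cluster):
#   prom_query "$(rate_query istio_tcp_sent_bytes_total)"
#   prom_query "$(rate_query envoy_cluster_upstream_rq_pending_overflow 1m)"
```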

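We can also change this behaviour using the Telemetry API. For example, the following YAML removes the response_code tag from the request count, request duration, request size, and response size metrics reported by the Bookinfo gateway: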


apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: remove-response-code
  namespace: default
spec:
  selector:
    matchLabels:
      service.istio.io/canonical-name: bookinfo-gateway-istio
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: REQUEST_COUNT
      tagOverrides:
        response_code:
          operation: REMOVE
    - match:
        metric: REQUEST_DURATION
      tagOverrides:
        response_code:
          operation: REMOVE
    - match:
        metric: REQUEST_SIZE
      tagOverrides:
        response_code:
          operation: REMOVE
    - match:
        metric: RESPONSE_SIZE
      tagOverrides:
        response_code:
          operation: REMOVE
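
Applying and later removing this resource could look like the sketch below. The file name remove-response-code.yaml is hypothetical; the resource name and namespace are taken from the manifest above.

```shell
# Hypothetical file name; save the Telemetry manifest above under it first.
TELEMETRY_FILE="${TELEMETRY_FILE:-remove-response-code.yaml}"

# Create the Telemetry resource and list what exists in the namespace.
apply_telemetry() {
  kubectl apply -f "$TELEMETRY_FILE"
  kubectl get telemetries.telemetry.istio.io -n default
}

# Delete it again to return to the default tags.
remove_telemetry() {
  kubectl delete telemetries.telemetry.istio.io remove-response-code -n default
}

# Usage (needs a running cluster):
#   apply_telemetry
#   remove_telemetry
```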

Then you will see that this is propagated by istiod.



After hitting productpage a couple of times, we can query Prometheus, and as you can see, the response_code label is missing in the upper row.
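
A query along these lines should show that split between series with and without the label. It is only a sketch: I am assuming the gateway reports its metrics with source_workload="bookinfo-gateway-istio", and by_label_query is a made-up helper name. Series recorded while the Telemetry resource is active have no response_code, so they end up grouped in a row with an empty label.

```shell
# Build a PromQL expression that groups istio_requests_total by one label.
# Assumption: the gateway's metrics carry source_workload="<workload>".
by_label_query() {
  local label="$1" workload="$2"
  printf 'sum by (%s) (istio_requests_total{source_workload="%s"})' "$label" "$workload"
}

# Print the expression so it can be pasted into the Prometheus UI.
by_label_query response_code bookinfo-gateway-istio
echo
```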



If you remove the Telemetry resource above and hit the productpage workload 100 times, you will see the count in the row with response_code start to increase.
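
To generate those 100 requests, a small loop is enough. This sketch assumes the gateway is reachable on localhost:8080, for example through a port-forward to the bookinfo-gateway-istio service as in the Istio getting-started guide; send_requests is a made-up helper name.

```shell
# Send a fixed number of requests to a URL, discarding the response bodies.
send_requests() {
  local url="$1" count="${2:-100}" i=1
  while [ "$i" -le "$count" ]; do
    curl -s -o /dev/null "$url"
    i=$((i + 1))
  done
}

# Usage (assumes: kubectl port-forward svc/bookinfo-gateway-istio 8080:80):
#   send_requests http://localhost:8080/productpage 100
```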


The code that produces this configuration is in pilot/pkg/networking/core/listener_builder.go, which calls generateStatsConfig in pilot/pkg/model/telemetry.go to build the final Envoy configuration for the istio_requests_total metric (the metric collection itself is delegated to Envoy).






