Cloud Native means embracing failure. At Agilicus, our strategy for security is Defense in Depth. In a nutshell: assume bad things will happen and have a fallback position, rather than dying on the hill of the first line. Similarly, for reliability we rely on Strength in Numbers. Rather than spending a lot of time and money on a single infinitely reliable thing, we assume each component will fail, and have a strategy to make that failure invisible.
A big part of our strategy is the pre-emptible node. This means that the underlying machines our software runs on can be powered off without notice. This might sound like a bad thing, but consider: kernels panic, hardware fails; it's going to happen anyway. Would you rather make it rare enough that you don't know what to do when it happens? Or would you rather make it part of a normal workday and have a solution? Embrace Failure.
The Agilicus strategy for embracing failure involves Kubernetes, Istio, retries, and many other pieces, and sometimes one of those pieces needs improving or investigating. The first question is always: “what restarted, and when?” For this, you would think “kubectl -n namespace get pods” is the answer, right? Well, it turns out that kubectl is a liar. It misses events like a node panic, showing an age that doesn’t correlate with when the pod's containers actually started.
But all is not lost: kubectl is capable of telling you the truth, it’s just not the default. So we wrote the small script below. It uses the jsonpath output and a wee bit of formatting to make it beautifully readable.
$ cat ~/bin/p-times
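#!/bin/bash
# p-times: for each pod in a namespace, print its node, when the kubelet
# accepted the pod (startTime) and when its first container last
# (re)started (startedAt).
ns=${1:?usage: p-times NAMESPACE}
(
  echo -e "Pod\tnodeName\tstartTime\tstartedAt"
  kubectl -n "$ns" get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.status.startTime}{"\t"}{.status.containerStatuses[0].state.running.startedAt}{"\n"}{end}'
) | column -t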
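If you would rather not depend on `column`, kubectl's own custom-columns output can produce the same four columns from the same field paths the jsonpath template uses. Below is a minimal sketch wrapped in a hypothetical `p_times_cc` shell function (the name is ours, meant to be sourced from something like `~/.bashrc`); it is an alternative, not a replacement for p-times.

```shell
# p_times_cc: hypothetical alternative to p-times using kubectl's
# custom-columns output. kubectl aligns the columns itself, so no
# subshell or `column -t` is needed.
p_times_cc() {
  # Fail with a usage message if no namespace is given.
  local ns=${1:?usage: p_times_cc NAMESPACE}
  # Quote the spec so the shell does not try to glob the [0] index.
  kubectl -n "$ns" get pods \
    -o custom-columns='Pod:.metadata.name,nodeName:.spec.nodeName,startTime:.status.startTime,startedAt:.status.containerStatuses[0].state.running.startedAt'
}
```

Run it as `p_times_cc istio-system`. One small difference: when a container is not currently running, custom-columns prints `<none>` in the startedAt column, whereas the jsonpath version simply leaves that field blank.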
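Running p-times on the istio-system namespace gives the output below. The nodeName column makes it quite wide; feel free to remove it. We keep it so we can correlate with node logs in Elasticsearch when needed. A gap of hours between startTime and startedAt, as for several pods here, usually means the container was restarted after the pod landed on its node.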
$ p-times istio-system
Pod nodeName startTime startedAt
istio-citadel-849cc4bfb6-ksklm gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:55Z 2021-01-15T15:15:13Z
istio-citadel-849cc4bfb6-l5cqs gke-noc-noc-preempt-pool-bc2cfcf6-gmcm 2021-01-14T05:59:18Z 2021-01-14T21:59:04Z
istio-galley-6476d5f4df-qx5q5 gke-noc-noc-preempt-pool-bc2cfcf6-gmcm 2021-01-14T05:59:15Z 2021-01-14T21:56:13Z
istio-galley-6476d5f4df-x5vp5 gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:54Z 2021-01-15T15:14:35Z
istio-ingressgateway-6d7d9cc94-2cr62 gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:54Z 2021-01-15T15:14:47Z
istio-ingressgateway-6d7d9cc94-dc2g6 gke-noc-noc-preempt-pool-bc2cfcf6-lw40 2021-01-14T05:59:18Z 2021-01-15T01:37:03Z
istio-pilot-89497d7c9-kndft gke-noc-noc-preempt-pool-bc2cfcf6-lw40 2021-01-14T05:59:18Z 2021-01-15T01:37:45Z
istio-pilot-89497d7c9-pjgjp gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:58Z 2021-01-15T15:15:28Z
istio-policy-7589588588-b8dhk gke-noc-noc-preempt-pool-bc2cfcf6-lw40 2021-01-14T05:59:19Z 2021-01-15T01:37:58Z
istio-policy-7589588588-vkc4s gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:53Z 2021-01-15T15:14:21Z
istio-sidecar-injector-67d9c85d6f-l52hr gke-noc-noc-preempt-pool-bc2cfcf6-gmcm 2021-01-14T05:59:15Z 2021-01-14T22:00:42Z
istio-sidecar-injector-67d9c85d6f-pcll7 gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:58Z 2021-01-15T15:15:33Z
istio-telemetry-6f6fb677bd-jj7h8 gke-noc-noc-preempt-pool-356e72e3-heye 2021-01-15T15:13:54Z 2021-01-15T15:14:50Z
istio-telemetry-6f6fb677bd-jt4dg gke-noc-noc-preempt-pool-bc2cfcf6-gmcm 2021-01-14T05:59:15Z 2021-01-14T22:01:00Z
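To pick out those long gaps automatically, here is a minimal sketch of a hypothetical `restart_gaps` shell function (not part of p-times) that reads p-times output on stdin and prints each pod whose container started more than a threshold number of seconds after the kubelet accepted the pod. It assumes GNU `date`, whose `-d` option parses RFC 3339 timestamps like `2021-01-14T05:59:18Z`, and the four-column layout shown above; the 600-second default is an arbitrary choice.

```shell
# restart_gaps: hypothetical helper. Reads p-times output on stdin and
# prints pod, node and the startTime-to-startedAt gap for every pod whose
# gap exceeds THRESHOLD seconds (default 600, an arbitrary choice).
# Assumes GNU date and the columns Pod, nodeName, startTime, startedAt.
restart_gaps() {
  local threshold=${1:-600}
  local pod node start started gap
  # Drop the header line, then split each row on whitespace.
  tail -n +2 | while read -r pod node start started; do
    # No startedAt means the container is not running; nothing to compare.
    [ -n "$started" ] || continue
    gap=$(( $(date -u -d "$started" +%s) - $(date -u -d "$start" +%s) ))
    if [ "$gap" -gt "$threshold" ]; then
      # Tab-separated: pod, node, gap as hours and minutes.
      printf '%s\t%s\t%dh%02dm\n' "$pod" "$node" $(( gap / 3600 )) $(( gap % 3600 / 60 ))
    fi
  done
}
```

Used as `p-times istio-system | restart_gaps 3600`, it would list only pods whose container started more than an hour after the pod was accepted. If you removed the nodeName column from p-times, drop `node` from the `read` line (and the `printf`) to match.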