So the other day I posted my pride and joy regex. You know, this one?
'^(?<host>[^ ]*) - \[(?<real_ip>)[^ ]*\] -
(?<user>[^ ]*) \[(?<time>[^\]]*)\]
"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?"
(?<code>[^ ]*) (?<size>[^ ]*)
"(?<referer>[^\"]*)" "(?<agent>[^\"]*)"
(?<request_length>[^ ]*)
(?<request_time>[^ ]*)
\[(?<proxy_upstream_name>[^ ]*)\]
(?<upstream_addr>[^ ]*)
(?<upstream_response_length>[^ ]*)
(?<upstream_response_time>[^ ]*)
(?<upstream_status>[^ ]*) (?<last>[^$]*)'
Seems simple, right? But it leads to a set of questions:
- If we can get a " in the path, we can do a quoting-style escape to avoid getting logged.
- The regex engine used in fluent-bit is Onigmo, and it has some CVEs. This means it's conceivable that a pattern a user can put on the wire could escape into our trusted, most privileged logging container (running privileged, node filesystem mounted, etc.).
- DDoS. We log a lot. But the logs are often bigger than the thing they are logging.
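To make the first question concrete, here is a minimal sketch in Python. It is not the fluent-bit setup itself: Python's `re` module stands in for Onigmo, so the named groups are rewritten from `(?<name>...)` to `(?P<name>...)`, and only the request-through-agent part of the pattern is used, anchored at the opening quote. Both log fragments and the `parse_request` helper are made up for illustration, and whether a raw quote can actually reach the log depends on how the proxy escapes the request line.

```python
import re

# The request..agent portion of the regex above, translated to Python's
# named-group syntax. Assumption: Python's re stands in for Onigmo here.
REQUEST_FRAGMENT = re.compile(
    r'^"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_request(fragment):
    """Return the captured fields as a dict, or None if the line does not parse."""
    match = REQUEST_FRAGMENT.match(fragment)
    return match.groupdict() if match else None


# Hypothetical log fragments, invented for this sketch.
NORMAL = '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.61.0"'
CRAFTED = '"GET /a"b HTTP/1.1" 200 612 "-" "curl/7.61.0"'  # raw quote in the path

print(parse_request(NORMAL))
print(parse_request(CRAFTED))
```

The first fragment parses into its fields. The second does not match at all, because the path group cannot contain a quote and the pattern expects a space right after the closing one, so a line like that would slip past the structured parser.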
For #3, consider this. It's a connection log from Istio. Yes, you read that right: a TCP SYN (~64 bytes) creates this 816-byte JSON entry:
{"level":"info","time":"2018-09-17T20:12:59.912982Z","instance":"tcpaccesslog.logentry.istio-system","connectionDuration":"12.740646ms","connectionEvent":"close","connection_security_policy":"none","destinationApp":"","destinationIp":"10.244.1.57","destinationName":"payment-6cdc5b656-fkhxh","destinationNamespace":"socks","destinationOwner":"kubernetes://apis/extensions/v1beta1/namespaces/socks/deployments/payment","destinationPrincipal":"","destinationServiceHost":"","destinationWorkload":"payment","protocol":"tcp","receivedBytes":117,"reporter":"destination","requestedServerName":"","sentBytes":240,"sourceApp":"","sourceIp":"10.244.1.1","sourceName":"unknown","sourceNamespace":"default","sourceOwner":"unknown","sourcePrincipal":"","sourceWorkload":"unknown","totalReceivedBytes":117,"totalSentBytes":240}
Hmm, so you see where I am going. Remember a few years ago, when we found that an NTP server could be asked for its list of recently seen hosts (the monlist query)? So a small packet would create a large response? And, being UDP, the request could be spoofed, so the response could go to someone else? Making it a great DDoS source.
Well, same with my log. Your SYN costs me a lot more to receive than it costs you to send. Think of all the mechanisms below that (elasticsearch, fluent-bit, kibana, storage, network, cpu, ram, …).
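A quick back-of-the-envelope sketch of that asymmetry, in Python. The two sizes come straight from the numbers above (the ~64 bytes is approximate); the `amplification` helper is just an illustrative name, and this only counts the log line itself, not the storage, indexing, and network cost stacked on top of it.

```python
SYN_BYTES = 64    # approximate size of the TCP SYN, from the text above
LOG_BYTES = 816   # size of the JSON log entry, from the text above


def amplification(request_bytes, logged_bytes):
    """Bytes of log produced per byte the sender put on the wire."""
    return logged_bytes / request_bytes


factor = amplification(SYN_BYTES, LOG_BYTES)
print(f"log amplification: {factor:.2f}x")
```

So on these numbers, every byte the sender spends costs me roughly twelve to thirteen bytes of log before anything downstream even touches it.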
Hmm.
Now about #2. That is a bit of a trouble point. Who wants to find that the regex parsing a field that any user can send you via netcat is itself prone to a crash, or a remote escape? Not me.