Recently there was a DDoS attack against the VoIP provider voip.ms. A tough problem to deal with, for sure. And one of the challenges relates to the end users and the inevitable reconfiguration of their on-premises devices.
Many of the end users have devices like a Cisco SPA-3102, a widget more than 10 years old. These devices have worked hard doing the same thing over and over, day in and day out. And they have received no software updates in a long time. It would be unsafe to allow their management console to have Internet access, or even lateral access within the network. Strong identity, multi-factor authentication, etc. have passed them by.
Many of these devices are in lights-out or remote locations. What is a safe and secure way to remotely administer them so you have strong identity and an audit trail? Some would say, well, VPN to the building they are in, that is fine. But, then that means they are always accessible to all VPN users.
Others would say, get in a car and drive over. There has to be a better way. And, of course, one of our customers found one. So I thought I would share.
Ingredients:
Raspberry Pi
Cisco SPA-3102
Frustration with the VoIP outage
Agilicus Identity-aware web application firewall
A sprinkling of multi-factor authentication
So, the use case here… the Cisco device management console gets a new TLS-protected HTTPS endpoint on the public Internet. No VPN needed. And not a single packet hits it except from an authorised user, courtesy of an identity-based proxy. Open a browser from work, change the DNS name of the SIP server, and boom, back to receiving telemarketing duct-cleaning calls in peace. Progress!
The moral of the story: old devices can have modern, secure access, without a fuss.
Your cyber insurance is up for review. If you can get all applications authenticated with multi-factor, you can afford it. But, you have only managed to get the new ones done, leaving the miscellaneous. Let’s talk about how an authenticating proxy can get the rest done, with no work, no fuss. Become compliant right quick.
Authentication is often viewed as a user-is-present activity, making authorisation obvious. But, users are encouraged to create API keys by many SaaS tools, and, these present real authorisation challenges.
Your CRM will allow any user to create this key; it will carry their full privileges, with no multi-factor, no timeout. They will then paste it into e.g. Zapier, a tool in their home directory, whatever. From here, control becomes blurry.
Instead, consider an authenticating proxy which enriches requests with this key. Each transaction is authenticated and authorised, and the proxy then adds the API key it holds in escrow. This authenticating proxy can take care of reducing authorisation scope at the same time.
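As a rough sketch of the idea (not the Agilicus implementation; the URLs, key, and scope list below are invented for illustration), such a proxy might verify the user's own short-lived token, apply a reduced scope, and only then attach the escrowed key:

# Minimal sketch of an API-key escrow proxy. Hypothetical names and URLs.
# The user presents their own short-lived token; the proxy verifies it,
# applies a narrower authorisation policy, and only then attaches the
# escrowed CRM API key, which the user never sees.
import requests
from flask import Flask, Response, abort, request

app = Flask(__name__)

UPSTREAM = "https://crm.example.com/api"     # the SaaS API (example)
ESCROWED_KEY = "stored-in-a-secret-manager"  # never handed to the user
ALLOWED = {("GET", "contacts"), ("POST", "notes")}  # reduced scope

def verify_user(token):
    # Placeholder: validate the user's OIDC token (signature, expiry, audience).
    return bool(token)

@app.route("/<path:resource>", methods=["GET", "POST"])
def proxy(resource):
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    if not verify_user(token):
        abort(401)  # authenticate every transaction
    if (request.method, resource.split("/")[0]) not in ALLOWED:
        abort(403)  # authorised user, but outside the reduced scope
    upstream = requests.request(
        request.method,
        f"{UPSTREAM}/{resource}",
        headers={"Authorization": f"Bearer {ESCROWED_KEY}"},  # enrich with the escrowed key
        params=request.args,
        data=request.get_data(),
        timeout=10,
    )
    return Response(upstream.content, status=upstream.status_code)

The user's browser or automation tool only ever holds its own revocable, multi-factor-backed token; the long-lived API key stays behind the proxy.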
The world wants the web. Simple web applications are easier to scale, easier to use, more accessible. And, more standard for authentication, for multi-factor, for multi-platform. But, sometimes we have some legacy, an older desktop application that still gets the job done. Can we bridge that gap somewhat? Yes we can!
You have an internal tool. Perhaps Grafana, Prometheus, Tableau, Nagios. You get an alert; it's via Slack, Chat, etc. You click. The link goes nowhere. You curse. I should really fix this, my phone doesn't have access to 10.0.0.99, that's internal only. But it's just an internal tool, so you don't leap into action. Half an hour later, repeat. And repeat. Or maybe it's that you need your outsourced NOC vendor to see the performance metrics? Either way, it's just not important enough to spend what must be many hours to fix.
What if I showed you you could get this fixed in an hour? And be highly secure. And simple to use? Your librenms, your grafana, on your phone, when you click a link in the ChatOps tool. No VPN. Any user.
In May 2021 the criminal group DarkSide successfully shut down a major energy pipeline of the United States (Colonial Pipeline). How did the criminals get in? The ransomware came via the VPN. The VPN (Virtual Private Network) is conceptually a 'really long network cable from your house to the company'. People often believe it's part of their security posture, but, in practice, it's a risk. In hindsight, an unacceptable risk.
In this case, the VPN existed so that the team could work remotely, accessing the various and sundry internal services (email, wiki, reporting, HR, …) that they need to do their jobs. No one is questioning the need for remote access. But a VPN is a blunt tool, not fit for purpose. Ransomware transmits via the VPN too.
I'm sure the company in question is currently addressing the symptoms. The shared passwords, leaked. They are probably investing in multi-factor authentication. Perhaps they are even investing in some segmentation of their internal network. Some will be suggesting "blow it up and outsource it", move to managed SaaS, etc. It's not a panacea. Stitching together a single identity and multi-factor across applications is not easy, whether they are SaaS, self-hosted, or managed.
The real answer is to move to a Zero Trust architecture. Lower the blast radius. Authenticate the user. Authorise the action. Provide Access to a single resource. A user can now access what they need, but not more.
Zero Trust is part of a Defense In-Depth strategy. Imagine each component being breached, and, have a plan for what will happen next. An application gets breached? Well, it cannot reach the rest of the network. A user is compromised? They can only do their normal activities on their normal applications. No more ransomware spreading via the VPN.
How do you get there quickly? An identity-aware authenticating web application firewall as a reverse proxy is a good part of the solution. Quickly ramp up and enable single, strong identity with multi-factor on all legacy applications. Enable multi-factor without reworking them. Remove the need for the VPN. It’s not worth the risk.
Would you like to discuss? Or perhaps you would like to just try.
Your burgeoning fleet of virtual machines poses a problem: no public IP means you use a VPN for access. But you have only sufficiently secured the SSH, and you worry about what else the VPN can reach. You have considered an SSH jump box, but then the SSH is not end-to-end encrypted. You have a set of 3rd parties and vendors you want to sometimes grant access to a single server, but not to everything via a VPN. What to do?
SSH by its very nature is end-to-end encrypted with strong protection against man-in-the-middle attacks. All servers need to be accessible via SSH to be manageable, often by external users (e.g. vendors, outsourced NOC, etc). However, despite SSH being strong on encryption, it is challenging on accessibility. The servers are typically on private networks (e.g. Virtual-Private-Cloud VPCs, internal network VLANs, etc). Making them directly accessible to the Internet can be dangerous (we would have to somehow police that all users have passphrases on their keys, that passwords are not allowed, etc.). SSH jump-boxes add a step to the workflow, and are difficult to ssh-port-forward and scp through. VPN access is difficult to secure on a per-server basis, often being all-or-none.
In the corner sits a small server. It has a directory called ‘HR’, ‘Finance’, etc. These are shared via Windows file sharing to the appropriate teams. You know that ransomware is a risk, but, this is how you do business.
You've been asked how to make this share available to a contractor or external service provider, but you are reluctant to create an account in your Active Directory and give them VPN access.
Users have complained that they must start the VPN to get this shared directory.
You’ve considered e.g. Dropbox, but then QuickBooks doesn’t run, your Excel spreadsheet gets corrupted by multiple edits.
What if I told you this was a very simple problem to solve, not requiring any rework or rethinking? A simple tactical solution that won't break the bank. That provides a simpler end-user experience, and no ransomware risk? Don't believe me? Give it a try!
Zero Trust. The principle of limiting access to user-resource pairs. It is part of a good defense in depth strategy. Defense in depth means having multiple layers, each augmenting and overlapping the others. It means you have a fallback position if something is breached. And, for a Zero Day, that breach happens suddenly and without warning. Zero Day and Zero Trust are good complements.
Recently VMware announced a 9.8 CVE in something that is at the core of many networks today, vSphere. In a nutshell: anyone with any network access can walk all over your world. Now, many of you will say, but I have a firewall and a VPN, I am safe. Good for you. You have done a small piece of Defense In Depth, and, I hope, enough. But you have certainly not done all you could. You see, that VPN + firewall creates an 'outside bad, inside good' mentality. You have something inside with infinite risk. And you have no controls on how things, once they are inside, can wander around.
Perhaps a phishing attack will influence a user to click on something that causes their browser to wander over to your vSphere system? Perhaps a lower-security system will get a wee bit of malware on it? From there, it can escalate upwards.
The description for this VMware problem says "A malicious actor with network access to port 443 may exploit this issue to execute commands with unrestricted privileges on the underlying operating system that hosts vCenter Server." Hmm. Your browser is inside the network. It has access to port 443. You can be tricked into running JavaScript that will do exactly that.
Or maybe you have put port 443 of your vSphere into some sort of DMZ and it's not really firewalled? After all, it's encrypted, TLS, it must be secure, right?
In a zero trust model, the user of that lower-security system would be authorised, but not able to get to the vSphere system. It would augment that firewall + VPN model and prevent the lateral attack you are now worried about. You will still need to get the patch applied (patch early, patch often), but the inherent defense in depth would lower the risk, lower the urgency.
These three simple steps will dramatically reduce your ransomware risk.
Enable Multi-Factor Authentication on your identity provider (Microsoft Active Directory, Google Workspace)
Enable Enterprise Sign-In/Single-Sign-On/SAML on all your applications. Disable *all* local accounts including admin
The remaining applications which don't support single identity? Fix them or fire them (or see me for an authenticating reverse proxy to keep them alive).
A Florida water treatment plant breached. People nearly poisoned. A SCADA device exposed via Windows & Team Viewer. Not where we want to be. How did it happen, how do we prevent systematically? Read On!
Earlier I wrote about the attack on a Florida water treatment plant. A remote site had an old Windows 7 PC with Team Viewer and shared passwords, and, well, the rest wrote itself.
A new blog post out gives some more details about a ‘watering hole’ attack that occurred just in advance. In a nutshell, a (legitimate) related industry site was compromised and served a wee bit of malicious JavaScript. Someone from the target opened a page there.
As we can see in the screenshot, a script is present that the original author probably did not intend. WordPress is particularly hard to secure for Content-Security-Policy, but a script-src directive in the CSP would have prevented this issue. The browser would have rejected loading the malicious script.
source: dragos
I've written a lot about Content-Security-Policy (and some videos!). In particular, I showed how it protected my personal site (well, the users of it) from malicious content injected by some malware extension they had installed. It's a great seat belt. Head on over to the Mozilla Observatory and check your site.
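If you'd like a quick look before heading to the Observatory, a few lines of Python (using the requests library; www.example.com is a placeholder for your own site) will show whether a page even sends a Content-Security-Policy with a script-src directive:

# Quick check: does a page send a Content-Security-Policy with script-src?
# Illustrative only; substitute your own site.
import requests

resp = requests.get("https://www.example.com", timeout=10)
csp = resp.headers.get("Content-Security-Policy")
if not csp:
    print("No Content-Security-Policy header at all")
elif "script-src" not in csp:
    print("CSP present, but no script-src directive:", csp)
else:
    print("script-src directive found:", csp)

# A policy along these lines would have stopped the injected script, since the
# compromised page could then only load scripts from its own origin:
#   Content-Security-Policy: default-src 'self'; script-src 'self'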
Now, let's assume that either a) we didn't prevent it with Content-Security-Policy, or b) the attacker worked around it. What would be our next step (Defense In Depth!)? Well, Zero Trust. If we assume that we can't fix human behaviour, that people are going to need remote access to things, we may as well make it safe and convenient. With a simple single sign-on, Zero-Trust Network Access model, we could have made that SCADA device safely usable remotely, removing the value of the attack.
Spam. The cat and mouse game of advertisers seeking to reach more people for less cost, and, people seeking to spend more to not be reached. The current state of the art in proving “I am not a spam-sending robot” is the captcha. Do you love the captcha? Me neither. Do you sometimes fail it? Me too!
I was very intrigued to see that Cloudflare has decided to (mis-use? re-purpose?) the WebAuthn standard, and specifically the security key, to prove your humanity. You can head on over to https://cloudflarechallenge.com/ and try it out with your YubiKey et al.
It's quite simple, if a bit inconvenient. On the web site you see a challenge button. You press it, it asks for permission to see your security key, you grant it, you press the button on the key in your USB hub, challenges are generated, and the web site appears. Is this better than the Captcha? Um, maybe? You are not doing the 'labelling' homework for some corporation. You don't have to squint at the various boxes on the screen. Is that a hill or a smudge? Is that a bus or an RV? So, I guess it's more deterministic.
Now, I found this worked just fine with my YubiKey on my desktop. But when I tried to use the built-in trust store of my Google Pixel 4 I found that it was an ‘unsupported issuer’.
Is this a good idea? Well, half of me thinks yes. More people with multi-factor authentication devices, getting used to using them, would be good. But how can it take off? Which web site owner will turn away users without a speciality device?
Grade 10 English. We learned the W5/6 (Who, What, Why, When, Where, How). It's a common framework to, well, frame something. So I thought, why not apply it to the problem domain at hand: Zero Trust Networking.
Who, Authentication, Identity: This is how we identify a user. We use a first class identity provider that already exists (Azure, Google, Apple, Okta, Auth0, etc). No sense making yet another password. Trust the upstream identity provider.
What, Authorisation. This is what you are allowed to do. Trust the upstream identity provider but verify, control. This is owned by our customer.
How, Access. After I know who you are, and what you are allowed to do, the next step is to make it happen. This is where the magic occurs: we intersect the Who with the What, and then make it transparent.
Earlier I wrote about the Security Report Chain (via security.txt), and how one documents their policy on vulnerability reporting. You can see the Agilicus security.txt here, as well as the policy. Well, people find them, and they send you information in the hopes of gigantic bug bounties. Let's talk about the reports.
The reports break down into 2 categories so far. The first are exceptionally basic, no research, things like "your web site runs on non-TLS". Well, yes… but all port 80 does is a 301 redirect to https://, and we have an HSTS record. And we are all-in on TLS, being on the preload list. So, uh, thanks for nothing.
It's the second type of report that intrigues me more. They have taken some time. I think they have some automated tools. And they often say things like:
So yes, this is true. And it does take a bit of reading to understand what it is, and why it's not only safe, but desirable. The standard is OpenID Connect, and it has a discovery document (that first link). In turn, it has a means of discovering the current public keys, which are rotated hourly. Why do we do that? Well, the standard requires it. But what is the intent?
The first, the discovery document, allows zero-configuration software to just work. A client can read this document, use the configuration, and proceed onwards with OpenID Connect. Nothing in the document is security sensitive (the alternative would have been to hard-code it in the client). It simply says which URI to talk to to authenticate, how to obtain tokens, and how to validate them. Does this belong in a security report?
The second, the keys (JWKS), are the public keys. These allow a client which has received a token from the authentication process to independently verify it. They do not allow you to sign a token; they are not the private key.
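To make that concrete, here is roughly what a well-behaved client does with those two public documents. This is a sketch using the requests and PyJWT libraries, with a placeholder issuer, audience, and token; note that nothing in it needs, or exposes, a private key.

# Sketch: how an OpenID Connect client uses the discovery document and the
# public JWKS. The issuer, audience, and token below are placeholders.
import requests
import jwt  # PyJWT

ISSUER = "https://auth.example.com"

# 1. Zero-configuration: read the public discovery document.
discovery = requests.get(f"{ISSUER}/.well-known/openid-configuration", timeout=10).json()
print("authorize at:", discovery["authorization_endpoint"])
print("tokens from :", discovery["token_endpoint"])

# 2. Fetch the current public keys and verify a token received from the login flow.
jwks_client = jwt.PyJWKClient(discovery["jwks_uri"])
id_token = "eyJ..."  # placeholder: a token obtained via the normal login flow
signing_key = jwks_client.get_signing_key_from_jwt(id_token)
claims = jwt.decode(
    id_token,
    signing_key.key,
    algorithms=["RS256"],
    audience="my-client-id",
    issuer=ISSUER,
)
print("verified claims:", claims)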
OK, so in this report the reporter has done some scan and has found something of interest. Without knowing the standard, they have assumed that encryption keys should not be available unauthenticated. So they have followed my policy and reported the "problem". But here's where I'm not sure what to do. They are doing this solely because they want a bug bounty. I decline to pay such for things which are "as designed, not a bug", on principle. Would I be better to pay $5 for their diligence? I sent them a response (there were more than a few on similar topics) pointing out some reading they could do to learn about the standards involved.
I performed my periodic audit of my accounts. And, to my surprise, I found the password for my RubyGems account was in the breach corpus. The 2nd factor saved the day, but… the password was generated via pwgen 12 (so it looked like aibeaNongoo0). I think you will agree that was not guessed somehow. So, on this topic, when was the last time you opened chrome://settings/passwords/check?start=true and checked your accounts for safety? Well, read the next couple of paragraphs and then go to it.
There's a spectrum of password strength. On the one end, some people use something guessable (a pet, a birthday), and they re-use the password on many (all?) sites.
Next we have those who have a strong(ish) password, but reuse that across multiple sites.
And then we have the strongest password approach: every site gets its own password, and they are strongly generated. This necessitates a password manager (I use KDE Wallet, which stores in my GPG keyring).
Layered on top of this is a multi-factor strategy. Interestingly, it dramatically improves all three strategies. The breach of the bad password does not have your 2nd-factor code generator in it. These end up being uncorrelated risks, and the combination is very strong.
However, this is becoming very tedious. Over the years one accumulates hundreds of online accounts. Some merchants force you to create an account just to buy a single product. A few forums here and there, and suddenly you have two or three hundred accounts to audit. Changing the password across them is no mean feat.
So what is the solution? My very strong password was breached on a single site, saved only by the 2nd factor. Well, in my opinion, the answer is to remove the password. That’s right. Rather than make it 16 characters long, I want to go to 0. Use a single common identity provider, via OpenID Connect. Secure that appropriately (strong password + 2nd-factor). And force each and every web site I use to accept it for authentication (without sharing the password).
OK, gentle reader, now your homework. Open chrome://settings/passwords/check?start=true. Check yourself out on https://haveibeenpwned.com/. In your browser, in the saved passwords, if it flags any, fix it. That means going to the web site in question, and changing the password to a new single-use one, and enabling multi-factor authentication if available.
Empowered people make pragmatic decisions to improve the productivity of themselves and their team. This can often lead to the dreaded 'shadow IT', and, more specifically, 'Identity Sprawl', with each system having unique login credentials, impacting security, scaling poorly, and increasing end-user complexity.
The problem? Each system, introduced to fix a targeted problem, often has a new, unique login for each new user. We call this 'Identity Sprawl'. When the team is 3 people, this can be ok. But as the team scales, the inevitable problems arise:
users recycle the password
users do not get removed when they leave the team or organisation
password reset flow becomes a challenge (who do I talk with)
security audits fail
Solutions to Identity Sprawl exist, but are often championed by a different team, or require too complex a process to obtain permission to use. OpenID Connect and SAML provide strong, simple identity. Identity providers such as Azure, Google, and Okta exist to harmonise the user side. However, we are often left with systems which don't have a centralised authorisation layer (the other half of identity). Introducing the Agilicus Identity-Aware Web Application Firewall can solve this at a single, central point in the network.
Let the new applications emerge, without fear of password breach, without complex retooling and rewriting to move the authorisation into them.
Summary: deploy OpenWRT on a Mikrotik to achieve SpaceX Starlink + bonded DSL backup, with Zero-Trust Network Access inbound from any user, any network, any device.
A friend of mine lives far enough from the city that his only Internet choice has been DSL. 10+ years ago I helped him bond 2 DSL loops together to get a 5+ Mbps experience. It was the best achievable given no LTE, no cable. Suddenly, a wild low-earth-orbit satellite solution arrives in the form of SpaceX’s Starlink. The beta is marked as “Better than Nothing”. What have we got to lose? Let’s try!
Now, the DSL lines have been reliable over the years. We don’t want to have outages, so we want to have a way to automatically fail over to them (read on for why I chose not to balance them in active/active).
One of the requirements is remote access. We want to be able to view the security cameras from remote. Currently (pre-Starlink) this is achieved with an inbound VPN. However, the Starlink system uses RFC 6598 Carrier NAT (the 100.64.0.0/10 range). This means we cannot allow inbound VPN (or any port-forwarding / DMZ-type scheme) since we do not have a public IP. So how can we achieve that? We will use the Agilicus Secure Exposed Agent, which makes an outbound-only connection. From there we can make inbound connections over HTTP, end-to-end encrypted, securely authenticated.
One other requirement is to be able to see the Starlink statistics. Why? Because it is interesting.
We also wanted to keep the Google WiFi as-is, as the sole WiFi. The device that comes with the Starlink has poor coverage, doesn’t handle this one building, and certainly not the outbuilding. It will only cause interference. Back in the box.
On to the project. The Starlink system comes with 3 pieces: the dish, the black box with power + 2 Ethernet, and the silver/white WiFi router. That last piece is not needed, put it back in the box. Now, we must select a suitable router. I chose a Mikrotik hEX S (RB760iGS). There are quite a few to choose from, cross-join the product list with the OpenWRT hardware support list. The router I chose uses MIPS, the details are here, with these instructions for install. This router supports POE input power with 5 Gigabit interfaces. You might choose the hEX if you don't care for the SFP interface.
On top of the OpenWRT we will use the mwan3 package. This allows active-health-checking and load-balancing/failover of multiple WAN interfaces. We want to bond the 2 DSL lines as 1+1 active/active, and then make that group be secondary to the Starlink interface.
Now, you may wonder why I chose to not make a bonding group of 3? After all, more bandwidth is more better, right? Well, we have some differences between these types to consider. The first, the MTU. The DSL interfaces use PPPoE, which has a MTU (maximum transmission unit, e.g. the largest packet size) of 1492. However, the Starlink has 1500. I was concerned that devices would memorise the path-MTU on an IP node-pair basis, and, become confused by multiple flows with different sizes.
The 2nd is IPv6. Now, the Starlink does not seem to support IPv6 (yet? It's better than nothing after all!). But when it does, we want to use it. IPv6 cannot (should not) be NAT'd, so it makes no sense to load balance across multiple interfaces if we cannot have a constant source IP.
The other concern I had was bandwidth. When you have a large number of devices each doing low-bandwidth flows, it would make sense to bond all three interfaces with some appropriate weighting. But here, with the DSL links individually very close to the minimum we want for a video conferencing connection, we don't want any traffic on them unless we must. A bonded set of 4+4 is not 8 Mbps: each TCP flow can only use one link. We want reliable, high-fidelity conferencing.
For the mwan3 setup I configured it as below. I used active health checking via ICMP.
We then setup mwan3 membership. Note that the weight of the satellite link is lower.
At this stage we tested. Perfect. It accurately detects the 'upness' of each interface. If the satellite is up, it attracts all the traffic. If down, it balances across the DSL. A couple of performance tests courtesy of fast.com and we can see that indeed the satellite delivers more bandwidth than the DSL. The latency varies between 35ms and 55ms, which is worse than the DSL (which clocks in at about 20ms). But, on balance, it's definitely an improvement.
Now, we want direct access to the Starlink stats. These live on 192.168.100.1. For this, we will add a static interface route to the 'starlink' WAN port. Test: works! For double bonus points we can add some polling into Grafana.
Now the remote access problem. The IP address we have inbound is in 100.64.0.0/10, not routable. So the inbound VPN has to go. But how to access the security cameras? For this we use the Agilicus Secure Exposed Agent. We install it on the OpenWRT. It makes an outbound connection to the cloud. Each user gets a URL for direct access, over HTTPS, from any browser. They prove their identity using OpenID Connect (and their upstream identity provider) and boom, are directly connected. No messing around with a VPN. No worry about Dynamic DNS. No open ports, no DMZ. It uses all interfaces, preferring the satellite, just as above.
OK, this was a bit of a long post. I’ll skip the details of the config and setup. If you want to know more, feel free to email me (info@agilicus.com), or use that talk icon at the left (yes it really is a person, me, not a bot).
Shoutout to Nicolas for the encouragement when the flashing of the router was at its bleakest, and for the grafana screenshots of the stats.
And please remember to subscribe (that bell icon at the right).
Your website is based on WordPress. You use "Easy Forms for Mailchimp by Yikes". You use Google Recaptcha to prevent spam, as well as Akismet. However, your Core Web Vitals WordPress score is too low, and you are about to be penalised on search. How can you resolve this state of affairs? With async and defer for the scripts involved. Read on to see how I improved my site on my Quest for Performance.
Your web page has a large number of <script> tags. Each of these goes and fetches a bit of JavaScript, parses it, runs it. That takes CPU and battery and network and time. The best thing you can do for performance is avoid the script entirely. If you cannot, try making the script simpler, shorter, smaller. If you cannot, in this case because Google Recaptcha is served via their CDN, the next best thing is to make it load asynchronously, deferring its needs until your page is loaded and visible.
So what are async and defer? Why do they matter for Core Web Vitals in your WordPress site? In a nutshell (see the <script> spec):
async — Tells the browser to execute the script when ready, without blocking any HTML parsing
defer — Tells the browser to delay script execution until HTML parsing is complete
OK, how can we add this to our WordPress web site? Well, we add a function to our theme's functions.php. Below is the one I used; it defers all 'yikes', 'akismet', and 'form-submission' scripts. This causes the Google Recaptcha and Yikes JS to load after the web page is up and visible.
The above change improved my Core Web Vitals on my WordPress from 56 to 92. It improved my Largest Contentful Paint to 2.2s from 4.4s. Your mileage may vary.
Core Web Vitals is coming. Users prefer faster pages, so on search, faster pages (that are relevant) will be returned first. Recently I installed "The Events Calendar" for WordPress. I found it was loading a large amount of JavaScript and CSS on every page. But I only use it on /events (and its own post types). What to do to speed up WordPress?
In my Quest for Web Site Performance Perfection, I found these enqueued scripts, no matter whether you minified or preloaded them, were causing a lot of excess CPU and parse time. An infinitely fast network cannot help. It's probably also using your battery up. So, let's dig in.
First, we have to track them all down. To do this I added an echo statement to where they were enqueued. There may be a more elegant way by reading the fine documentation, but, I was in a hurry and this worked.
Once done, I added the below function to my theme's functions.php. What it does is: "if this page is a tribe_event or /events then allow, else dequeue". This will speed up WordPress: fewer things sent to the user.
Install, load, done. Now I have 42 fewer scripts/JS enqueued, and this means more CPU for parsing other things, and more battery for you, the reader.
You can test your changes with Chrome Lighthouse, or web.dev. Efficiency is in everyone's interest, so save a bit, save a watt, speed up WordPress.
Recently a friend subscribed me and some others to a mailing list in Pardot. He was stunned when a few people immediately unsubscribed. Was it something he said? The unsubscribers denied all knowledge. Upon some digging, it was discovered that a corporate email security threat scanner was fetching HTTP links, and, the emails in question had a link http://host/unsub?user=name. You can see where this is going, the users were unsubscribed by the scanner.
The friend in question rushed in a solution to the Pardot symptom, based on this thread. Mission accomplished, right? Well, read on!
First, let's discuss why this should be safe. An HTTP GET should be safe and idempotent. No, not the thing you take the blue pill for: an operation that causes no side effects and can be repeated without changing the result. So http://host/unsub?user=name should not cause an action, and should be callable multiple times. The correct thing for Pardot to do would be to use a POST operation here. But they, like many others, didn't get the memo. Also, it's a bit harder for them to do, and, well, why spend time when you can be lazy, right?
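For illustration, here is a minimal sketch (a hypothetical endpoint, in Python with Flask) of an unsubscribe flow that a link-fetching scanner cannot trigger: the GET only renders a confirmation form, and the actual state change happens on POST.

# Sketch of an unsubscribe flow that is safe against link-prefetching scanners.
# GET is safe and idempotent; the state change only happens on an explicit POST.
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

def remove_from_list(user):
    print("unsubscribing", user)  # placeholder for the real mailing-list update

@app.route("/unsub", methods=["GET", "POST"])
def unsubscribe():
    user = request.values.get("user", "")
    if request.method == "GET":
        # A scanner (or mail-client preview) following the emailed link changes nothing.
        return (
            f'<form method="POST" action="/unsub">'
            f'<input type="hidden" name="user" value="{escape(user)}">'
            f"<button>Confirm unsubscribe</button></form>"
        )
    # Only a human pressing the button issues the POST.
    remove_from_list(user)
    return "You have been unsubscribed."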
By this stage you are saying "so what, I don't run Pardot, I don't like mailing lists". You have fallen victim to misreading the symptom as 'i.e.' when it really is 'e.g.'. Let's talk about how much more dangerous this corporate email scanner is.
First, let's make an assumption. Let's assume your company runs an email server on site. You buy an anti-phishing, anti-malware product and bolt it on. Let's say I know this, or can reasonably assume it. That email scanner just became a threat.
Now, let's assume I know you have some other piece of equipment or software on site that also didn't get the memo about GET vs PUT vs POST. For me, in my house, I have a Kankun smart switch, which I've written about here. In a nutshell, http://switch?state=on causes it to turn on. So you could now send me an email and turn my lights on. Hmmm. I wonder what other industrial automation is within the firewall that might have this property and be more expensive? SCADA? The front-door lock for after-hours delivery? Is the email scanner threat inherently worse than those things?
So now we have expanded the problem of the email security threat scanner, but let’s make it worse. Let’s assume there exists a tool like Facebook Business. In order to register a brand page, I need an email address @yourdomain. But I don’t work there. Good thing you installed that security scanner, I will go to Facebook, say “Add user@yourdomain”. Facebook will send an email to that address containing a link http://facebook/proof=true, which will get auto-clicked by the scanner. Boom, I can now impersonate your brand.
At my last company we had a corporate membership to ETSI. As long as email was @domain you could use it. So you could create a login, get it confirmed by the email auto-scanner, and, free standards.
We can now see the blast radius is large. By installing an automated email scanner inside the corporate firewall, and by not using Zero Trust networking, we have allowed anyone in the world to call URLs inside our building, on any service, and to create accounts or impersonate our own staff. Maybe we can use this to reset passwords, steal money, cause general mayhem.
What could this email security scanner do differently? I guess it could remove CGI parameters (the ?X=Y stuff). But then it might get a different page back. It could try and enumerate all the poorly written GET vs POST systems in the world. It could invent some marketing verbiage around machine learning paradigms and hope you don’t read further.
I recommend reading my companion article “The browser was the accomplice“, which has a video, for an understanding of how the same techniques as this email scanner threat can be used through a browser to trick you into setting fire to your manufacturing.
The US National Security Agency (NSA) published a document giving guidance on how (and why) to adopt a Zero Trust Security Model. The TL;DR is: NSA recommends embracing Zero Trust for all critical systems.
Their rationale is "Defense In Depth": assume a breach is inevitable or has already occurred, so constantly limit access to only what is currently needed with granular access controls. As systems evolve and interconnect more often, at more points, across more boundaries, perimeter-based security (your VPN and single firewall) becomes more challenging to implement while also becoming less effective. It actually increases risk by creating a false sense of security and reducing the time available to address real threats. Embracing zero trust means focusing on what will occur, and how to deal with it, rather than denying it can occur.
In the NSA document, they define Embracing Zero Trust (in turn from the NIST 800-207) as:
Zero Trust is a security model, a set of system design principles, and a coordinated cybersecurity and system management strategy based on an acknowledgement that threats exist both inside and outside traditional network boundaries. Zero Trust repeatedly questions the premise that users, devices, and network components should be implicitly trusted based on their location within the network. Zero Trust embeds comprehensive security monitoring; granular, dynamic, and risk-based access controls; and system security automation in a coordinated manner throughout all aspects of the infrastructure in order to focus specifically on protecting critical assets (data) in real-time within a dynamic threat environment. This data-centric security model allows the concept of least privileged access to be applied for every access decision, where the answers to the questions of who, what, when, where, and how are critical for appropriately allowing or denying access to resources
If you think about the load time of your web page, you often think only of bandwidth, image size. But latency is becoming a more critical factor than bandwidth, and, is much harder to solve. Latency occurs due to the speed of light, hand off between systems, etc. The worst kinds of latency are ones that are round-trip: a request needing a response before you can continue. And one of these is DNS latency, the name lookup to IP we rely on daily.
Today let's use a tool from Pingdom for performance testing (you can also use Lighthouse, which is built into Chrome). It gives us a nice shiny letter-grade score, but also some details. Here we can see that loading the Agilicus web site took 830KB, 600ms, and 32 requests. It's these last two that will help the most in optimisation as we examine DNS latency.
If we look at the Gantt chart, we can see that there is a set of blocking activities before we get going. They are:
DNS Lookup (71ms)
SSL Handshake (20ms)
HTTP Connect (36ms)
Send Request (0.1ms)
Wait for response (23ms)
The largest times here are before we even get to the HTML. 140ms before our server even gets the GET. How can we improve?
Some would say get rid of the SSL. If this is you, read this page. “Is TLS Fast Yet“. And then read about 0-RTT TLS. These are trying to get the overhead of TLS (SSL) to 0.
But what about DNS latency? It's our top item. In this case, our DNS is run on Google Cloud DNS. The browser emits a request to its recursive DNS (the one on your PC). The recursive DNS, if the name is not in its cache, goes to (usually) another recursive, e.g. the one in your router. This in turn usually goes to another recursive DNS (e.g. your ISP's). That in turn may need to consult the root to find the authoritative server, and then query it. Each of those hand-offs takes time. But also, we've gone to all the trouble to encrypt our traffic, and now we are sending unencrypted DNS around, broadcasting what we are doing. Hmm.
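You can get a feel for this cost with a few lines of Python (the hostname is an example): time a cold lookup against a second, cached one.

# Rough illustration of DNS lookup latency: first (cold) lookup vs cached.
import socket
import time

def timed_lookup(host):
    start = time.perf_counter()
    socket.getaddrinfo(host, 443)
    return (time.perf_counter() - start) * 1000  # milliseconds

host = "www.example.com"
print(f"first lookup : {timed_lookup(host):6.1f} ms")
print(f"second lookup: {timed_lookup(host):6.1f} ms  (likely answered from a cache)")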
We can improve things a little bit. We can't do much about the first page, but if we use other sites (e.g. a CDN, images, etc), we can add a prefetch tag such as <link rel="dns-prefetch" href="//cdn.example.com"> to the page head.
Now, this only makes sense if it's cross-origin (e.g. to another site), not to your own. Also, we might consider adding a preconnect, e.g. <link rel="preconnect" href="https://cdn.example.com">, since that will help get the TLS connection warmed up too.
But, here, on my web site, those preconnect/dns-prefetch will not do anything, since I serve all my “above the fold blocking” content from a single domain. Worse, the time seems to vary quite a bit, this run was 90ms for the DNS.
If my hostname were a CNAME, I could consider removing that and making it an A record, to avoid a DNS latency lookup. But its not.
If we look at solvedns.com dnsspeedtools, we can see that our authoritative (ns-cloud-d1.googledomains.com) is going to be ~40ms. A lot of that will be the ~15ms each way for the request-response round trip. So there is not much we can do.
So, is there a solution to initial DNS latency? It seems not, other than, I suppose, using an IP address instead of a host name. This works if you are Cloudflare with https://1.1.1.1/, but for the rest of us it's not very feasible. The DNS authoritative I use is already well-sinked and well-peered around the world.
The performance of your website is a big factor in your success. Search engines favour fast load times, as do users. Milliseconds matter. Web site performance is as important as the content, as important as the appearance.
The simplest way to test your web site performance is Chrome’s Lighthouse. Open your web page, hit “Ctrl-Shift-I”. Select “Lighthouse” and, well, there you go. You get a simple report as below. It also tells you specific actions to take to improve performance (showing which images, assets, css, scripts are the culprits).
Another simple tool is GTmetrix. You head there, you type your URL, you get a report card on your web site performance.
Earlier I showed you how to put load on your web site. Run your performance tests while the load is high to get an accurate view of what your users will see.
You would be surprised how easy it is to get from a web site performance score of 50 to a score of 75, give it a try. The top items that I had to do included:
Enable gzip encoding
Remove render blocking CSS (in my case this was the wordpress dashboard icons, I disabled for non-logged-in users)
Minify CSS and JavaScript
Use SVG instead of JPG
Add a Service Worker
Increase cache duration on static assets (and use a cache-busting-naming-scheme)
Move chat widget to onLoad event rather than in-band
Use HTTP/2 (courtesy of Istio + Envoy)
And, well, the effect was quite strong.
The Lighthouse tool in Chrome is also good for simulating mobile conditions (which have longer latency and slower processors). It is important to optimise for the mobile handset since half or more of your traffic will come from these devices.
Now that you have optimised for speed, take a look at optimising for Accessibility: you want all users to access your content, regardless of their abilities. And look at the other best practices and Search-Engine-Optimisation hints that are given.
It doesn’t take a lot of skill or care or effort to be in the top-quartile of websites. Just a little bit of patience. Give it a go!
Your new web site uses new technology. Shake it down, load it up, and see its performance using Locust and your sitemap. Compare to your Istio metrics from Grafana. All with 0 configuration. Latency and load testing with Locust and Istio: too easy? Read on.
First, understand that a modern web site has a well-known file called robots.txt. Its main purpose is to instruct a search engine how and what to see on your site. One of its main tools is the link to the sitemap(s). A sitemap is a full list of the pages on your site. So, if your site is set up automatically, our load test can read robots.txt, from there find the sitemaps, read the sitemaps, and thus have a list of pages. An example robots.txt is below:
From here it's super simple to do latency and load testing with Locust and Istio. I wrote a sitemap-parser Locust task; see the code on Github. But the steps are super simple for you to run:
export SITE=https://mysite.ca
git clone https://github.com/Agilicus/web-site-load
cd web-site-load
poetry install
poetry run web-site-load
At this stage you can open your browser to http://localhost:8089, start your test, and see the statistics. After a few seconds we'll start to see each URL on our site as a separate line. We'll see the 90th-percentile latency, and error counts.
We can also view this as a set of charts:
Now, we can cross-compare our Locust results to our Istio mesh statistics from Grafana.
Istio Mesh Stats
So, what have we achieved? We are loading all pages on our site (via sitemap via robots.txt), in random order, from a set of threads. We can arbitrarily load down our web site and observe if there are errors (perhaps indicating a plugin or script problem). We can see that our databases, filesystems, load balancers are working.
And, we did it with (almost) no code. See the details on github.
# sitemapSwarmer.py -- build the page list from robots.txt and the sitemap(s), then load random pages
import random

from locust import TaskSet, task
from pyquery import PyQuery


class SitemapSwarmer(TaskSet):
    def on_start(self):
        # Read robots.txt, follow each Sitemap: line, and collect every <loc> URL
        request = self.client.get("/robots.txt")
        self.sitemap_links = ['/']
        for line in request.content.decode('utf-8').split("\n"):
            fn = line.split()[:1]
            if len(fn) and fn[0] == "Sitemap:":
                lf = line.split()[1:][0]
                request = self.client.get(lf)
                pq = PyQuery(request.content, parser='html')
                for loc in pq.find('loc'):
                    self.sitemap_links.append(PyQuery(loc).text())
        self.sitemap_links = list(set(self.sitemap_links))

    @task(10)
    def load_page(self):
        # Fetch a random page from the sitemap on each task iteration
        url = random.choice(self.sitemap_links)
        self.client.get(url)


# The locustfile: point Locust at the site given in the SITE environment variable
from os import getenv

from locust import HttpUser

from .sitemapSwarmer import SitemapSwarmer


class WebSiteLoad(HttpUser):
    tasks = {
        SitemapSwarmer: 10,
    }
    host = getenv('SITE', 'https://www.example.com')
    min_wait = 5 * 1000
    max_wait = 20 * 1000
Intuit's QuickBooks Desktop is a desktop application with its database on a shared directory. If we look on Intuit's site, we see many different methods and instructions for remote QuickBooks share access. We can "move" it. We can "share a remote server", and we can "export/import" (which creates an exclusive lock, preventing access).
The “Share a remote server” option is typically done with a VPN. But the VPN creates many challenges:
It breaks your video conferencing, requiring users to disconnect the VPN before using Microsoft Teams, Zoom, Google Meet, etc.
It requires accounts in your VPN or Active Directory for all users (making it complex to work with your accountant)
It increases your ransomware risk, since it uses Windows SMB mounting across more computers
There has to be a better way for remote QuickBooks access. And there is! If we look at "Sharing Files", we can see a method using Zero Trust Networking. In this model we can directly make a directory on our current server available to anyone, on any device, without a VPN. And it's live. The user can directly access it without exporting/importing, without moving the database around.
Shares: Outbound Agent Secure Exposed Access
The steps for remote QuickBooks access are simple, the setup fast, the security high but convenient.
As an end-user experience, remote QuickBooks access is pretty simple. Each user, via their profile web page, will see a list of all Shares they have access to. Clicking on one will open it in the browser, or give a set of mount instructions. Mounting the share, done once, will make it permanently available. At this stage they can install QuickBooks Desktop and use it as if they were in the office. No VPN.
When it comes time to do your taxes, you can simply add the Google or Apple account of your accountant into the access Group. They can now access your QuickBooks server live.
Ransomware? None. The remote QuickBooks access share is via HTTPS and WebDAV; there is no SMB to transmit it.
Forget about the complex export/import, or moving the database. Ditch the VPN. Remote QuickBooks access, live.
Agilicus AnyX Zero Trust enables any user, on any device, secure connectivity to any resource they need—without a client or VPN. Whether that resource is a web application, a programmable logic controller, or a building management system, Agilicus can secure it with multi-factor authentication while keeping the user experience simple with single sign-on.
Recently a team mate ran into an issue. His browser would not let him proceed to a site he had just setup, with a valid certificate. We use Let’s Encrypt (and our sites are all in the HSTS preload so they must be TLS). Nonetheless, he was presented an error “The server presented a certificate that was not publicly disclosed using the Certificate Transparency policy. This is a requirement for some certificates, to ensure that they are trustworthy and protect against attackers.” What could be wrong?
Well, in this case I asked him what time it was. He gave me an answer about a minute in the past. Huh. Is your systemd-timesyncd running? No, it's dead. Aha! Your NTP time is off by about 2 minutes. This certificate, from your perspective, will only become valid in the future.
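A quick way to see this for yourself is to pull the certificate off the socket and compare its validity window against the local clock. A small Python sketch (the hostname is a placeholder; if the handshake itself fails with 'certificate is not yet valid', that is the same clue):

# Compare a site's certificate validity window against this machine's clock.
# If the local clock is behind notBefore, the browser will (correctly) refuse it.
import socket
import ssl
from datetime import datetime, timezone

host = "www.example.com"  # substitute the freshly-issued site
ctx = ssl.create_default_context()
with socket.create_connection((host, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()

not_before = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notBefore"]), timezone.utc)
not_after = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]), timezone.utc)
now = datetime.now(timezone.utc)
print("local clock :", now)
print("valid from  :", not_before)
print("valid until :", not_after)
if now < not_before:
    print("Your clock thinks this certificate is not valid yet -- check NTP!")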
We can check this using the Certificate Transparency logs. My favourite way to search them is crt.sh. This gives you a list of all Certificates issued, when, by who.
It's a good spot to check if someone is trying to spearphish your domain. Check for common misspellings.
So check your clocks. It's not enough to be close, you need to be exact. Else you might be p0wnd.
On Friday February 5th a water treatment plant in Florida was breached by an attacker looking to increase the chemical flow, poisoning people, destroying pipes. It was caught more or less by chance. How did the hacker get in, and how would we systematically make this more secure? Can Zero Trust secure SCADA? Read more about the ISA/IEC 62443 Micro Segmentation Zones and Conduits and how it inter-relates to Zero-Trust.
Well, we have some information in the Cybersecurity Advisory for Public Water Suppliers. And, in a nutshell, a Windows 7 PC was setup with internal access to industrial SCADA devices. The PC was also running TeamViewer, which allowed anyone with the password, from anywhere in the world, to behave as if they were in front of the monitor/keyboard/mouse. Your critical infrastructure defeated.
Now, clearly not a best practice, but behind every bad practice is an unmet product need. It's easy to scold the folks who set this up and then shared the password around. But they had a need to safely, securely, remotely control something that was very poorly designed for the purpose (SCADA).
If we were to architect SCADA today, we would do it using modern APIs, API gateways, JSON Web Tokens, etc. But redesigning it means revisiting a lot of industrial power plants and manufacturing systems. So you find this sort of setup fairly frequently, VNC or TeamViewer etc, bridging the air-gapped network. It's not a new threat, and it's fairly well documented, e.g. "Scared? Or Bored? Terrorists and the power grid". If we use Shodan.io we find many SCADA devices exposed one way or another. My favourite was this one which showed a cement plant on unauthenticated VNC.
So, given we have a system that should not be Internet exposed, and given that we have remote lights-out installations it runs in that are expensive and inconvenient to staff, and given that people are bridging this gap unsafely, how can we enable them to do so safely?
Enter Zero Trust. The principle that we move from bastion-based security to user<->resource pairs. We have a choice of how we would use it here. We could either use it to 'remote' the SCADA devices themselves (so each one would become a resource, and we would ensure only the right user had access, cryptographically secure), or the remote desktop software itself. For the latter, what we might do is leave that facility PC as-is, but, instead of allowing it to run TeamViewer or VNC (which have built-in or no authentication), we would prevent the PC from reaching the Internet. Instead, we would use a Zero Trust agent, securely exposing the local Remote Desktop (RDP) software.
This means team members could achieve their exact goal: when needed, safely access that PC, from any device, any network. But, we could control who, and have a full audit trail. There would be no “shadow IT authentication system”, it would be full single-sign-on, single identity provider.
So, would you like to know more, or discuss how to use Zero Trust to secure SCADA? I'd love to get your feedback or questions: info@agilicus.com.
Web applications get exploited, leading to economic and reputation damage. Rich content is difficult to protect. Complex standards and complex tooling fight with each other. New technologies like Angular Single Page Applications and externally-driven analytics make it difficult to construct a valid Content-Security-Policy. If you get this wrong, you get malware injected on your site, as I wrote about here. Today let's learn how, if we must have an inline script, we can allow it safely with a Content-Security-Policy nonce.
Setting up Google Analytics (via Google Tag Manager) is difficult to understand and achieve securely: you are running JavaScript fetched from an inline script, so how do you cause it to be trusted without destroying all trust? The answer: a nonce. Yes, Content-Security-Policy is complex, and the nonce is what makes this case tractable. Set the nonce (to a unique value) on each page load, and use it to indicate which scripts your page has requested.
I’ve talked earlier about the complexities of web security, about how hard it is to balance security and functionality. One of the tools that Content-Security-Policy allows is the Nonce. The Nonce must be set differently on each HTTP response, making it complex: it requires participation of the server.
In an Angular project, we normally use --aot and --subresource-integrity; this sets a secure hash on each resource that we build and serve. However, anything that is fetched externally (e.g. Google Analytics) is more challenging. The recommended way of using Google Analytics is via Google Tag Manager. In turn, you must use a nonce with it (as shown here). How can we set that nonce to be unique, on each request, in an Angular SPA? Read on!
Before we start adding the nonce to external scripts, let's ensure we have hash-based subresource integrity enabled for all internal, compiled TypeScript:
ng build --aot --subresource-integrity --outputHashing=all --prod=true
If we look at our index.html in the dist directory, we will now see it looks something like:
Those ‘integrity=’ lines have a sha384 hash of the body of the file. You can read more about Subresource Integrity. These hashes were generated by Webpack as part of the angular build, in essence, for each file, it performed:
openssl dgst -sha384 -binary XXXX.js | openssl base64 -A
and this means your browser won’t load the resulting file if it has been modified. But, that is not what we are here for today, we are here for external resources (e.g. Google Tag Manager), and a safe way to allow them to be loaded, the nonce.
Next, we enable indexTransform. We cause it to, on production builds, add a script to the index.html HEAD section. We add a magic string CSP_NONCE which we will then replace in the server side (using lua in nginx).
npm i -D @angular-builders/custom-webpack
/*
 * index-html.transform.ts
 * This exists to modify index.html, after build, for prod,
 * to insert the google tag manager
 */
import { TargetOptions } from '@angular-builders/custom-webpack';
import { environment } from './src/environments/environment.prod';

export default (targetOptions: TargetOptions, indexHtml: string) => {
  let insertTag = `<!-- No script for gtm, non-prod -->`;
  // Only inject the Tag Manager script on production builds
  if ((targetOptions.configuration || 'production') === 'production') {
    insertTag = `<script nonce=CSP_NONCE src=https://www.googletagmanager.com/gtm.js?id=${environment.gtmTag} async></script>`;
  }
  const i = indexHtml.indexOf('</head>');
  return `${indexHtml.slice(0, i)}
${insertTag}
${indexHtml.slice(i)}`;
};
OK, now we are generating an Angular SPA with a header which loads our Google Tag Manager, with our tag, and a nonce just waiting to be set. Note: you don't have to use the Webpack transform; you can just hard-code the script line with the <script nonce=CSP_NONCE prefix if you prefer.
We will now replace the string nonce=CSP_NONCE with a new, per-transaction value in our nginx.conf, adding a new location /index.html which finds every CSP_NONCE in the scripts and substitutes the fresh value (using lua in nginx).
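The idea is easy to see in a few lines of Python (a hypothetical stand-in for that nginx location, not what runs in production): generate a fresh nonce per response, substitute it for the CSP_NONCE marker in index.html, and echo the same value in the Content-Security-Policy header.

# Stand-in sketch for the nginx+lua substitution: a fresh nonce per response,
# injected into both the served index.html and the CSP header.
import secrets
from pathlib import Path
from flask import Flask, Response

app = Flask(__name__)
INDEX = Path("dist/index.html").read_text()

@app.route("/")
@app.route("/index.html")
def index():
    nonce = secrets.token_urlsafe(16)          # unique per transaction
    body = INDEX.replace("CSP_NONCE", nonce)   # <script nonce=...> now matches
    resp = Response(body, mimetype="text/html")
    resp.headers["Content-Security-Policy"] = (
        f"default-src 'self'; "
        f"script-src 'self' 'nonce-{nonce}' https://www.googletagmanager.com"
    )
    return resp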
Now, when we fetch our index.html, the string is replaced on each transaction.
The net effect of this is that we can safely use Google Tag Manager, since it, and its chain of dependencies, could only have been fetched via our code. If we have a flaw in our application, perhaps incorrectly sanitising user-generated content, it will be unable to fetch a script (since it cannot guess our nonce). Give it a try! No more '*' and 'unsafe-*' for Content-Security-Policy; there's no need.
Make sure you have a Content-Security-Policy header set (you can use nginx add_header to set it) to a strong policy.
Now, let's test. My favourite tool is the Mozilla Observatory. Enter your URL, and let it scan. It will come back with a very actionable list for you. Below is an example output showing the results after we have added the Content-Security-Policy nonce.
A Doppelganger Domain is used in spear-phishing. (It's also a pretty terrible 1993 movie with Drew Barrymore.) The concept: I register a domain very similar to the one you normally go to. Maybe I replace an 'i' with an 'l'. Maybe it's .co instead of .ca. It's particularly insidious since the TLS certificate can be valid, so you see the green icon etc.
A team member at Agilicus recently mistyped our domain. Never fear, Chrome to the rescue. See the image above? We were warned. Google had this to say about unsafe domains. (Note, in this case the doppelganger is probably not unsafe, merely similar.)
The general class of doppelganger detection is complex. You might find that an internationalised domain name (IDN) uses a character that looks similar to you, but not to a machine. Do you render them in the font of your choice and diff the images? Do you do some span-of-difference letter detection?
One of the best ways, and probably what Chrome is doing, is to watch your normal history and compare that against a new domain you've never been to.
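A crude sketch of that comparison in Python (the known-domain list and threshold are made up for illustration): flag any new hostname that is very close to, but not exactly, one you already visit.

# Crude doppelganger check: flag hostnames that are very similar to, but not
# exactly, a domain the user normally visits. Illustrative only.
from difflib import SequenceMatcher

KNOWN = {"agilicus.com", "google.com", "github.com"}

def looks_like_doppelganger(host, threshold=0.85):
    if host in KNOWN:
        return False
    return any(SequenceMatcher(None, host, known).ratio() >= threshold for known in KNOWN)

for candidate in ["agilicus.com", "agiIicus.com", "agilicus.co", "example.org"]:
    print(candidate, "->", "suspicious" if looks_like_doppelganger(candidate) else "ok")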
But, the absolute best way: teach users to be suspicious. An email with a link? Don’t click.
OAuth 2.0 is a deceptively simple protocol. For many of us, we create a client id, client secret, set a few environment variables, and watch the black magic take effect. It turns Auth into a Boolean on/off switch. Great! But, what are the best practices for how to configure and use it if we are a bit more behind the scenes? Read on!