Cloud configuration errors are a significant concern for anyone invested in modern DevOps processes, given how much cloud-native software now runs in production (think of microservices, serverless functions, and containerized workloads orchestrated by Kubernetes). Misconfigured cloud environments can cause anything from poor performance to system downtime to data breaches.
Cloud-native architectures introduce new attack surfaces. Short-lived Kubernetes pods, microservices that integrate mainly through APIs, and applications running outside the managed cloud environment all involve complex architectures with many network stack components.
This article provides insight into some common cloud configuration errors and how to recognize them. Even more importantly, this article explores how you can help avoid them in your various DevOps processes.
Common cloud configuration errors
There are three common reasons for cloud configuration errors:
- Complexity. Cloud-native architectures and cloud platforms are complex enough that errors are hard to track and spot. This is often compounded by overstretched teams with knowledge gaps who don't use managed services and miss important configuration steps, especially when deploying cloud architectures quickly without understanding that complexity.
- Configuration drift. Cloud providers strongly recommend infrastructure as code (IaC) for automated, template-based deployment of cloud resources. Examples include Azure Resource Manager (ARM) templates and Bicep, AWS CloudFormation, and multi-cloud tools such as HashiCorp Terraform and Pulumi, all of which integrate with the most common DevOps pipeline solutions. However, even with standardized automated deployments in place, cloud admins can often still make changes in other ways, such as through an admin management portal or command-line interface (CLI). Any change made outside your IaC and DevOps frameworks is considered configuration drift.
- Failure to properly configure cloud environments in development or release. Organizations should rely on DevOps automation backed by IaC, but that’s easier said than done. Dev environments often deviate from the eventual production environment. Even administrative permissions can be different—DevOps teams and cloud admins might have more permissions in the dev environment than in production, leading to misconfigurations and conflicting setups.
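To make configuration drift concrete, here is a minimal Python sketch that compares the settings declared in a template with the settings actually observed in the cloud. The function name find_drift and the dictionary shapes are assumptions made for illustration only. Real IaC tools ship their own drift or plan commands, and this sketch does not reproduce any of them.

```python
from typing import Any


def find_drift(declared: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Return readable differences between declared (IaC) and observed settings.

    Both arguments are plain nested dictionaries; how they are obtained
    (template parsing, provider API calls) is outside this sketch.
    """
    findings: list[str] = []
    for key in sorted(set(declared) | set(actual)):
        path = f"{prefix}{key}"
        if key not in actual:
            findings.append(f"missing in cloud: {path}")
        elif key not in declared:
            findings.append(f"not in template: {path}")
        elif isinstance(declared[key], dict) and isinstance(actual[key], dict):
            # Recurse into nested settings and keep a dotted path for reporting.
            findings.extend(find_drift(declared[key], actual[key], prefix=f"{path}."))
        elif declared[key] != actual[key]:
            findings.append(
                f"changed: {path} (template={declared[key]!r}, cloud={actual[key]!r})"
            )
    return findings
```

A non-empty result means someone changed the environment outside the template, for example through the portal or a CLI, and the template or the environment needs to be reconciled.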
Common cloud configuration problems
Now that you’ve seen the common causes of cloud configuration errors, consider some of the configuration problems that appear most often in cloud-native scenarios:
Lack of access control
One widespread issue is the lack of tight access controls and failure to apply the Principle of Least Privilege (PoLP) for both machine and human access to systems.
Cloud and DevOps teams often have too many privileges that they don’t need. Having permissions that are too powerful (for example, full administrator or owner roles) can lead to misconfigurations and pose security issues—such as exposing data that your DevOps team should not see.
Apply the PoLP in your cloud teams. Only a handful of admins should have owner permissions, and most tasks should not rely on standing administrative permissions. Instead, look into privileged identity management solutions that allow for just-in-time permissions.
In other words, grant adequate permissions to an admin to perform their administrative task—and nothing more—for a specified amount of time (typically a couple of hours). If your cloud environment doesn’t provide a privileged identity management solution, consider undertaking regular audits to validate current and required permissions.
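The idea of a time-bound grant can be sketched in a few lines of Python. This is an illustration of the concept only, not the API of any privileged identity management product. The names Grant, grant_just_in_time, is_active, and the eight-hour ceiling MAX_HOURS are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_HOURS = 8  # assumed upper bound for a single elevation


@dataclass(frozen=True)
class Grant:
    principal: str
    role: str
    expires_at: datetime


def grant_just_in_time(principal: str, role: str, hours: int,
                       now: Optional[datetime] = None) -> Grant:
    """Create a role grant that expires automatically after `hours`."""
    if hours <= 0 or hours > MAX_HOURS:
        raise ValueError(f"hours must be between 1 and {MAX_HOURS}")
    now = now or datetime.now(timezone.utc)
    return Grant(principal, role, now + timedelta(hours=hours))


def is_active(grant: Grant, now: Optional[datetime] = None) -> bool:
    """A grant is only honored strictly before its expiry time."""
    now = now or datetime.now(timezone.utc)
    return now < grant.expires_at
```

The important design choice is that expiry is part of the grant itself, so nobody has to remember to revoke the permission afterwards.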
Overly permissive network flows
Overly permissive networks and unrestricted inbound/outbound ports are another common problem in cloud-native architectures.
Most cloud providers let you enable remote server management ports (RDP, SSH) for virtual machines (VMs). Infrastructure compute resources like VMs or Kubernetes clusters are bound to a virtual network and, by default, all IP traffic within such a virtual network is allowed.
The same goes for network communication between your back-end servers and the front-end load balancers. Applications and resources can have more access than needed, posing a security risk. This can enable “pass-the-hash” and “pass-the-ticket” attacks (Use alternate authentication material: Pass the hash, 2022) or make it easier for malware to spread across servers on the same network. The same applies to hybrid network scenarios that integrate cloud network services with on-premises data center VLANs (or branch offices). Before you know it, all network resources could be infected: on-premises, remote, and those running in cloud environments.
The primary recommendation is to integrate network security and firewall solutions into each network stack component. For example, keep VM host-based firewall services (such as Windows Firewall) enabled to protect the operating system and application layer, and enable built-in virtual network services like Azure Firewall, Azure Network Security Groups (NSG), or AWS Network Firewall. For hybrid connectivity, rely on on-premises firewalls to protect and secure these boundaries.
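As a rough illustration of what checking for overly permissive flows could look like, the following Python sketch flags inbound rules that expose SSH or RDP to the whole internet. The InboundRule shape and the function names are assumptions for this example and do not mirror any provider's rule format. Only the standard library ipaddress module is used.

```python
import ipaddress
from dataclasses import dataclass

# Well-known remote management ports mentioned in the text.
MANAGEMENT_PORTS = {22: "SSH", 3389: "RDP"}


@dataclass(frozen=True)
class InboundRule:
    name: str
    source_cidr: str
    port_from: int
    port_to: int


def is_open_to_internet(cidr: str) -> bool:
    """True for 0.0.0.0/0 or ::/0, i.e. any source address."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def risky_rules(rules: list[InboundRule]) -> list[str]:
    """List management ports that are reachable from any source address."""
    findings: list[str] = []
    for rule in rules:
        if not is_open_to_internet(rule.source_cidr):
            continue
        for port, label in MANAGEMENT_PORTS.items():
            if rule.port_from <= port <= rule.port_to:
                findings.append(f"{rule.name}: {label} ({port}) open to {rule.source_cidr}")
    return findings
```

A check like this only covers one narrow case. Rules that are too broad inside the virtual network still need a separate review.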
Lack of observability
Configuration errors that impact observability often include restrictive permissions that prevent access to logs and other data.
Observability and monitoring are key to running a healthy platform, whether on-premises or in a cloud environment. If your DevOps team cannot access the full architectural stack, that raises observability challenges. As with permissions in general, you can only monitor what you can access, but monitoring does not require administrative permissions: a reader or viewer role is enough.
With a single cloud provider, teams normally rely on the cloud provider’s monitoring solutions. For hybrid and multi-cloud topologies, deploy a monitoring and observability solution that spans all clouds. Kubernetes, for example, works well with open-source observability solutions like Prometheus and Grafana.
Poorly configured data storage endpoints
Another common problem is insecure data storage endpoints. While cloud storage services are secure by design, relying on HTTPS and offering out-of-the-box encryption, there have been several documented instances, including in 2020 and 2022, where these secure data endpoints were misconfigured.
One issue is that, although these cloud storage solutions provide security features, they’re often not enforced. For example, an Azure Storage account can accept both HTTP and HTTPS traffic unless the “Secure transfer required” setting is turned on, so HTTPS-only is guaranteed only when that setting is verified and enforced.
Another issue is that cloud storage endpoints are publicly reachable by default, which means that technically anyone can attempt to connect to the storage endpoint’s URL. In addition, while most cloud providers offer encryption for data and storage endpoints, organizations should look into a bring-your-own-key (BYOK) solution for more control over encryption, along with key rotation to limit the impact of compromised keys.
Another layer to highlight here is poorly configured data storage that allows authorized users (cloud admins and DevOps teams) to access information outside the scope of their responsibilities. Often, allocated administrative privileges permit management of the cloud service aspects, which also gives access to the actual data stored in the cloud.
Mitigation means first limiting administrative permissions—what cannot be managed cannot be mismanaged. Next, make your DevOps teams aware of all available security settings for cloud data storage endpoints and integrate policies to enforce them across your cloud environments.
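One way to picture policy enforcement for storage endpoints is a small check that runs over every storage account's settings. The Python sketch below is hypothetical: the StorageSettings fields, the policy_violations function, and the 90-day rotation limit are assumptions chosen for illustration, not a provider's policy engine or data model.

```python
from dataclasses import dataclass

MAX_KEY_AGE_DAYS = 90  # assumed rotation policy, adjust to your own standard


@dataclass(frozen=True)
class StorageSettings:
    name: str
    https_only: bool
    public_access: bool
    key_age_days: int


def policy_violations(settings: StorageSettings) -> list[str]:
    """Return every policy rule the given storage endpoint breaks."""
    violations: list[str] = []
    if not settings.https_only:
        violations.append("HTTP traffic is allowed")
    if settings.public_access:
        violations.append("anonymous public access is enabled")
    if settings.key_age_days > MAX_KEY_AGE_DAYS:
        violations.append(
            f"encryption key is {settings.key_age_days} days old (limit {MAX_KEY_AGE_DAYS})"
        )
    return violations
```

In practice, the built-in policy services of the cloud provider are the better place to enforce such rules. The sketch only shows the kind of settings those policies would look at.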
Missing effective secrets management
Compromised encryption keys and similar issues can be addressed with an effective secrets management policy. Applying IaC and cloud-deployment automation means that your DevOps teams continuously handle secrets, and they must do so correctly.
Never store secrets in deployment templates, which are not secure storage. Don’t hardcode secrets in application configuration settings inside your cloud services either. Instead, use a secret vault service, such as Azure Key Vault, AWS Secrets Manager, or HashiCorp Vault.
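A simple guard against secrets in templates is to scan template text before it is committed or deployed. The following Python sketch uses a regular expression to flag lines where a secret-like key is assigned a literal value. The key names, the REFERENCE_PREFIXES convention for vault references, and the function name are assumptions for illustration. Dedicated secret scanners are far more thorough.

```python
import re

# Values starting with these prefixes are treated as references to a vault or
# pipeline variable rather than literal secrets (an assumed convention).
REFERENCE_PREFIXES = ("${", "{{")

SECRET_ASSIGNMENT = re.compile(
    r"""(?ix)
    (?P<key>\w*(?:password|passwd|secret|api[_-]?key|token)\w*)  # secret-like key
    ["']?\s*[:=]\s*                                              # separator
    ["'](?P<value>[^"']+)["']                                    # quoted literal
    """
)


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, key name) pairs; the secret value is never returned."""
    findings: list[tuple[int, str]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match and not match.group("value").startswith(REFERENCE_PREFIXES):
            findings.append((line_number, match.group("key")))
    return findings
```

Reporting only the line number and key name keeps the scanner's own output from leaking the secret into build logs.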
Incomplete or failed audits
As previously mentioned, another common problem is a failure to validate configurations or perform regular auditing. This is another observability issue: If you don’t monitor your environment, you cannot properly manage it, leading to persistent or unnoticed configuration errors.
Audit your DevOps team’s administrative permissions to learn from them, then lock them down and allocate only the privileges required to perform a task. Next, given how quickly cloud environments change, perform those audits regularly on all possible levels: network, storage, compute, application, and administrative access.
Starting from the built-in auditing capabilities of the cloud provider, consider extending them with third-party, multi-cloud, or multi-platform auditing solutions. When your auditing reports and outcomes are in place, review them regularly and implement necessary changes continuously.
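A permissions audit often boils down to comparing what each identity has been granted with what it actually used over a review period. The Python sketch below shows that comparison under stated assumptions: the input dictionaries and the function name unused_permissions are hypothetical, and collecting the usage data from provider audit logs is not shown.

```python
def unused_permissions(
    granted: dict[str, set[str]],
    used: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each principal to the granted permissions it never exercised."""
    report: dict[str, set[str]] = {}
    for principal, permissions in granted.items():
        extra = permissions - used.get(principal, set())
        if extra:
            report[principal] = extra
    return report
```

Permissions that show up in the report are candidates for removal, which is how an audit feeds back into least-privilege access.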
Failure to scan cloud-native resources and artifacts
Finally, consider the risks of not scanning third-party resources (for example, container images in your Kubernetes cloud environments) and of not scanning your own application source code or IaC template definition files. Insecure packages, vulnerabilities, and malware can quickly and easily slip in through pre-built development artifacts or nested layers of Docker container images.
Your DevOps teams should apply the concept of “shifting left” and move towards DevSecOps practices. This means that security scanning and security validation become an integral part of each cycle of your DevOps process. Integrate security code scanning with source control solutions such as GitHub’s repository security features, or integrate container image and container registry scanning with Snyk or similar security solutions.
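Shifting left usually means a pipeline step that fails the build when a scan reports serious findings. The Python sketch below shows such a gate in its simplest form. The Finding shape, the severity scale, and the identifiers are assumptions for illustration. The output format of real scanners differs and would need to be parsed first.

```python
from dataclasses import dataclass

# Assumed severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    artifact: str
    identifier: str
    severity: str


def blocking_findings(findings: list[Finding], threshold: str = "high") -> list[Finding]:
    """Keep only the findings at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f.severity) >= limit]


def gate(findings: list[Finding], threshold: str = "high") -> int:
    """Return a process exit code: 1 blocks the pipeline, 0 lets it continue."""
    return 1 if blocking_findings(findings, threshold) else 0
```

A pipeline step could pass the result of gate to sys.exit so that a blocking finding stops the release before the artifact reaches production.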
The cloud has a lot of benefits, but you can’t take them all for granted. With the rapid adoption of cloud-native software solutions, like Kubernetes containerized architectures, and with microservices and serverless becoming more popular, organizations often face new security-related problems.
Trend Micro is a trusted partner for your multi-cloud strategy and has the necessary solutions to optimize your cloud migrations and deployments.
Use alternate authentication material: Pass the hash, Sub-technique T1550.002 – Enterprise | MITRE ATT&CK®. (n.d.). Retrieved August 10, 2022, from https://attack.mitre.org/techniques/T1550/002/
Use alternate authentication material: Pass the ticket, Sub-technique T1550.003 – Enterprise | MITRE ATT&CK®. (n.d.). Retrieved August 10, 2022, from https://attack.mitre.org/techniques/T1550/003/