
A critical privilege escalation vulnerability has been identified in Azure Machine Learning (AML), allowing attackers with minimal permissions to execute arbitrary code within AML pipelines. This flaw, discovered by cloud security firm Orca Security, underscores the importance of stringent access controls and vigilant configuration management in cloud environments.
Understanding the Vulnerability
Azure Machine Learning is a cloud-based platform designed to build, train, and deploy machine learning models at scale. It utilizes pipelines to streamline complex workflows, with each pipeline consisting of interconnected components executed sequentially or in parallel. These components rely on invoker scripts—Python files that orchestrate the execution of ML tasks—stored within an automatically created Azure Storage Account.
Orca Security's research revealed that these invoker scripts, when modified, run with the permissions of the AML compute instance. This is particularly concerning because compute instances often operate under highly privileged identities. An attacker with write access to the storage account can replace these scripts to inject malicious code, leading to several potential exploits:
- Code Injection: By altering the invoker scripts, attackers can execute arbitrary code within the AML environment.
- Secret Extraction: Malicious scripts can access and exfiltrate sensitive information, such as secrets stored in Azure Key Vault.
- Privilege Escalation: Leveraging the managed identity of the AML compute instance, attackers can escalate their privileges, potentially assuming the role of the user who created the instance. If the creator holds "Owner" permissions on the Azure subscription, this could lead to full subscription compromise.
Upon disclosure, Microsoft acknowledged the findings but clarified that this behavior is "by design," equating access to the storage account with access to the compute instance itself. However, recognizing the security implications, Microsoft has implemented several changes:
- Code Snapshot Execution: AML now runs jobs using snapshots of component code rather than reading scripts from storage in real time. This change mitigates the risk of code injection via storage modification.
- Documentation Updates: Microsoft has updated its documentation to clarify the security model and provide guidance on securing AML environments.
Recommended Security Practices
To mitigate the risks associated with this vulnerability, AML users are advised to implement the following security practices:
- Restrict Write Access: Limit write permissions to AML storage accounts to prevent unauthorized script modifications.
- Disable SSO: Where possible, disable Single Sign-On on compute instances to prevent them from inheriting creator-level access.
- Use Minimal Permissions: Assign system-assigned identities with the least privileges necessary for the task.
- Enforce Immutability: Implement immutability and versioning on critical scripts to prevent unauthorized changes.
- Implement Checksum Validation: Use checksum validation for invoker scripts to ensure their integrity before execution.
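One way to apply the checksum-validation practice above, shown as an added example that is not code from Microsoft or Orca Security: record a SHA-256 manifest of the scripts at review time, then refuse to run if any file has been modified, removed or added. It assumes the scripts are available in a local directory and that the manifest is kept somewhere the storage account's writers cannot change; the file names and contents are constructed samples.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large scripts or assets do not load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a trusted SHA-256 for every .py file under root (run at review time)."""
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in sorted(root.rglob("*.py"))}


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare current scripts with the manifest; any non-empty list means do not run."""
    current = build_manifest(root)
    return {
        "modified": [n for n in manifest if n in current
                     and not hmac.compare_digest(current[n], manifest[n])],
        "missing": [n for n in manifest if n not in current],
        "unexpected": [n for n in current if n not in manifest],
    }


def demo() -> dict[str, list[str]]:
    """Constructed sample: two scripts are baselined, then one is altered and one added."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "invoker.py").write_text("print('run component')\n")
        (root / "helpers.py").write_text("def load(): pass\n")
        manifest = build_manifest(root)  # store outside the writable storage account
        (root / "invoker.py").write_text("print('run component')\nimport os\n")
        (root / "extra.py").write_text("pass\n")
        return verify(root, manifest)


if __name__ == "__main__":
    report = demo()
    print(report)
    raise SystemExit(1 if any(report.values()) else 0)
```

A check like this only helps if the manifest itself sits outside the reach of anyone with write access to the script storage; otherwise both can be replaced together.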
This vulnerability is not an isolated incident. Similar issues have been identified in other Azure services:
- Azure Shared Key Authorization Exploitation: Orca Security discovered that abuse of shared key authorizations, a default on Azure storage accounts, could allow a threat actor to steal higher privileged access tokens, move laterally throughout the network, and execute remote code. Microsoft recommends disabling shared key access and implementing Azure Active Directory authentication.
- Privilege Escalation via Managed Identities: Researchers have demonstrated how attackers can leverage managed identities in Azure to escalate privileges and access resources beyond their initial permissions. This underscores the need for careful management of identity and access controls within Azure environments.
The discovery of this privilege escalation flaw in Azure Machine Learning highlights the critical need for robust security practices in cloud environments. While Microsoft has taken steps to mitigate the risk, users must proactively implement security measures to protect their resources. Regular configuration reviews, adherence to the principle of least privilege, and vigilant monitoring are essential in safeguarding machine learning pipelines and other cloud-based services.
Source: Infosecurity Magazine, "Privilege Escalation Flaw Found in Azure Machine Learning Service"