Azure RBAC Catalog

Live site: rbac-catalog.dev

What is Azure RBAC?

Azure Role-Based Access Control (Azure RBAC) is an authorization system built on Azure Resource Manager that provides fine-grained access management of Azure resources. It enables you to manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.

With Azure RBAC, you can segregate duties within your team and grant users only the access they need to perform their jobs. Instead of giving everybody unrestricted permissions in your Azure subscription or resources, you can allow only specific actions at a particular scope. Azure RBAC includes over 800 built-in roles, and you can also create custom roles tailored to your organization's needs.
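
To make "at a particular scope" concrete, here is a minimal sketch of scope inheritance: a role assignment made at a parent scope also applies to the resources beneath it. The function name `scope_covers` and the placeholder subscription ID are illustrative assumptions, and the sketch only handles subscription, resource group, and resource scopes (management group scopes use a different ID shape and are not covered).

```python
def scope_covers(assignment_scope: str, resource_id: str) -> bool:
    """Return True if an assignment at `assignment_scope` is inherited by `resource_id`.

    Hypothetical helper: compares whole path segments, case-insensitively,
    so ".../resourceGroups/rg-app" does not cover ".../resourceGroups/rg-app-2".
    """
    scope_parts = [p for p in assignment_scope.lower().split("/") if p]
    target_parts = [p for p in resource_id.lower().split("/") if p]
    return target_parts[: len(scope_parts)] == scope_parts


# Placeholder subscription ID, for illustration only.
subscription = "/subscriptions/00000000-0000-0000-0000-000000000000"
resource_group = subscription + "/resourceGroups/rg-app"
storage_account = resource_group + "/providers/Microsoft.Storage/storageAccounts/demo"

print(scope_covers(subscription, storage_account))
print(scope_covers(resource_group, subscription + "/resourceGroups/rg-app-2"))
```

Comparing segments rather than raw string prefixes avoids treating two resource groups with similar names as parent and child.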

For more information, see "What is Azure role-based access control (Azure RBAC)?" in the official Microsoft documentation.

About This Catalog

A comprehensive catalog and monitoring tool for Azure built-in RBAC roles. Browse roles, explore their permissions, track changes over time, find least-privilege roles based on operation requirements, and get AI-powered role recommendations.
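
One way to picture "least-privilege roles based on operation requirements" is sketched below. It is not the catalog's actual algorithm: it assumes a role is a dict with `name`, `actions`, and `notActions`, expands `*` wildcards against a small list of known operations, and ranks the roles that cover every required operation by how few operations they grant in total. The role names and the operation list are toy data.

```python
import re


def matches(pattern: str, operation: str) -> bool:
    """Case-insensitive match of an action pattern with `*` wildcards."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, operation, flags=re.IGNORECASE) is not None


def granted_operations(role: dict, known_operations: list[str]) -> set[str]:
    """Operations allowed by `actions` and not excluded by `notActions`."""
    return {
        op for op in known_operations
        if any(matches(p, op) for p in role.get("actions", []))
        and not any(matches(p, op) for p in role.get("notActions", []))
    }


def least_privilege_roles(roles: list[dict], required: list[str],
                          known_operations: list[str]) -> list[str]:
    """Names of roles covering all required operations, narrowest first."""
    candidates = []
    for role in roles:
        granted = granted_operations(role, known_operations)
        if set(required) <= granted:
            candidates.append((len(granted), role["name"]))
    return [name for _, name in sorted(candidates)]


# Toy data: a handful of operations and three illustrative roles.
KNOWN_OPERATIONS = [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Authorization/roleAssignments/write",
]
ROLES = [
    {"name": "Reader-like", "actions": ["*/read"]},
    {"name": "VM Operator (hypothetical)",
     "actions": ["Microsoft.Compute/virtualMachines/*"]},
    {"name": "Contributor-like", "actions": ["*"],
     "notActions": ["Microsoft.Authorization/*/write"]},
]

print(least_privilege_roles(
    ROLES, ["Microsoft.Compute/virtualMachines/start/action"], KNOWN_OPERATIONS))
```

Counting granted operations is only a rough proxy for privilege. A real ranking would likely also weigh data actions and how sensitive each operation is.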

Key Features

  • Role Catalog: Browse all 800+ Azure built-in roles with full permission details.
  • Operation Explorer: Search 20,000+ resource provider operations.
  • Change Tracking: Monitor when Microsoft adds, modifies, or deprecates roles.
  • AI Role Recommender: Describe what you need in natural language and get least-privilege role suggestions.
  • Diff Viewer: See exactly what changed between role versions.
  • 8 AI Recommendation Modes: From fast keyword matching to LLM-powered semantic understanding.
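
As an illustration of what the Diff Viewer and Change Tracking features compute, the sketch below compares two snapshots of a role definition and reports which permission entries were added or removed. The snapshot layout (a dict with `actions`, `notActions`, `dataActions`, and `notDataActions` lists) and the function name `diff_role_versions` are assumptions, not the catalog's actual data model.

```python
PERMISSION_FIELDS = ("actions", "notActions", "dataActions", "notDataActions")


def diff_role_versions(old: dict, new: dict) -> dict:
    """Return added/removed entries per permission field; unchanged fields are omitted."""
    changes = {}
    for field in PERMISSION_FIELDS:
        before = set(old.get(field, []))
        after = set(new.get(field, []))
        added = sorted(after - before)
        removed = sorted(before - after)
        if added or removed:
            changes[field] = {"added": added, "removed": removed}
    return changes


# Toy snapshots of the same (hypothetical) role at two points in time.
old_version = {"actions": ["Microsoft.Storage/storageAccounts/read"]}
new_version = {
    "actions": [
        "Microsoft.Storage/storageAccounts/read",
        "Microsoft.Storage/storageAccounts/listKeys/action",
    ]
}

print(diff_role_versions(old_version, new_version))
```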

AI Recommendation Modes

The AI Role Recommender supports 8 modes, each with its own speed/accuracy trade-off:

  • TF-IDF: Enhanced TF-IDF + BM25 keyword matching (CPU only)
  • Semantic: Pure sentence embedding similarity using sentence-transformers
  • ColBERT: Token-level late interaction for precise matching via ColBERT index
  • Cross-Encoder: Bi-encoder retrieval followed by neural reranking
  • LLM: Fine-tuned Qwen2.5-0.5B-Instruct model for direct inference via Ollama
  • RAG: Retrieval-Augmented Generation with LLM reranking
  • HyDE: Hypothetical document generation + semantic search
  • Hybrid: Multi-stage pipeline: TF-IDF → Embeddings → LLM
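
To give a feel for the keyword-matching end of this spectrum, here is a minimal plain TF-IDF ranking sketch using only the standard library. It is deliberately simpler than the "Enhanced TF-IDF + BM25" mode listed above: there is no BM25 term, the tokenizer is a whitespace split, and the role documents are toy strings rather than real catalog entries.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def rank_roles(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Rank role documents against a query by TF-IDF cosine similarity."""
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    # Smoothed IDF keeps weights positive even for very common terms.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(tokens: list[str]) -> dict[str, float]:
        # Terms never seen in any document are dropped from the vector.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    query_vec = vectorize(tokenize(query))
    scores = [(name, cosine(query_vec, vectorize(tokens)))
              for name, tokens in tokenized.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy documents, not the catalog's real document_text.
DOCS = {
    "Storage Blob Data Reader": "storage blob data reader read list blob containers",
    "Virtual Machine Contributor": "virtual machine contributor manage virtualmachines start poweroff",
    "Key Vault Secrets User": "key vault secrets user read secret contents",
}

for name, score in rank_roles("read blob storage", DOCS):
    print(f"{score:.3f}  {name}")
```

Keyword scoring like this is fast and runs on CPU, but it cannot match a query to a role that shares no terms with it, which is the gap the embedding and LLM modes aim to close.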

The model used in LLM mode was fine-tuned with Unsloth for efficient LoRA training on consumer hardware. It takes natural language queries such as "I need to read blob storage" and outputs structured JSON with role recommendations and confidence scores.
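
The exact JSON schema is not specified here, so the sketch below assumes a hypothetical shape: a top-level `recommendations` list whose items carry a `role` string and a numeric `confidence`. Under that assumption, it shows how a caller might defensively parse the model output, since a small model can occasionally emit malformed JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class Recommendation:
    role: str
    confidence: float


def parse_recommendations(raw: str, min_confidence: float = 0.0) -> list[Recommendation]:
    """Parse model output (assumed schema) into recommendations, best first.

    Malformed JSON or unexpected shapes yield an empty list instead of raising.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get("recommendations", [])
    if not isinstance(items, list):
        return []
    results = []
    for item in items:
        if not isinstance(item, dict):
            continue
        role, confidence = item.get("role"), item.get("confidence")
        if isinstance(role, str) and isinstance(confidence, (int, float)):
            if confidence >= min_confidence:
                results.append(Recommendation(role, float(confidence)))
    return sorted(results, key=lambda r: r.confidence, reverse=True)


# Hypothetical model output for the query "I need to read blob storage".
raw_output = (
    '{"recommendations": ['
    '{"role": "Reader", "confidence": 0.41}, '
    '{"role": "Storage Blob Data Reader", "confidence": 0.93}]}'
)

print(parse_recommendations(raw_output, min_confidence=0.5))
```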

Knowledge Base

Each role is converted into a searchable document_text combining:

  • Role name & description — From Azure's Role Definition API
  • Action keywords — Tokenized from expanded permissions (e.g., Microsoft.Compute/virtualMachines/powerOff/action → virtualmachines poweroff action)
  • Curated patterns — Human-written query examples (e.g., "read blob storage")
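
The three ingredients above can be sketched as follows. The tokenization rule (drop the resource provider namespace, split on `/`, lowercase) is inferred from the single powerOff example and may not match the catalog's full pipeline. The function names, the field order, and the space separator in `build_document_text` are assumptions, and the role description is a placeholder.

```python
def action_keywords(action: str) -> str:
    """Turn an operation string into lowercase keywords, dropping the provider prefix."""
    _provider, _, rest = action.partition("/")
    return " ".join(
        segment.lower() for segment in rest.split("/") if segment and segment != "*"
    )


def build_document_text(name: str, description: str,
                        actions: list[str], patterns: list[str]) -> str:
    """Combine role metadata, action keywords, and curated patterns into one string."""
    parts = [name, description]
    parts.extend(action_keywords(action) for action in actions)
    parts.extend(patterns)
    # Skip empty pieces (for example, keywords of a bare "*" action).
    return " ".join(part for part in parts if part)


print(action_keywords("Microsoft.Compute/virtualMachines/powerOff/action"))

print(build_document_text(
    name="Example Role",                      # placeholder role name
    description="Placeholder description.",   # not a real Azure description
    actions=["Microsoft.Storage/storageAccounts/blobServices/containers/read"],
    patterns=["read blob storage"],
))
```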

Knowledge Base Pipeline

The diagram above shows how data flows from Azure APIs and curated patterns through the tokenization pipeline to produce the searchable document text used by the AI recommendation engines.

Andrea Tomassilli
Senior Software Engineer