Azure Lab Guide: Troubleshoot Performance with Azure Monitor, Schedule VM Backups, and Bulk Invite Guest Users
Lab Overview
This lab covers three high-frequency Azure admin tasks:
- Use Azure Monitor metrics to identify infrastructure performance bottlenecks.
- Configure Azure Backup to schedule VM backups in a Recovery Services vault and understand which VM OS types and states are supported.
- Bulk onboard external users as Azure AD (Entra ID) guests using the correct bulk operation and PowerShell invitation cmdlets.
Lab Objectives
By the end of this lab, you will be able to:
- Use Azure Monitor to analyze time-series metrics and isolate a performance issue.
- Differentiate when to use Metrics, Logs, Activity log, Advisor, and Traffic Analytics.
- Configure VM backup policies and confirm supported VM workloads.
- Bulk invite external users to your tenant using:
  - Portal-based Bulk invite users
  - PowerShell automation using New-AzureADMSInvitation (or Microsoft Graph equivalent)
Prerequisites
Permissions
- Monitoring Reader or higher on the Azure resources you are troubleshooting.
- Backup Contributor (or Contributor) on the Recovery Services vault and protected VMs.
- User Administrator or Guest Inviter role (Entra ID) for guest invitations.
- Some tenants restrict guest invites via External collaboration settings.
Resources
- An Azure subscription with at least one workload showing performance symptoms (VMs, App Service, AKS, Storage, etc.).
- A Recovery Services vault in the same subscription.
- A CSV file with external user names and email addresses (500 users).
Lab 1: Troubleshoot Infrastructure Performance Using Azure Monitor Metrics
Goal
Identify the likely cause of performance degradation using Azure infrastructure metrics (fast, near-real-time signals like CPU, memory-related counters, disk, and network).
Why Azure Monitor (Metrics) is the correct tool
- Metrics are stored in a time-series database optimized for fast charting, alerting, and trend analysis.
- This is the first stop for "performance issues" when you suspect the platform layer (VM host, disk throughput, network) or resource limits.
Where Azure Monitor fits compared to other options
Use this as your mental model:
| Tool | What it answers | When to use |
|---|---|---|
| Azure Monitor (Metrics) | "What is the resource doing over time?" | CPU spikes, disk queue, NIC throughput, saturation, throttling |
| Azure Monitor (Logs / Log Analytics) | "What happened and why?" | Deep investigation across logs, KQL correlation, app + OS events |
| Activity log | "Who/what changed my resource?" | Deployments, scale operations, NSG changes, restart, write operations |
| Advisor | "What should I improve?" | Recommendations: cost, performance, reliability, security posture |
| Traffic Analytics | "What's the flow pattern on the network?" | NSG flow logs analytics, traffic patterns, security review |
Step-by-step: Metrics investigation workflow (Portal)
Step 1: Identify the affected resource(s)
Start from the resource you suspect:
- Virtual machine: check CPU, disk, network
- App Service: check CPU, memory working set, requests, response time
- Storage: check latency, throttling
- AKS: check node metrics, pod pressure, cluster autoscaler behavior (often via Container Insights)
Step 2: Open Metrics
Azure portal → Monitor → Metrics
Or resource → Monitoring → Metrics
Step 3: Select scope
Pick the specific resource(s):
- VM: VM-SRV01, VM-SRV02
- Disk: managed disk resource (if needed)
- NIC: network interface resource (if needed)
Step 4: Add the "first-pass" metrics (typical for VM performance triage)
For Windows/Linux VMs (platform metrics vary by SKU/agent, but these are common checks):
Compute
- Percentage CPU (the platform metric for CPU utilization)
Disk
- Disk Read Bytes/Sec
- Disk Write Bytes/Sec
- Disk Read Operations/Sec
- Disk Write Operations/Sec
- Disk Queue Depth (if available)
Network
- Network In Total
- Network Out Total
Step 5: Set time range and granularity
- Time range: last 1 hour (then expand to 24 hours for patterns)
- Granularity: start with 1 minute for spikes, then 5 minutes for trends
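The reason to start at 1-minute granularity can be shown with a small sketch. It assumes hypothetical 1-minute CPU averages (the values below are made up for illustration) and averages them into 5-minute buckets, roughly what a coarser grain with Average aggregation does to a chart.

```python
from statistics import mean


def downsample(samples, bucket_size):
    """Average consecutive samples into buckets of `bucket_size` points."""
    return [
        mean(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Hypothetical 1-minute CPU averages (percent) with a single one-minute spike.
cpu_1min = [20, 22, 21, 100, 23, 20, 21, 22, 20, 21]

cpu_5min = downsample(cpu_1min, 5)

print(max(cpu_1min))  # the spike is obvious at 1-minute grain
print(cpu_5min)       # the same spike is diluted at 5-minute grain
```

A short spike that saturates the CPU for one minute barely registers in the 5-minute series, which is why the coarser grain suits trends but can hide the event you are hunting.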
Step 6: Interpret common patterns (quick diagnosis)
Use these "symptom to cause" mappings:
- CPU pinned 90-100% for long periods
  Likely compute saturation, too small VM size, runaway process, scaling needed.
- Disk throughput flatlines at a ceiling + latency symptoms
  Likely disk IOPS/throughput limit for SKU/disk tier, or heavy IO pattern.
- Network in/out spikes + packet drops (if you have NVA or guest stats)
  Likely network saturation, large data transfers, or misrouted traffic.
- Performance changed suddenly at a specific time
  Cross-check Activity log for a deployment, resize, restart, NSG change.
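Two of these patterns (pinned CPU and a throughput ceiling) can be expressed as rough checks over exported metric samples. The sketch below is illustrative only: the function names, the 2% tolerance, the 50% cutoff, and the sample values are all assumptions, not Azure-defined thresholds.

```python
def fraction_at_or_above(samples, threshold):
    """Share of samples at or above `threshold` (0.0 to 1.0)."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


def looks_capped(samples, tolerance=0.02, min_fraction=0.5):
    """True when most samples sit within `tolerance` of the observed peak.

    A series that hugs its own maximum suggests a ceiling (for example a
    disk throughput limit) rather than a bursty workload.
    """
    peak = max(samples)
    if peak <= 0:
        return False
    near_peak = sum(1 for s in samples if s >= peak * (1 - tolerance))
    return near_peak / len(samples) >= min_fraction


# Hypothetical samples: CPU percent, and disk throughput in MB/s.
cpu = [95, 97, 99, 96, 98, 94, 97, 99]
disk_flat = [40, 120, 199, 200, 200, 199, 200, 200, 198, 200]
disk_bursty = [10, 200, 15, 30, 12, 180, 20, 25]

print(fraction_at_or_above(cpu, 90))  # share of time CPU was pinned
print(looks_capped(disk_flat))        # flatline at a ceiling
print(looks_capped(disk_bursty))      # bursty, no ceiling
```

Checks like these only flag candidates. Confirm a suspected ceiling against the documented limits for the VM size and disk tier before resizing anything.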
Step 7: Create an alert (to catch recurrence)
Azure portal → Monitor → Alerts → Create
- Signal type: Metrics
- Example: CPU > 85% for 10 minutes
- Action group: email / ITSM / webhook
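To build intuition for what "CPU > 85% for 10 minutes" means, here is a simplified model of the rule: a trailing 10-sample average over 1-minute data compared against the threshold. This is a sketch under assumptions. Real metric alerts also depend on the evaluation frequency and aggregation type you configure, and the sample data is hypothetical.

```python
from collections import deque


def breaches(samples, threshold, window):
    """Yield the index of each sample whose trailing-window average exceeds threshold."""
    recent = deque(maxlen=window)
    for i, value in enumerate(samples):
        recent.append(value)
        # Only evaluate once a full window of samples is available.
        if len(recent) == window and sum(recent) / window > threshold:
            yield i


# Hypothetical 1-minute CPU averages: normal load, a 12-minute surge, recovery.
cpu = [50] * 5 + [95] * 12 + [40] * 5

print(list(breaches(cpu, threshold=85, window=10)))
```

Note that the rule does not fire the moment CPU jumps to 95%. The windowed average needs several minutes of sustained load to cross 85%, which is the behavior that keeps short spikes from paging anyone.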
Validation checklist
- Metrics show a clear correlation between user-reported slowness and infrastructure saturation.
- You captured the "before/after" baseline.
- You created at least one metric alert to detect recurrence.
- If root cause is unclear, you escalated to Logs (Log Analytics / VM insights).
Lab 2: Schedule Azure VM Backups to a Recovery Services Vault
Goal
Enable scheduled backups for Azure VMs using Azure Backup with a Recovery Services vault, and confirm which VMs are supported.
Core concept: What you can back up
Azure VM backup supports a wide range of OS types and also supports backing up VMs regardless of whether they are currently running.
In practice, for typical Azure IaaS workloads:
- Windows client (including Windows 10) can be protected (64-bit).
- Windows Server versions (including 2012 and above) are supported.
- Many Linux distros are supported (Debian versions are supported, subject to Azure's backup support matrix).
- VM power state (running vs stopped) does not prevent backup eligibility.
Step-by-step: Configure backups (Portal)
Step 1: Open the vault
Azure portal → Recovery Services vaults → select your vault
Step 2: Start backup configuration
Vault → Backup
- Where is your workload running? Azure
- What do you want to back up? Virtual machine
Step 3: Select the VMs
Choose the VMs you want to protect.
Step 4: Assign or create a backup policy
Pick:
- Backup frequency (daily/weekly)
- Backup time
- Retention (daily/weekly/monthly/yearly)
Step 5: Enable backup
Submit. The vault registers the VM for protection.
Step 6: Trigger an on-demand backup (recommended for validation)
Vault → Backup items → Azure Virtual Machine → select VM → Backup now
Verification steps
Vault → Backup jobs
- Confirm job status: Completed
Vault → Backup items → VM
- Confirm latest recovery point exists
Operational notes (what admins should watch)
- Ensure the VM has required connectivity and that backup extension operations succeed.
- Ensure retention aligns with ransomware recovery needs.
- For ransomware scenarios:
- Prefer restoring to a new VM for full restore.
- Use File recovery to retrieve specific files safely into a clean environment.
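The link between retention and ransomware recovery can be made concrete with a small sketch. It assumes one daily recovery point and a known (or estimated) compromise time. The dates, function names, and retention values are hypothetical and only illustrate the reasoning.

```python
from datetime import datetime, timedelta


def latest_clean_point(recovery_points, compromised_at):
    """Return the newest recovery point taken before the compromise, or None."""
    clean = [p for p in recovery_points if p < compromised_at]
    return max(clean) if clean else None


def retention_covers(compromised_at, now, retention_days):
    """True if the oldest retained daily point predates the compromise."""
    return now - timedelta(days=retention_days) < compromised_at


# Hypothetical vault state: daily backups at 02:00, seven points retained.
now = datetime(2024, 6, 30, 2, 0)
points = [now - timedelta(days=d) for d in range(7)]
compromised_at = datetime(2024, 6, 27, 14, 30)

print(latest_clean_point(points, compromised_at))
print(retention_covers(compromised_at, now, retention_days=7))
print(retention_covers(compromised_at, now, retention_days=2))
```

If ransomware typically sits undetected for longer than your daily retention window, every retained point may already be infected, which is the case the last line models.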
Lab 3: Bulk Create Guest Users from a CSV (Correct Methods)
Goal
Create guest user accounts for 500 external users listed in a CSV file.
Key concept: Guests are invited, not created like members
A guest user workflow is an invitation flow.
- Creating "normal users" uses user creation operations.
- Creating guests uses invitation operations.
This matters because:
- "Bulk create users" is for member users, not guest invites.
- For guests, use Bulk invite users or an invitation cmdlet/API.
Method A: Portal bulk invite users (fastest for admins)
Step 1: Prepare CSV
Use columns that align with Entra ID bulk invite requirements. Common fields:
- email
- displayName
- optional: invitedUserMessageInfo or redirect URL preferences (tenant-dependent)
Step 2: Run bulk invite
Azure portal → Microsoft Entra ID
- Users โ All users
- Select Bulk operations โ Bulk invite users
- Upload the CSV
- Start the operation
Step 3: Verify results
- Entra ID → Users → filter by User type = Guest
- Check invitation status if available
Method B: PowerShell automation (invitation cmdlet)
Why this works
The invitation cmdlet is designed to:
- Invite external users
- Create guest objects in the directory
- Send invitation emails (unless configured otherwise)
Steps (high-level workflow)
- Import CSV
- Loop each record
- Send invitation per user
- Log successes and failures
- Validate guest objects exist
Example structure (AzureAD module pattern)
If using New-AzureADMSInvitation, the flow is:
- Connect to AzureAD
- For each user:
  - Invite with New-AzureADMSInvitation
  - Set redirect URL
  - Optionally customize invitation message
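The same loop can be sketched in Python against the Microsoft Graph invitation shape rather than the AzureAD cmdlet. The request body fields (invitedUserEmailAddress, invitedUserDisplayName, inviteRedirectUrl, sendInvitationMessage, invitedUserMessageInfo) are properties of the Graph invitation resource. Everything else is an assumption for illustration: the `email` and `displayName` CSV headers, the redirect URL, and the `send` callable, which stands in for an authenticated POST to https://graph.microsoft.com/v1.0/invitations and is faked here so the sketch runs offline.

```python
import csv
import io


def build_invitation(row, redirect_url, message=None):
    """Map one CSV row to a Graph invitation request body."""
    body = {
        "invitedUserEmailAddress": row["email"].strip(),
        "invitedUserDisplayName": row["displayName"].strip(),
        "inviteRedirectUrl": redirect_url,
        "sendInvitationMessage": True,
    }
    if message:
        body["invitedUserMessageInfo"] = {"customizedMessageBody": message}
    return body


def invite_all(rows, send, redirect_url):
    """Invite every row via `send`, collecting successes and failures."""
    succeeded, failed = [], []
    for row in rows:
        body = build_invitation(row, redirect_url)
        email = body["invitedUserEmailAddress"]
        try:
            send(body)  # hypothetical: replace with a real Graph call
            succeeded.append(email)
        except Exception as exc:
            # Keep going so one bad record does not stop the batch.
            failed.append((email, str(exc)))
    return succeeded, failed


def fake_send(body):
    """Stand-in sender that rejects one made-up domain."""
    if body["invitedUserEmailAddress"].endswith("@blocked.example"):
        raise RuntimeError("domain not allowed")


sample_csv = "email,displayName\nana@contoso.com,Ana\nbob@blocked.example,Bob\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

ok, errors = invite_all(rows, fake_send, "https://myapplications.microsoft.com")
print(ok)
print(errors)
```

Passing the sender in as a parameter keeps the loop and the failure log testable without a tenant. The `failed` list is what you would write out for reprocessing.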
Validation
- Confirm 500 guest user objects exist
- Confirm invitations were issued successfully (audit logs can help)
Note: In many environments, admins now prefer Microsoft Graph PowerShell (New-MgInvitation) because the AzureAD module is being phased out. The principle stays the same: use an invitation operation for guests.
Common failure points and fixes
1) Guests can't be invited
- Check Entra ID → External Identities → External collaboration settings
- Ensure your role is allowed to invite (or that "admins only" is not restricting you)
2) Duplicate invites
- If a guest object already exists, invitation may fail or re-invite depending on settings.
- Pre-check by email (UPN or mail attribute) before inviting.
3) CSV formatting issues
- Wrong headers or extra whitespace
- Bad email formats
- Encoding issues (save as UTF-8 CSV)
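Most of these CSV problems, along with duplicate rows, can be caught before any invitation is sent. The sketch below is a minimal pre-flight check, assuming `email` and `displayName` headers (adjust to whatever template your tenant uses). The email pattern is deliberately loose and is not a full address validator.

```python
import csv
import io
import re

# Loose sanity check: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_rows(text, required=("email", "displayName")):
    """Return (good_rows, problems) after trimming, validating, and de-duplicating."""
    # Drop a UTF-8 byte order mark if one survived into the text.
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))
    headers = [h.strip() for h in (reader.fieldnames or [])]
    missing = [h for h in required if h not in headers]
    if missing:
        raise ValueError(f"missing headers: {missing}")

    good, problems, seen = [], [], set()
    for raw in reader:
        line_no = reader.line_num
        # Trim whitespace in headers and values; ignore overflow columns.
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k is not None}
        email = row["email"].lower()
        if not EMAIL_RE.match(email):
            problems.append((line_no, "invalid email"))
        elif email in seen:
            problems.append((line_no, "duplicate"))
        else:
            seen.add(email)
            row["email"] = email
            good.append(row)
    return good, problems


# Hypothetical input with padded headers, a duplicate, and a bad address.
sample = (
    " email ,displayName\n"
    "ana@contoso.com,Ana\n"
    "ANA@contoso.com ,Ana Dup\n"
    "not-an-email,Bob\n"
    "lee@fabrikam.com, Lee \n"
)

good, problems = clean_rows(sample)
print([r["email"] for r in good])
print(problems)
```

When reading from disk, opening the file with `encoding="utf-8-sig"` and `newline=""` handles the byte order mark and line endings before the text reaches this function.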
Final validation checklist
- You can see guest user objects for the invited emails.
- Invitation process completed without significant failures.
- You captured a log of failed invites for reprocessing.
- You confirmed external collaboration policy allows invitations.
