AWS Monitoring

The AWS app watches your AWS accounts and surfaces what actually matters in your project's Updates feed: 5xx error spikes, cost jumps, new security findings, and notable changes — plus one daily report that rolls all of it up. AWS accounts are noisy by design; the app's whole job is separating signal from that noise.

What you get

  • Incident updates — when a load balancer, API, Lambda function, or CloudFront distribution starts throwing 5xx errors well above its own learned baseline, an update opens in the feed with the numbers and a link to the exact resource in the AWS console. When the errors stop, the update gets a "resolved" beat and leaves the feed.
  • Security alerts — new Security Hub findings at or above your chosen severity (Security Hub aggregates GuardDuty, Inspector, and AWS Config checks). A recurring finding alerts once, not every day.
  • Change tripwires — a curated set of always-alert events from CloudTrail: IAM users/keys/policies changing, a security group opened to the internet, CloudTrail or GuardDuty being disabled, KMS keys scheduled for deletion, root account logins.
  • A daily report — one update per day with each account's section: yesterday's spend vs. its 7-day norm with the services that moved, incidents, new findings, and a digest of what changed and who changed it. Twenty accounts still produce one feed row.
  • The AWS panel — a sidebar panel with a 30-day cost chart by service, open incidents, recent findings, and the change log across all connected accounts.

You can also just ask the agent things like "why did our AWS bill jump yesterday?" or "show me 5xx rates on the prod ALB this week" — the same tools the monitor uses are available in chat.

Connecting an AWS account

AWS doesn't use OAuth. Instead, Avi assumes a read-only IAM role in your AWS account, created for you by a one-click CloudFormation stack:

  1. In Project Settings → Integrations → Add Integration, choose AWS.
  2. Enter your 12-digit AWS account ID (and an optional label like "prod"), then click Launch stack in the AWS Console. A pre-filled CloudFormation page opens — review it and click Create stack. The stack creates one role whose every permission is read-only — CloudWatch metrics, Cost Explorer, CloudTrail lookups, Security Hub findings. Avi can observe the account, never change it.
  3. That's it. Avi detects the new role automatically (usually under a minute) and the connection completes on its own. Deleting the stack later disconnects it.

Prefer to manage IAM yourself (Terraform, restricted consoles)? Expand Manual setup on the same screen for the equivalent CLI commands and policy documents, then paste the resulting role ARN.

No access keys are ever stored — Avi requests short-lived credentials from AWS each time it needs them, and you can revoke access at any moment by deleting the role.
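
As a rough illustration of that flow, the sketch below requests temporary credentials through the AWS STS AssumeRole call, using a client passed in by the caller. The function name, session name, and role ARN are hypothetical, and a real cross-account setup may involve details (such as an external ID) that are not shown. It is not Avi's actual implementation.

```python
def short_lived_credentials(sts_client, role_arn, session_name="aws-monitor-readonly"):
    """Ask STS for temporary credentials for a role; nothing is persisted.

    sts_client is any object exposing the STS AssumeRole call, for example
    boto3.client("sts"). The names used here are illustrative assumptions.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=900,  # 15 minutes, the shortest lifetime STS allows
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, and Expiration.
    return response["Credentials"]
```

Once the role is deleted, further AssumeRole calls fail, so there is no long-lived key left to rotate or leak.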

Multiple accounts: repeat the same steps for each account (prod, staging, dev, …). Each connection is independent.

Turning on monitoring

Enable the aws app for the project, then add an aws subagent in Project Settings → Subagents, one instance per connected account. Pick the account's credentials secret in the instance config, and optionally tune:

  • regions — which regions to watch (default us-east-1).
  • reportHourUtc — when the daily report publishes.
  • severityFloor — minimum Security Hub severity that opens an incident (default HIGH).
  • costSpikePercent / costFloorUsd — how big a daily cost move must be, in percent and dollars, before it's flagged.
  • errorRateMultiplier / errorMinCount — how far above its own baseline a resource's 5xx count must be.
  • ignore — substrings to suppress (resource names, finding titles, event names, principals).

Per-account instances mean per-account tuning: prod can check every 15 minutes with strict thresholds while dev checks hourly and ignores small cost swings. The instance's instructions field steers judgment in plain English — e.g. "never alert on the load-test stack; cost spikes in SageMaker are expected this month."
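
To make per-account tuning concrete, here is a sketch of two instance configs written as Python dictionaries. The field names come from the list above; the only documented defaults are us-east-1 and HIGH. Every other value, the dictionary layout, and the is_suppressed helper (with its case-sensitive matching) are assumptions for illustration, not the app's actual schema.

```python
# Documented defaults; everything else below is an illustrative assumption.
DEFAULTS = {
    "regions": ["us-east-1"],
    "severityFloor": "HIGH",
}

# Strict settings for a production account (hypothetical values).
prod = {
    **DEFAULTS,
    "regions": ["us-east-1", "eu-west-1"],
    "reportHourUtc": 8,          # assumed to be an hour of day, 0-23
    "costSpikePercent": 25,
    "costFloorUsd": 50,
    "errorRateMultiplier": 3,
    "errorMinCount": 20,
    "ignore": ["load-test"],
}

# Looser settings for a dev account that should ignore small cost swings.
dev = {
    **DEFAULTS,
    "costSpikePercent": 100,
    "costFloorUsd": 200,
    "ignore": ["sandbox-", "load-test"],
}


def is_suppressed(text, ignore):
    """Return True if any ignore substring occurs in the given name or title."""
    return any(fragment in text for fragment in ignore)
```

With these values, a resource named load-test-stack-alb would be suppressed in both accounts, while checkout-alb would not.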

How it decides what's a real issue

Everything is compared against the account's own history, not fixed thresholds:

  • Errors — each resource's 5xx count is checked against its trailing 7-day baseline. An alert needs to clear a multiple of that baseline and an absolute minimum count, so three errors against a baseline of one stay quiet, while a real spike alerts.
  • Cost — each service's daily spend is compared to its trailing 7-day median; both a percentage and a dollar floor must be crossed. AWS Cost Anomaly Detection results pass straight through when you have it enabled.
  • Changes — routine churn is filtered out deterministically; only tripwire events alert immediately, and everything else is summarized in the daily report so nothing disappears silently.
  • Whatever passes those gates gets one final pass against your instructions before it reaches the feed. Suppressed items still appear in the daily report.
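
The error and cost gates above can be sketched as two small functions. This is a minimal illustration, not the app's code: the function names, the default values, the use of "at or above" comparisons, and the handling of a zero median are all assumptions, and the baseline is taken as a precomputed number because the text does not say how it is derived.

```python
from statistics import median


def error_spike(count, baseline, multiplier=3.0, min_count=20):
    """Alert only if the count clears BOTH a multiple of the baseline
    and an absolute minimum (errorRateMultiplier / errorMinCount)."""
    return count >= multiplier * baseline and count >= min_count


def cost_spike(today_usd, trailing_7d_usd, spike_percent=25.0, floor_usd=50.0):
    """Flag a service's daily spend only if the increase over the trailing
    7-day median clears BOTH the dollar floor and the percentage threshold
    (costFloorUsd / costSpikePercent)."""
    norm = median(trailing_7d_usd)
    increase = today_usd - norm
    if increase < floor_usd:
        return False
    if norm == 0:
        # Assumption: new spend on a previously free service counts as a spike.
        return True
    return increase / norm * 100 >= spike_percent
```

Under these assumed defaults, error_spike(3, 1) is False because three errors never reach the minimum count, and cost_spike(2.0, [1.0] * 7) is False because a 100% jump of one dollar stays under the dollar floor. Requiring both conditions is what keeps small, noisy resources quiet.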

Open incidents don't re-alert — they gain new beats if things get materially worse, and resolve themselves when the metrics return to baseline.